Large Language Models (LLMs) are the technology behind the most widely discussed AI systems today. These models can generate human-like text, answer questions, write code, translate languages, and reason about complex problems.
The foundation of modern LLMs is the Transformer, introduced in the 2017 paper "Attention Is All You Need" by Vaswani et al. Transformers address two key weaknesses of RNNs: training that is slow because it must proceed token by token, and long-range dependencies that fade as sequences grow.
Self-attention allows every token to attend to every other token simultaneously:
Self-Attention Example:
Sentence: "The cat sat on the mat because it was tired"
When processing "it", attention scores might be:
Token:  The   cat   sat   on    the   mat   because  it    was   tired
Score:  0.05  0.45  0.05  0.02  0.03  0.10  0.05     0.05  0.05  0.15
              ^^^^                                                ^^^^
"it" attends most strongly to "cat" (0.45) and "tired" (0.15)
Self-attention uses three projections: Query (what am I looking for?), Key (what do I contain?), Value (what information do I provide?). Multi-head attention runs several such computations in parallel.
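To make the Query/Key/Value mechanics concrete, here is a minimal NumPy sketch of scaled dot-product attention. The token embeddings and projection matrices are random placeholders, not values from any real model; the formula itself is softmax(QKᵀ/√d_k)·V from the Transformer paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # how much each query matches each key
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

# Toy example: 3 tokens with embedding dimension 4 (hypothetical numbers)
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))                              # token embeddings
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3)) # learned projections
out, w = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)
```

Each row of `w` is one token's attention distribution over all tokens, like the "it" example above. Multi-head attention simply runs several such computations with different projection matrices and concatenates the results.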
Transformer Block:
+-----------------------------+
| Multi-Head Attention |
| + Residual |
| + Layer Norm |
+-----------------------------+
| Feed-Forward Network |
| + Residual |
| + Layer Norm |
+-----------------------------+
(repeated N times)
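The block diagram above can be sketched in a few lines. This is a post-norm layout (sublayer, then residual add, then LayerNorm), matching the diagram; the attention function is stubbed out as identity and the feed-forward weights are random placeholders, so this shows only the wiring, not a trained model.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token's features to zero mean and unit variance
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def transformer_block(x, attn, ffn):
    # Sublayer -> residual connection -> LayerNorm, twice, as in the diagram
    x = layer_norm(x + attn(x))
    x = layer_norm(x + ffn(x))
    return x

# Toy wiring: 5 tokens, model dimension 8 (placeholder values)
d = 8
rng = np.random.default_rng(1)
W1 = rng.normal(size=(d, 4 * d)) * 0.1
W2 = rng.normal(size=(4 * d, d)) * 0.1
ffn = lambda x: np.maximum(x @ W1, 0) @ W2  # 2-layer MLP with ReLU
attn = lambda x: x  # placeholder; a real block uses multi-head self-attention
x = rng.normal(size=(5, d))
y = transformer_block(x, attn, ffn)
```

Stacking N such blocks (GPT-style models use dozens) gives the full model; modern variants often move LayerNorm before each sublayer ("pre-norm") for training stability.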
Pretraining: the model is trained on a massive corpus (trillions of tokens) to predict the next token. This stage is extraordinarily expensive; training a frontier model can cost hundreds of millions of dollars in compute.
Fine-tuning: the pretrained model is further trained on curated datasets, either Supervised Fine-Tuning (SFT) on high-quality examples or domain-specific fine-tuning for particular fields.
RLHF Pipeline:
+----------+ +--------------+ +--------------+ +-----------+
| Prompt |--->| LLM generates|--->| Humans rank |--->| Train |
| | | multiple | | outputs | | reward |
| | | outputs | | | | model |
+----------+ +--------------+ +--------------+ +-----+-----+
|
+----------+ +--------------+ |
| Improved |<---| Fine-tune |<-----------------------------+
| LLM | | LLM with RL |
+----------+ +--------------+
Tip: Anthropic developed Constitutional AI, using AI-generated feedback to supplement human feedback for more scalable alignment.
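The "train reward model" step in the pipeline commonly uses a pairwise ranking loss: given a human-preferred output and a rejected one, the reward model is trained to score the preferred one higher. The sketch below shows this Bradley-Terry-style objective on scalar rewards; a real reward model would compute these scores with a neural network over the full prompt and response.

```python
import numpy as np

def preference_loss(r_chosen, r_rejected):
    """Pairwise ranking loss: -log(sigmoid(r_chosen - r_rejected)).
    Small when the reward model already scores the preferred output higher."""
    margin = r_chosen - r_rejected
    return -np.log(1.0 / (1.0 + np.exp(-margin)))

# Hypothetical reward scores for one ranked pair of outputs
loss_good = preference_loss(2.0, 0.5)  # model agrees with the human ranking
loss_bad = preference_loss(0.5, 2.0)   # model disagrees -> large loss
```

Minimizing this loss over many ranked pairs teaches the reward model to imitate human preferences, which the RL step then optimizes against.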
LLMs generate text one token at a time via next-token prediction.
Temperature rescales the logits before sampling and so controls randomness: 0 is effectively greedy (always pick the most likely token), around 0.7 is a common balanced setting, and values above 1.0 produce increasingly random output. Top-p (nucleus sampling) restricts sampling to the smallest set of tokens whose cumulative probability exceeds p.
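Both knobs can be implemented in a few lines. This sketch uses made-up logits over a tiny 4-token vocabulary; real models produce logits over tens of thousands of tokens, but the sampling logic is the same.

```python
import numpy as np

def sample_next_token(logits, temperature=0.7, top_p=0.9, rng=None):
    """Sample a token index using temperature scaling and nucleus (top-p) sampling."""
    if rng is None:
        rng = np.random.default_rng()
    # Temperature: divide logits before softmax; lower -> sharper distribution.
    # (Clamping avoids division by zero; temperature ~0 becomes greedy argmax.)
    z = logits / max(temperature, 1e-8)
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    # Top-p: keep the smallest set of tokens whose cumulative probability
    # exceeds p, scanning from most to least likely
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1
    keep = order[:cutoff]
    renormed = probs[keep] / probs[keep].sum()
    return keep[rng.choice(len(keep), p=renormed)]

# Hypothetical logits over a 4-token vocabulary
logits = np.array([2.0, 1.0, 0.1, -1.0])
token = sample_next_token(logits, temperature=0.7, top_p=0.9)
```

At temperature 0 the clamped division makes the top token's probability dominate, so the function degenerates to picking the argmax, matching the "deterministic" behavior described above.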