Large Language Models (LLMs) are the technology behind the most widely discussed AI systems today. These models can generate human-like text, answer questions, write code, translate languages, and reason about complex problems.
The foundation of modern LLMs is the Transformer, introduced in the 2017 paper "Attention Is All You Need" by Vaswani et al. Transformers address two key weaknesses of recurrent neural networks (RNNs): training is slow because tokens must be processed one at a time, and dependencies between distant tokens are easily lost.
Self-attention allows every token to attend to every other token simultaneously:
Self-Attention Example:
Sentence: "The cat sat on the mat because it was tired"
When processing "it", attention scores might be:
Token:   The   cat   sat   on    the   mat   because  it    was   tired
Weight:  0.05  0.45  0.05  0.02  0.03  0.10  0.05     0.05  0.05  0.15
               ^^^^                                               ^^^^

"it" attends most strongly to "cat" (0.45) and "tired" (0.15).
Self-attention computes three learned projections of each token's embedding: a Query (what am I looking for?), a Key (what do I contain?), and a Value (what information do I provide?). Multi-head attention runs several such attention computations in parallel, each with its own projections, and concatenates the results.
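The Query/Key/Value mechanism can be sketched as scaled dot-product attention in a few lines of numpy. All sizes and weight matrices here are toy values chosen for illustration, not parameters from any real model:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Project each token embedding into a query, key, and value.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Every token's query is compared against every token's key,
    # scaled by sqrt(d_k) to keep the logits in a reasonable range.
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    # Each output is a weighted mix of all tokens' values.
    return weights @ V, weights

# Toy setup: 10 tokens, embedding size 8, head size 4 (made-up sizes).
seq_len, d_model, d_head = 10, 8, 4
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = [rng.normal(size=(d_model, d_head)) for _ in range(3)]

out, weights = self_attention(X, Wq, Wk, Wv)
print(out.shape)      # (10, 4)
print(weights.shape)  # (10, 10) — one attention distribution per token
```

Multi-head attention would simply run several such heads, each with its own Wq, Wk, Wv, and concatenate their outputs along the feature dimension before a final linear projection.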