Retrieval-Augmented Generation (RAG) is one of the most practical techniques for building AI applications that need access to specific knowledge. Instead of fine-tuning a model, you retrieve relevant documents and include them in the prompt.
RAG combines retrieval (finding relevant information) with generation (producing an answer with an LLM). It addresses a key limitation of LLMs: out of the box, they only know what was in their training data.
| Approach | Cost | Freshness | Accuracy | Complexity |
|---|---|---|---|---|
| Prompt only | Very low | Static | Limited | Very low |
| Fine-tuning | High | Snapshot | Good | High |
| RAG | Medium | Real-time | High | Medium |
| RAG + Fine-tune | High | Real-time | Highest | High |
```
┌─────────────────────────────────────────────────┐
│                  RAG Pipeline                   │
│                                                 │
│ ┌──────────┐    ┌──────────┐    ┌───────────┐   │
│ │ Document │───▶│ Chunking │───▶│ Embedding │   │
│ │ Loading  │    │          │    │           │   │
│ └──────────┘    └──────────┘    └─────┬─────┘   │
│                                       │         │
│                                       ▼         │
│                                 ┌───────────┐   │
│                                 │  Vector   │   │
│                                 │ Database  │   │
│                                 └─────┬─────┘   │
│                                       │         │
│ ┌──────────┐    ┌──────────┐    ┌─────┴─────┐   │
│ │   LLM    │◀───│  Prompt  │◀───│ Retrieval │   │
│ │ Response │    │ Assembly │    │           │   │
│ └──────────┘    └──────────┘    └───────────┘   │
└─────────────────────────────────────────────────┘
```
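To make the stages concrete, here is a toy end-to-end sketch of the diagram. The bag-of-words "embedding" and list-backed "vector database" are deliberate stand-ins for illustration only; real pipelines use learned embedding models and a dedicated vector store:

```python
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count (stand-in for a real model)."""
    return Counter(text.lower().split())

def similarity(query_vec: Counter, doc_vec: Counter) -> float:
    """Word-overlap score (stand-in for cosine similarity)."""
    return sum((query_vec & doc_vec).values())

# "Vector database": just a list of (vector, chunk) pairs.
chunks = ["RAG retrieves relevant documents at query time.",
          "Fine-tuning updates the model's weights offline."]
index = [(embed(c), c) for c in chunks]

# Retrieval + prompt assembly.
query = "how does rag use documents"
best_vec, best_chunk = max(index, key=lambda pair: similarity(embed(query), pair[0]))
prompt = f"Context: {best_chunk}\n\nQuestion: {query}"
```

Each box in the diagram maps onto one step here; the rest of this lesson replaces the toy pieces one at a time, starting with document loading.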
Load your source documents into a common format:
```python
from pathlib import Path

def load_documents(directory: str) -> list[dict]:
    """Load text files from a directory."""
    documents = []
    for path in Path(directory).glob("**/*.txt"):
        text = path.read_text(encoding="utf-8")
        documents.append({
            "content": text,
            "metadata": {"source": str(path), "filename": path.name},
        })
    return documents

docs = load_documents("./knowledge_base")
print(f"Loaded {len(docs)} documents")
```
| Format | Library |
|---|---|
| PDF | PyMuPDF, pdfplumber |
| HTML | BeautifulSoup |
| DOCX | python-docx |
| CSV | pandas |
| JSON | Built-in json |
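As one example beyond plain text, here is a sketch of a JSON loader using only the standard library. `load_json_documents` and the `"text"` key are assumptions for illustration; adjust the key to match your files' schema:

```python
import json
from pathlib import Path

def load_json_documents(directory: str) -> list[dict]:
    """Load JSON files, assuming each holds an object with a "text" field."""
    documents = []
    for path in Path(directory).glob("**/*.json"):
        data = json.loads(path.read_text(encoding="utf-8"))
        documents.append({
            "content": data.get("text", ""),
            "metadata": {"source": str(path), "filename": path.name},
        })
    return documents
```

Whatever the source format, the goal is the same: normalize everything into the common `{"content": ..., "metadata": ...}` shape before chunking.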
Whole documents are usually too long to embed as a single vector, and retrieval works best over focused passages. Split them into smaller, meaningful chunks.
```python
def chunk_by_tokens(text: str, chunk_size: int = 500,
                    overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks by approximate token count."""
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        end = start + chunk_size
        chunks.append(" ".join(words[start:end]))
        if end >= len(words):
            break  # last chunk; avoid emitting a pure-overlap duplicate
        start = end - overlap  # overlap for context continuity
    return chunks
```
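Fixed-size chunking can cut a sentence in half, which hurts retrieval quality. A sentence-aware variant is a common refinement; this is a sketch using a naive regex split rather than a real sentence tokenizer, and `chunk_by_sentences` is an illustrative name, not a library function:

```python
import re

def chunk_by_sentences(text: str, max_words: int = 500) -> list[str]:
    """Group sentences into chunks of at most max_words words each."""
    # Naive split on ., !, or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Because chunks end at sentence boundaries, each one is a self-contained passage, at the cost of slightly uneven chunk sizes.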