Embeddings transform text into numerical vectors that capture semantic meaning. Combined with vector databases, they enable powerful features like semantic search, recommendation systems, and retrieval-augmented generation (RAG).
An embedding is a dense vector (list of floating-point numbers) that represents the meaning of a piece of text. Texts with similar meanings have vectors that are close together in the embedding space.
"The cat sat on the mat" → [0.023, -0.114, 0.891, ...] (1536 dimensions)
"A kitten was on the rug" → [0.025, -0.109, 0.887, ...] (very similar!)
"Stock prices rose today" → [-0.412, 0.067, 0.203, ...] (very different)
| Use Case | How Embeddings Help |
|---|---|
| Semantic search | Find results by meaning, not just keywords |
| Recommendations | Suggest similar items based on vector proximity |
| Clustering | Group similar documents automatically |
| RAG | Retrieve relevant context for LLM prompts |
| Anomaly detection | Identify outliers in vector space |
```python
from openai import OpenAI

client = OpenAI()

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="The cat sat on the mat",
)
vector = response.data[0].embedding
print(f"Dimensions: {len(vector)}")  # 1536
print(f"First 5 values: {vector[:5]}")
```
You can embed multiple texts in a single request, which is cheaper and faster than one call per text:

```python
texts = [
    "Introduction to machine learning",
    "Deep learning fundamentals",
    "How to bake a cake",
]
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts,
)
vectors = [item.embedding for item in response.data]
```
The most common similarity metric is cosine similarity, which measures the angle between two vectors:
```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

similarity = cosine_similarity(vectors[0], vectors[1])
print(f"ML vs Deep Learning: {similarity:.4f}")  # ~0.85 (high)

similarity = cosine_similarity(vectors[0], vectors[2])
print(f"ML vs Baking: {similarity:.4f}")  # ~0.30 (low)
```
| Score | Interpretation |
|---|---|
| > 0.8 | Very similar / related |
| 0.5 – 0.8 | Somewhat related |
| < 0.5 | Unrelated / different |
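Putting the score thresholds to work, semantic search is just "rank by similarity, drop weak matches". Here is a minimal in-memory sketch; the `search` helper, the toy 3-dimensional vectors, and the 0.5 cutoff are illustrative (real embeddings would come from the API calls above):

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def search(query_vec, doc_vecs, docs, top_k=3, min_score=0.5):
    """Rank documents by cosine similarity to the query, dropping weak matches."""
    scored = [(cosine_similarity(query_vec, v), d) for v, d in zip(doc_vecs, docs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(s, d) for s, d in scored[:top_k] if s >= min_score]

# Toy 3-dimensional vectors in place of real 1536-dimensional embeddings.
docs = ["ml intro", "dl basics", "cake recipe"]
doc_vecs = [np.array([0.9, 0.1, 0.0]),
            np.array([0.8, 0.2, 0.1]),
            np.array([0.0, 0.1, 0.9])]
query = np.array([1.0, 0.0, 0.0])

for score, doc in search(query, doc_vecs, docs):
    print(f"{score:.3f}  {doc}")
```

The cake document scores near zero against this query, so the `min_score` filter removes it and only the two machine-learning documents are returned.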
A vector database is optimised for storing, indexing, and querying high-dimensional vectors at scale.
| Database | Type | Best For |
|---|---|---|
| Pinecone | Managed cloud | Production, fully managed |
| Chroma | Embedded / local | Prototyping, local development |
| pgvector | PostgreSQL extension | Teams already using PostgreSQL |
| Weaviate | Self-hosted / cloud | Multimodal, GraphQL interface |
| Qdrant | Self-hosted / cloud | Performance, rich filtering |
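Despite their differences, these databases share the same basic workflow: add vectors under an ID, then query for nearest neighbours. To make that workflow concrete, here is a toy in-memory stand-in (the `ToyVectorStore` class and its API are invented for illustration, not a real library; production databases use approximate indexes such as HNSW or IVF rather than brute-force scans):

```python
import numpy as np

class ToyVectorStore:
    """Tiny in-memory stand-in for a vector database: stores vectors by ID
    and answers nearest-neighbour queries by brute-force cosine similarity."""

    def __init__(self):
        self.ids, self.vectors = [], []

    def add(self, doc_id, vector):
        self.ids.append(doc_id)
        self.vectors.append(np.asarray(vector, dtype=float))

    def query(self, vector, n_results=2):
        q = np.asarray(vector, dtype=float)
        scores = [np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))
                  for v in self.vectors]
        # Sort indices by descending similarity, keep the top n_results.
        order = np.argsort(scores)[::-1][:n_results]
        return [(self.ids[i], scores[i]) for i in order]

store = ToyVectorStore()
store.add("ml", [0.9, 0.1, 0.0])
store.add("dl", [0.8, 0.2, 0.1])
store.add("cake", [0.0, 0.1, 0.9])
print(store.query([1.0, 0.0, 0.0], n_results=2))  # the two ML documents
```

Real stores expose much the same shape of API, plus persistence, metadata filtering, and approximate indexing for millions of vectors.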