You are viewing a free preview of this lesson.
Subscribe to unlock all 10 lessons in this course and every other course on LearningBro.
Databases are the backbone of almost every system. Choosing the right database, designing an efficient schema, and scaling it as your system grows are critical skills. This lesson covers SQL vs NoSQL, normalisation, sharding, replication, and the CAP theorem.
SQL databases store data in tables with rows and columns, linked by foreign keys. They guarantee ACID properties.
┌─────────────────────────────────────────────────┐
│ ACID Properties │
├──────────────┬──────────────────────────────────┤
│ Atomicity │ All-or-nothing transactions │
│ Consistency │ Data always valid per constraints │
│ Isolation │ Concurrent txns don't interfere │
│ Durability │ Committed data survives crashes │
└──────────────┴──────────────────────────────────┘
Examples: PostgreSQL, MySQL, SQL Server, Oracle
NoSQL databases sacrifice some ACID guarantees for flexibility and horizontal scalability.
| Type | Data Model | Examples | Best For |
|---|---|---|---|
| Document | JSON-like documents | MongoDB, CouchDB | Flexible schemas, nested data |
| Key-Value | Key → value pairs | Redis, DynamoDB | Caching, session storage |
| Wide-Column | Column families | Cassandra, HBase | Time-series, write-heavy |
| Graph | Nodes and edges | Neo4j, Amazon Neptune | Relationships, social graphs |
| Criteria | SQL | NoSQL |
|---|---|---|
| Schema | Fixed, well-defined | Flexible, evolving |
| Relationships | Complex joins | Minimal joins |
| Transactions | Multi-table ACID | Usually single-document |
| Scale pattern | Vertical (read replicas) | Horizontal (sharding) |
| Query flexibility | Powerful (SQL) | Limited (varies by type) |
| Consistency | Strong by default | Eventually consistent |
Normalisation eliminates data redundancy by splitting data into related tables.
Normalised (3NF):
┌──────────┐ ┌────────────┐ ┌──────────┐
│ Users │ │ Orders │ │ Products │
├──────────┤ ├────────────┤ ├──────────┤
│ id │◀────│ user_id │ │ id │
│ name │ │ product_id │────▶│ name │
│ email │ │ quantity │ │ price │
└──────────┘ │ created_at │ └──────────┘
└────────────┘
Pros: No duplicate data, easier updates, data integrity. Cons: Requires joins (slower reads), more complex queries.
Denormalisation intentionally adds redundancy to avoid joins and speed up reads.
Denormalised:
┌─────────────────────────────────────────┐
│ OrdersFlat │
├─────────────────────────────────────────┤
│ order_id │
│ user_name (duplicated from Users) │
│ user_email (duplicated from Users) │
│ product_name (duplicated from Products)│
│ product_price (duplicated from Products)│
│ quantity │
│ created_at │
└─────────────────────────────────────────┘
Pros: Fast reads (no joins), simpler queries. Cons: Data duplication, harder updates (must update all copies), risk of inconsistency.
| Approach | Reads | Writes | Storage | Consistency |
|---|---|---|---|---|
| Normalised | Slower | Faster | Less | Easier |
| Denormalised | Faster | Slower | More | Harder |
Tip: Start normalised. Denormalise selectively when read performance becomes a bottleneck, and you have specific query patterns to optimise.
Sharding splits a database into multiple smaller databases (shards), each holding a subset of the data.
1. Range-Based Sharding
Shard 1: user_id 1 - 1,000,000
Shard 2: user_id 1,000,001 - 2,000,000
Shard 3: user_id 2,000,001 - 3,000,000
Pros: Simple to implement, range queries are efficient. Cons: Hotspots if data is unevenly distributed.
2. Hash-Based Sharding
shard_id = hash(user_id) % num_shards
Pros: Even distribution. Cons: Range queries require querying all shards; resharding is expensive.
3. Directory-Based Sharding
Subscribe to continue reading
Get full access to this lesson and all 10 lessons in this course.