Database Design & Scaling

Databases are the backbone of almost every system. Choosing the right database, designing an efficient schema, and scaling it as your system grows are critical skills. This lesson covers SQL vs NoSQL, normalisation, sharding, replication, and the CAP theorem.

SQL vs NoSQL

Relational Databases (SQL)

SQL databases store data in tables with rows and columns, linked by foreign keys. They guarantee ACID properties.

┌─────────────────────────────────────────────────┐
│                  ACID Properties                  │
├──────────────┬──────────────────────────────────┤
│ Atomicity    │ All-or-nothing transactions       │
│ Consistency  │ Data always valid per constraints │
│ Isolation    │ Concurrent txns don't interfere   │
│ Durability   │ Committed data survives crashes   │
└──────────────┴──────────────────────────────────┘

Examples: PostgreSQL, MySQL, SQL Server, Oracle

NoSQL Databases

NoSQL databases sacrifice some ACID guarantees for flexibility and horizontal scalability.

Type	Data Model	Examples	Best For
Document	JSON-like documents	MongoDB, CouchDB	Flexible schemas, nested data
Key-Value	Key → value pairs	Redis, DynamoDB	Caching, session storage
Wide-Column	Column families	Cassandra, HBase	Time-series, write-heavy
Graph	Nodes and edges	Neo4j, Amazon Neptune	Relationships, social graphs

When to Choose What

Criteria	SQL	NoSQL
Schema	Fixed, well-defined	Flexible, evolving
Relationships	Complex joins	Minimal joins
Transactions	Multi-table ACID	Usually single-document
Scale pattern	Vertical (read replicas)	Horizontal (sharding)
Query flexibility	Powerful (SQL)	Limited (varies by type)
Consistency	Strong by default	Eventually consistent

Normalisation vs Denormalisation

Normalisation

Normalisation eliminates data redundancy by splitting data into related tables.

Normalised (3NF):

┌──────────┐     ┌────────────┐     ┌──────────┐
│  Users   │     │   Orders   │     │ Products │
├──────────┤     ├────────────┤     ├──────────┤
│ id       │◀────│ user_id    │     │ id       │
│ name     │     │ product_id │────▶│ name     │
│ email    │     │ quantity   │     │ price    │
└──────────┘     │ created_at │     └──────────┘
                 └────────────┘

Pros: No duplicate data, easier updates, data integrity. Cons: Requires joins (slower reads), more complex queries.

Denormalisation

Denormalisation intentionally adds redundancy to avoid joins and speed up reads.

Denormalised:

┌─────────────────────────────────────────┐
│              OrdersFlat                  │
├─────────────────────────────────────────┤
│ order_id                                 │
│ user_name       (duplicated from Users)  │
│ user_email      (duplicated from Users)  │
│ product_name    (duplicated from Products)│
│ product_price   (duplicated from Products)│
│ quantity                                  │
│ created_at                               │
└─────────────────────────────────────────┘

Pros: Fast reads (no joins), simpler queries. Cons: Data duplication, harder updates (must update all copies), risk of inconsistency.

Approach	Reads	Writes	Storage	Consistency
Normalised	Slower	Faster	Less	Easier
Denormalised	Faster	Slower	More	Harder

Tip: Start normalised. Denormalise selectively when read performance becomes a bottleneck, and you have specific query patterns to optimise.

Database Sharding

Sharding splits a database into multiple smaller databases (shards), each holding a subset of the data.

Sharding Strategies

1. Range-Based Sharding

Shard 1: user_id   1 - 1,000,000
Shard 2: user_id   1,000,001 - 2,000,000
Shard 3: user_id   2,000,001 - 3,000,000

Pros: Simple to implement, range queries are efficient. Cons: Hotspots if data is unevenly distributed.

2. Hash-Based Sharding

shard_id = hash(user_id) % num_shards

Pros: Even distribution. Cons: Range queries require querying all shards; resharding is expensive.

3. Directory-Based Sharding

Database Design & Scaling

Database Design & Scaling

SQL vs NoSQL

Relational Databases (SQL)

NoSQL Databases

When to Choose What

Normalisation vs Denormalisation

Normalisation

Denormalisation

Database Sharding

Sharding Strategies

More in Programming