Scalability is the ability of a system to handle increasing load by adding resources. Load balancing is the technique of distributing incoming requests across multiple servers. Together, they form the foundation of any system that needs to grow beyond a single machine.
There are two fundamental approaches to scaling:
```
  Vertical Scaling             Horizontal Scaling
    (Scale Up)                    (Scale Out)

┌───────────────┐        ┌──────┐ ┌──────┐ ┌──────┐
│               │        │Server│ │Server│ │Server│
│  BIG SERVER   │        │  1   │ │  2   │ │  3   │
│  (more CPU,   │        └──────┘ └──────┘ └──────┘
│   more RAM,   │            ▲        ▲        ▲
│   more disk)  │            └────────┼────────┘
│               │                     │
└───────────────┘             ┌──────────────┐
                              │Load Balancer │
                              └──────────────┘
```
| Aspect | Vertical Scaling | Horizontal Scaling |
|---|---|---|
| Approach | Bigger machine | More machines |
| Complexity | Low | Higher (distributed state) |
| Cost curve | Superlinear (big hardware costs disproportionately more) | Roughly linear |
| Downtime risk | Single point of failure | Redundancy built in |
| Upper limit | Hardware limits | Practically unlimited |
| State management | Simple (single node) | Requires external state |
Tip: Most production systems use horizontal scaling as their primary strategy, with vertical scaling for components that are difficult to distribute (e.g. certain databases).
A load balancer sits between clients and servers, distributing requests to prevent any single server from becoming overwhelmed.
```
┌──────────────────────────────────────────────────────────┐
│                  OSI Model (Simplified)                  │
├──────────┬───────────────────────────────────────────────┤
│ Layer 7  │ Application (HTTP, HTTPS, WebSocket)          │
│ Layer 4  │ Transport (TCP, UDP)                          │
│ Layer 3  │ Network (IP)                                  │
└──────────┴───────────────────────────────────────────────┘
```
| Feature | L4 Load Balancer | L7 Load Balancer |
|---|---|---|
| Operates at | TCP/UDP level | HTTP/HTTPS level |
| Inspects | IP, port, TCP headers | URL, headers, cookies, body |
| Speed | Very fast | Slightly slower |
| Content routing | No | Yes (route by URL, header) |
| SSL termination | Pass-through or terminate | Typically terminates |
| Use case | Simple distribution | Smart routing, A/B testing |
| Examples | AWS NLB, HAProxy (TCP) | AWS ALB, NGINX, Envoy |
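The "content routing" row in the table is the defining capability of an L7 balancer: it can read the request itself before deciding where to send it. A minimal sketch of the idea in Python (the path prefixes and pool names here are hypothetical; real L7 balancers such as NGINX, Envoy, or AWS ALB express the same logic in configuration):

```python
# Toy sketch of L7 content routing: inspect the HTTP request path and
# choose a backend pool. An L4 balancer cannot do this, because it
# never looks above the TCP/UDP layer.
ROUTES = {
    "/api/": "api-pool",        # hypothetical pool names
    "/static/": "static-pool",
}

def choose_pool(path: str, default: str = "web-pool") -> str:
    """Return the backend pool whose prefix matches the request path."""
    for prefix, pool in ROUTES.items():
        if path.startswith(prefix):
            return pool
    return default

# /api/users/42 goes to the API pool; anything unmatched falls through
# to the default web pool.
```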
**Round Robin.** Requests are distributed sequentially to each server in turn.
```
Request 1 ──▶ Server A
Request 2 ──▶ Server B
Request 3 ──▶ Server C
Request 4 ──▶ Server A  (cycles back)
Request 5 ──▶ Server B
```
Pros: Simple, even distribution when servers are identical. Cons: Ignores server load and capacity differences.
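The cycling behaviour above is a one-liner with Python's `itertools.cycle` (server names here are just placeholders):

```python
from itertools import cycle

# Round-robin sketch: hand out servers in order, wrapping around
# when the list is exhausted.
servers = ["server-a", "server-b", "server-c"]
rr = cycle(servers)

# First five requests land on A, B, C, then cycle back to A, B.
assignments = [next(rr) for _ in range(5)]
```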
**Weighted Round Robin.** Servers with more capacity are assigned higher weights and receive proportionally more requests.
```
Weights: A=3, B=2, C=1

Request 1 ──▶ Server A
Request 2 ──▶ Server A
Request 3 ──▶ Server A
Request 4 ──▶ Server B
Request 5 ──▶ Server B
Request 6 ──▶ Server C
```
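A simple way to sketch this in Python is to repeat each server in the schedule according to its weight, using the A=3, B=2, C=1 weights from the example. (Production balancers such as NGINX use a smoother interleaving to avoid sending bursts to one server, but the proportions are the same.)

```python
# Weighted round-robin sketch: build a schedule that repeats each
# server by its weight, then cycle through it.
weights = {"server-a": 3, "server-b": 2, "server-c": 1}
schedule = [server for server, w in weights.items() for _ in range(w)]

def pick(request_number: int) -> str:
    """Server for the Nth request (1-based), cycling through the schedule."""
    return schedule[(request_number - 1) % len(schedule)]

# Requests 1-6 follow the diagram: A, A, A, B, B, C.
```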
**Least Connections.** Each request goes to the server with the fewest active connections.
```
Server A: 12 connections ┐
Server B:  5 connections │──▶ Request goes to Server C
Server C:  3 connections ┘    (fewest connections)
```
Best for: Requests with varying processing times (long-lived connections).
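The selection step reduces to a `min` over the connection counts; a minimal sketch using the counts from the example:

```python
# Least-connections sketch: route each request to the server with the
# fewest active connections, then bump that server's count.
active = {"server-a": 12, "server-b": 5, "server-c": 3}

def least_connections(conns: dict[str, int]) -> str:
    """Pick the server with the fewest active connections."""
    return min(conns, key=conns.get)

target = least_connections(active)  # server-c, per the diagram
active[target] += 1                 # the routed request now counts against it
```

A real balancer would also decrement the count when a connection closes, which is exactly why this policy suits long-lived or variable-duration connections.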