Distributed Systems Fundamentals

A distributed system is a collection of independent computers that appears as a single system to users. Distributed systems introduce challenges around consistency, coordination, and fault tolerance that do not exist in single-machine systems. This lesson covers consensus, distributed locks, clock synchronisation, and more.

Why Distributed Systems Are Hard

In a single-machine system:              In a distributed system:

┌──────────────────────┐                  ┌──────┐     ┌──────┐
│  Everything is in    │                  │Node A│ ←?→ │Node B│
│  one place. One      │                  └──┬───┘     └──┬───┘
│  clock. One memory.  │                     │            │
│  One process.        │                     │  Network   │
│  Simple.             │                     │  (unreliable)
└──────────────────────┘                     │            │
                                          ┌──┴───┐     ┌──┴───┐
                                          │Node C│ ←?→ │Node D│
                                          └──────┘     └──────┘

- Messages can be lost, delayed, or reordered
- Nodes can crash and restart at any time
- Clocks are not synchronised
- There is no shared memory

The Eight Fallacies of Distributed Computing

Fallacy	Reality
The network is reliable	Packets get lost, connections drop
Latency is zero	Network calls take from under a millisecond to hundreds of milliseconds
Bandwidth is infinite	Limited and shared
The network is secure	Always assume untrusted
Topology doesn't change	Servers added/removed constantly
There is one administrator	Multiple teams, multiple data centres
Transport cost is zero	Serialisation, TLS, load balancers add cost
The network is homogeneous	Different hardware, software, protocols
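
One common response to the first two fallacies is to treat every remote call as something that can fail and to retry it with growing, randomised delays. The sketch below is illustrative only: `call_with_retries`, its parameters, and the choice of `ConnectionError` as the retryable failure are assumptions made for this example, not part of any particular library.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    attempts: int = 4,
    base_delay_s: float = 0.05,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying on ConnectionError with exponential backoff.

    The delay before retry k is drawn uniformly from
    [0, base_delay_s * 2**k] ("full jitter"), so that many clients
    retrying at once do not all hit the server at the same instant.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            sleep(random.uniform(0, base_delay_s * 2 ** attempt))
    raise ValueError("attempts must be at least 1")
```

Retries are only safe when the operation is idempotent, because a request whose reply was lost may already have been applied on the remote side.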

Consensus Algorithms

When multiple nodes need to agree on a value (e.g. who is the leader, what is the latest committed transaction), they need a consensus algorithm.

Raft

Raft is designed to be understandable. It elects a leader, and the leader manages log replication.

Raft: Leader-Based Consensus

┌─────────────┐
│   Leader    │  (handles all client writes)
│   Node A    │
└──────┬──────┘
       │ AppendEntries
       │ (replicate log)
  ┌────┼────┐
  ▼    ▼    ▼
┌───┐┌───┐┌───┐
│ B ││ C ││ D │   (followers — replicate leader's log)
└───┘└───┘└───┘

Step 1: Client sends write to Leader
Step 2: Leader appends to its log
Step 3: Leader replicates to followers
Step 4: Once majority confirms → committed
Step 5: Leader responds to client
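
Step 4 can be sketched as a small calculation: the leader tracks how far each node's log has been replicated and treats the highest index present on a majority as committed. The names `majority`, `commit_index`, and `match_index` are chosen for this example. The sketch also leaves out a rule from full Raft, namely that a leader only commits entries from its own term by counting replicas.

```python
from typing import List


def majority(cluster_size: int) -> int:
    """Smallest number of nodes that forms a majority of the cluster."""
    return cluster_size // 2 + 1


def commit_index(match_index: List[int]) -> int:
    """Highest log index that is stored on a majority of nodes.

    match_index holds, for every node including the leader, the highest
    log index known to be replicated on that node.
    """
    ranked = sorted(match_index, reverse=True)
    # The value at position (majority - 1) is present on at least
    # `majority` nodes, because every earlier entry is >= it.
    return ranked[majority(len(match_index)) - 1]
```

For the four-node cluster in the diagram, `commit_index([5, 5, 4, 2])` evaluates to 4: entry 5 is on only two nodes, while entry 4 is on three of the four.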

Raft Leader Election

1. Leader sends heartbeats at an interval well below the election
   timeout (e.g. ~50ms)
2. If a follower receives no heartbeat for a timeout period
   (randomised 150-300ms), it becomes a candidate
3. Candidate increments its term, votes for itself, and requests
   votes from other nodes
4. Candidate that receives votes from a majority of nodes becomes leader
5. If split vote → candidates time out, increment term, new election
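
The election steps above can be sketched in a few lines. This is a simplified, single-process model rather than a real implementation: `Node`, `handle_request_vote`, `run_election`, and `election_timeout` are names invented for the example, there is no networking, and the log up-to-date check that real Raft performs before granting a vote is omitted.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


def election_timeout(rng: random.Random, low_ms: int = 150, high_ms: int = 300) -> float:
    """Randomised timeout, so followers rarely become candidates together."""
    return rng.uniform(low_ms, high_ms)


@dataclass
class Node:
    name: str
    current_term: int = 0
    voted_for: Optional[str] = None

    def handle_request_vote(self, candidate: str, term: int) -> bool:
        """Grant at most one vote per term; reject stale terms."""
        if term < self.current_term:
            return False
        if term > self.current_term:
            # A newer term resets any vote cast in an older one.
            self.current_term = term
            self.voted_for = None
        if self.voted_for in (None, candidate):
            self.voted_for = candidate
            return True
        return False


def run_election(candidate: Node, peers: List[Node]) -> bool:
    """Candidate bumps its term, votes for itself, and asks peers for votes."""
    candidate.current_term += 1
    candidate.voted_for = candidate.name
    votes = 1
    for peer in peers:
        if peer.handle_request_vote(candidate.name, candidate.current_term):
            votes += 1
    cluster_size = len(peers) + 1
    return votes > cluster_size // 2
```

Because each node grants only one vote per term, two candidates in the same term can each fall short of a majority. That is the split-vote case, and the randomised timeout makes a repeat unlikely in the next term.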

Paxos

Paxos is older and more general than Raft, but notoriously difficult to understand and implement. It uses proposers, acceptors, and learners.
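
To make the roles concrete, here is a minimal single-value ("single-decree") Paxos sketch. It is an in-memory illustration under simplifying assumptions: the class `Acceptor` and the function `propose` are names invented for this example, `propose` plays both the proposer and the learner, and there is no networking, persistence, or failure handling.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                    # highest proposal number promised
    accepted_n: int = -1                  # number of the accepted proposal, if any
    accepted_value: Optional[str] = None  # value of the accepted proposal, if any

    def prepare(self, n: int) -> Tuple[bool, int, Optional[str]]:
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted_n, self.accepted_value
        return False, self.accepted_n, self.accepted_value

    def accept(self, n: int, value: str) -> bool:
        """Phase 2: accept unless a higher-numbered promise was made."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n = n
            self.accepted_value = value
            return True
        return False


def propose(acceptors: List[Acceptor], n: int, value: str) -> Optional[str]:
    """Run both phases; return the chosen value, or None if no quorum."""
    quorum = len(acceptors) // 2 + 1
    replies = [a.prepare(n) for a in acceptors]
    promises = [(acc_n, acc_v) for ok, acc_n, acc_v in replies if ok]
    if len(promises) < quorum:
        return None
    # If any acceptor already accepted a value, the proposer must adopt
    # the one with the highest proposal number instead of its own.
    prior_n, prior_value = max(promises, key=lambda p: p[0])
    chosen = prior_value if prior_n >= 0 and prior_value is not None else value
    accepted = sum(1 for a in acceptors if a.accept(n, chosen))
    return chosen if accepted >= quorum else None
```

With three fresh acceptors, `propose(acceptors, 1, "x")` chooses `"x"`. A later `propose(acceptors, 2, "y")` also returns `"x"`, because the proposer learns of the earlier accepted value during the prepare phase and must re-propose it.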

Feature	Raft	Paxos
Understandability	High (designed for it)	Low (famously complex)
Leader	Single leader	Multi-proposer possible
Implementation	Simpler	Many variants, complex
Used by	etcd, Consul, CockroachDB	Google Chubby, Spanner

Distributed Locks

When multiple processes across different machines need exclusive access to a shared resource, they need a distributed lock.

Redlock Algorithm (Redis)

┌─────────────┐
│  Client     │
└──────┬──────┘
       │  1. Acquire lock on majority (3/5) of Redis nodes
       │
  ┌────┼────┬────┬────┐
  ▼    ▼    ▼    ▼    ▼
┌───┐┌───┐┌───┐┌───┐┌───┐
│R1 ││R2 ││R3 ││R4 ││R5 │
│ ✓ ││ ✓ ││ ✓ ││ ✗ ││ ✗ │
└───┘└───┘└───┘└───┘└───┘

Lock acquired (3/5 = majority), provided acquisition took less time than the lock's validity period (TTL)
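
The majority rule in the diagram can be sketched without a real Redis deployment. Everything below is an in-memory stand-in: `FakeRedisNode`, `set_if_absent`, `release`, and `acquire` are hypothetical names for this example. Against real Redis, the per-node step corresponds to a `SET key token NX PX ttl` command, and a full implementation would also subtract a clock-drift allowance from the validity time.

```python
import time
import uuid
from typing import Dict, List, Optional, Tuple


class FakeRedisNode:
    """In-memory stand-in for one independent Redis node (illustration only)."""

    def __init__(self, available: bool = True) -> None:
        self.available = available
        self.store: Dict[str, Tuple[str, float]] = {}  # key -> (token, expiry)

    def set_if_absent(self, key: str, token: str, ttl_ms: int, now: float) -> bool:
        """Store the token only if the key is free or its lock has expired."""
        if not self.available:
            return False
        current = self.store.get(key)
        if current is not None and current[1] > now:
            return False
        self.store[key] = (token, now + ttl_ms / 1000)
        return True

    def release(self, key: str, token: str) -> None:
        """Delete the key only if it still holds this client's token."""
        if self.store.get(key, (None, 0.0))[0] == token:
            del self.store[key]


def acquire(nodes: List[FakeRedisNode], key: str, ttl_ms: int) -> Optional[str]:
    """Return a lock token if a majority of nodes granted the lock in time."""
    token = uuid.uuid4().hex  # unique per attempt, so release is safe
    start = time.monotonic()
    granted = sum(1 for n in nodes if n.set_if_absent(key, token, ttl_ms, start))
    elapsed_ms = (time.monotonic() - start) * 1000
    quorum = len(nodes) // 2 + 1
    if granted >= quorum and elapsed_ms < ttl_ms:
        return token
    # No majority, or too slow: undo any partial acquisition.
    for n in nodes:
        n.release(key, token)
    return None
```

With three reachable nodes out of five, as in the diagram, `acquire` returns a token. With only two reachable it returns `None` and releases the two partial locks so that other clients are not blocked until the TTL expires.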
