Kubernetes Architecture Deep Dive
Understanding Kubernetes architecture is foundational to running reliable production clusters. This lesson explores the control plane components, worker node internals, networking model, and cluster topologies that underpin every Kubernetes deployment.
High-Level Architecture
A Kubernetes cluster consists of two layers: the control plane and the worker nodes.
┌──────────────────────────────────────────────────────────────┐
│                        CONTROL PLANE                         │
│                                                              │
│  ┌───────────────┐  ┌──────────────┐  ┌──────────────────┐   │
│  │ kube-apiserver│  │     etcd     │  │  kube-scheduler  │   │
│  │               │  │  (key-value  │  │                  │   │
│  │               │  │    store)    │  │                  │   │
│  └───────────────┘  └──────────────┘  └──────────────────┘   │
│  ┌─────────────────────────┐  ┌───────────────────────────┐  │
│  │ kube-controller-manager │  │ cloud-controller-manager  │  │
│  └─────────────────────────┘  └───────────────────────────┘  │
└──────────────────────────────────────────────────────────────┘
                                 │
              ┌──────────────────┼──────────────────┐
              ▼                  ▼                  ▼
      ┌────────────────┐ ┌────────────────┐ ┌────────────────┐
      │ Worker Node 1  │ │ Worker Node 2  │ │ Worker Node 3  │
      │  ┌──────────┐  │ │  ┌──────────┐  │ │  ┌──────────┐  │
      │  │ kubelet  │  │ │  │ kubelet  │  │ │  │ kubelet  │  │
      │  ├──────────┤  │ │  ├──────────┤  │ │  ├──────────┤  │
      │  │kube-proxy│  │ │  │kube-proxy│  │ │  │kube-proxy│  │
      │  ├──────────┤  │ │  ├──────────┤  │ │  ├──────────┤  │
      │  │ Container│  │ │  │ Container│  │ │  │ Container│  │
      │  │ Runtime  │  │ │  │ Runtime  │  │ │  │ Runtime  │  │
      │  └──────────┘  │ │  └──────────┘  │ │  └──────────┘  │
      └────────────────┘ └────────────────┘ └────────────────┘
Control Plane Components
kube-apiserver
The API server is the front door to the cluster. Every interaction — kubectl commands, controller actions, kubelet heartbeats — goes through the API server.
# All kubectl commands communicate with the API server
kubectl get pods
# GET https://<api-server>:6443/api/v1/namespaces/default/pods
kubectl apply -f deployment.yaml
# POST/PUT https://<api-server>:6443/apis/apps/v1/namespaces/default/deployments
Key characteristics:
- RESTful API — resources are manipulated via standard HTTP verbs
- Authentication and authorisation — supports certificates, tokens, OIDC
- Admission control — mutating and validating webhooks intercept requests
- Horizontally scalable — multiple instances behind a load balancer
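The admission-control stage can be extended with your own webhooks. A minimal registration might look like the sketch below — the webhook name, service, namespace, and path are placeholders, not part of any real deployment:

```yaml
# Hypothetical validating webhook: the API server sends every pod CREATE
# to the named in-cluster service before persisting it to etcd.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: pod-policy.example.com
webhooks:
- name: pod-policy.example.com
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["pods"]
  clientConfig:
    service:
      namespace: platform        # placeholder namespace
      name: pod-policy-webhook   # placeholder service
      path: /validate
  admissionReviewVersions: ["v1"]
  sideEffects: None
```

Mutating webhooks are registered the same way with a MutatingWebhookConfiguration, and run before validating ones.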
etcd
etcd is a distributed key-value store that holds all cluster state — every resource, every config, every secret.
| Property | Detail |
|---|---|
| Consensus | Raft protocol (requires quorum) |
| Recommended size | 3 or 5 members (odd number for quorum) |
| Data stored | All Kubernetes objects (serialised as protobuf) |
| Backup frequency | Every 30 minutes minimum in production |
# Backup etcd
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
# Verify the snapshot
ETCDCTL_API=3 etcdctl snapshot status /backup/etcd-snapshot.db --write-out=table
Production tip: Always run etcd on dedicated, SSD-backed nodes. etcd performance directly impacts cluster responsiveness.
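The quorum figures in the table follow floor(n/2) + 1, which also explains the odd-number recommendation — adding a fourth member raises the quorum requirement without tolerating any additional failures. A quick shell loop makes this concrete:

```shell
# Quorum = floor(n/2) + 1; failures tolerated = n - quorum
for n in 1 2 3 4 5; do
  quorum=$(( n / 2 + 1 ))
  echo "members=$n quorum=$quorum tolerates=$(( n - quorum ))"
done
# members=3 quorum=2 tolerates=1
# members=4 quorum=3 tolerates=1   (no gain over 3 members)
# members=5 quorum=3 tolerates=2
```

This is why production guidance says 3 or 5 members: even sizes cost capacity without buying resilience.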
kube-scheduler
The scheduler watches for unscheduled pods and assigns them to nodes based on:
- Filtering — eliminate nodes that cannot run the pod (resource limits, taints, affinity)
- Scoring — rank remaining nodes by preference (spread, resource balance)
- Binding — assign the pod to the highest-scoring node
# Example: Influence scheduling with node affinity
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: gpu-type
            operator: In
            values:
            - nvidia-a100
  containers:
  - name: trainer
    image: ml-training:latest
    resources:
      limits:
        nvidia.com/gpu: 1
kube-controller-manager
A single binary that runs many controllers, each an independent control loop:
| Controller | Responsibility |
|---|---|
| Deployment controller | Manages ReplicaSets for Deployments |
| ReplicaSet controller | Ensures desired pod count matches actual |
| Node controller | Detects and responds to node failures |
| Job controller | Creates pods for Job completions |
| Endpoints controller | Populates Endpoints objects for Services (newer clusters also run an EndpointSlice controller) |
| ServiceAccount controller | Creates default ServiceAccounts in new namespaces |
Each controller follows the reconciliation loop pattern: observe the current state, compare to the desired state, and take action to converge.
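A Deployment is the clearest illustration of this pattern: the manifest declares only desired state, and the Deployment and ReplicaSet controllers converge the cluster toward it. The name and image below are illustrative:

```yaml
# Desired state: 3 replicas. If a pod dies, the ReplicaSet controller
# observes 2 running, compares against 3 desired, and creates a replacement.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web               # illustrative name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.27 # illustrative image
```

Nothing in the manifest says *how* to reach three replicas; that logic lives entirely in the controllers' reconciliation loops.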
cloud-controller-manager
Handles cloud-specific operations such as provisioning load balancers, managing node lifecycle, and configuring routes. It is specific to your cloud provider (AWS, GCP, Azure).
Worker Node Components
kubelet
The kubelet is the primary agent on each node. It:
- Registers the node with the API server
- Watches for pod assignments
- Manages container lifecycle via the Container Runtime Interface (CRI)
- Reports node and pod status back to the control plane
- Runs liveness, readiness, and startup probes
# Check kubelet status on a node
systemctl status kubelet
# View kubelet logs
journalctl -u kubelet -f
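The probes listed above are declared per container in the pod spec. A sketch, with hypothetical paths, port, and image:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probed-app            # illustrative
spec:
  containers:
  - name: app
    image: my-app:1.0         # illustrative image
    ports:
    - containerPort: 8080
    startupProbe:             # holds off other probes while the app boots
      httpGet: {path: /healthz, port: 8080}
      failureThreshold: 30
      periodSeconds: 2
    livenessProbe:            # kubelet restarts the container on repeated failure
      httpGet: {path: /healthz, port: 8080}
      periodSeconds: 10
    readinessProbe:           # failing pods are removed from Service endpoints
      httpGet: {path: /ready, port: 8080}
      periodSeconds: 5
```

The kubelet runs all three locally on the node; only the resulting status is reported back to the API server.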
kube-proxy
kube-proxy maintains network rules on each node, enabling Service abstraction:
| Mode | How It Works | Performance |
|---|---|---|
| iptables | Creates iptables rules for each Service endpoint | Good (default) |
| IPVS | Uses Linux IPVS for load balancing | Better at scale |
| nftables | Uses nftables rules (newer kernels) | Better than iptables at scale (GA in v1.33) |
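The mode is selected in kube-proxy's configuration, which kubeadm-based clusters typically store in the kube-proxy ConfigMap in kube-system. A minimal sketch:

```yaml
# Fragment of a kube-proxy configuration selecting IPVS mode.
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"        # "" or "iptables" (default), "ipvs", "nftables"
ipvs:
  scheduler: "rr"   # round-robin; IPVS offers other schedulers too
```

After changing the mode, the kube-proxy pods must be restarted to pick it up.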
Container Runtime
Kubernetes uses the Container Runtime Interface (CRI) to support multiple runtimes:
- containerd — the most common production runtime
- CRI-O — lightweight, designed specifically for Kubernetes
- gVisor / Kata Containers — for enhanced isolation
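Whatever runtime you choose, its cgroup driver must match the kubelet's, or pods become unstable under load. With containerd 1.x the usual fix is one setting in /etc/containerd/config.toml (shown here as a sketch for runc-based runtimes):

```toml
# Use the systemd cgroup driver so containerd and the kubelet agree;
# modern kubeadm clusters default the kubelet to systemd as well.
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true
```

Restart containerd after editing, then verify pods on the node stay Running.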
Kubernetes Networking Model
Kubernetes enforces a flat networking model with three fundamental rules:
- Every pod gets its own IP address
- Pods can communicate with any other pod without NAT
- Agents on a node can communicate with all pods on that node
Pod A (10.244.1.5) ─────────────────────▶ Pod B (10.244.2.8)
      Node 1                                    Node 2
        │                                         │
        └───────── CNI Plugin (Overlay) ──────────┘
CNI Plugins
| Plugin | Type | Features | Best For |
|---|---|---|---|
| Calico | L3 | NetworkPolicy, BGP, eBPF | Production, security |
| Cilium | eBPF | NetworkPolicy, observability, encryption | Modern kernels |
| Flannel | Overlay | Simple VXLAN overlay | Simple clusters |
| Weave | Overlay | Encryption, multicast | Ease of use |
Cluster Topologies
Single Control Plane (Development)
        ┌─────────────────┐
        │  Control Plane  │
        │  (single node)  │
        └────────┬────────┘
                 │
         ┌───────┼───────┐
         ▼       ▼       ▼
        Node    Node    Node
Highly Available Control Plane (Production)
      ┌────────────────────────────────────────┐
      │              Load Balancer             │
      └──────┬────────────┬────────────┬───────┘
             ▼            ▼            ▼
        ┌──────────┐ ┌──────────┐ ┌──────────┐
        │  CP #1   │ │  CP #2   │ │  CP #3   │
        │  API +   │ │  API +   │ │  API +   │
        │  etcd    │ │  etcd    │ │  etcd    │
        └──────────┘ └──────────┘ └──────────┘
             │            │            │
        ┌────┴──────┬─────┴─────┬──────┴────┐
        ▼           ▼           ▼           ▼
      Node        Node        Node        Node
Production recommendation: Use 3 control plane nodes with a load balancer. This tolerates the loss of 1 control plane node.
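With kubeadm, the shared API endpoint is declared at init time. A minimal configuration sketch — the DNS name is a placeholder you would point at your load balancer, and the version is illustrative:

```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.30.0                        # illustrative version
controlPlaneEndpoint: "k8s-api.example.com:6443"  # placeholder LB address
etcd:
  local: {}      # stacked etcd: one member on each control plane node
```

Additional control plane nodes then join with `kubeadm join --control-plane`, each registering behind the same endpoint.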
Inspecting the Cluster
# View cluster component status (deprecated since v1.19; prefer
# checking the kube-system pods below)
kubectl get componentstatuses
# Inspect nodes
kubectl get nodes -o wide
# View all system pods
kubectl get pods -n kube-system
# Describe a node in detail
kubectl describe node <node-name>
# Check cluster info
kubectl cluster-info
Summary
- The control plane consists of the API server, etcd, scheduler, and controller manager.
- etcd stores all cluster state and should be backed up regularly.
- The kubelet manages containers on each node via the CRI.
- kube-proxy implements Service networking using iptables, IPVS, or nftables.
- Kubernetes networking is flat — every pod gets a unique IP and can reach any other pod without NAT.
- CNI plugins (Calico, Cilium, Flannel) implement the networking model.
- Production clusters use highly available control planes with 3 or 5 nodes.