Scaling & High Availability

Production Kubernetes clusters must handle variable load, survive infrastructure failures, and control costs. This lesson covers Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), Cluster Autoscaler, multi-AZ deployments, disaster recovery, and cost optimisation strategies.

Autoscaling Overview

Type	What It Scales	Based On
HPA	Pod replicas	CPU, memory, custom metrics
VPA	Pod resource requests	Historical usage patterns
Cluster Autoscaler	Nodes	Pending pods with insufficient resources
KEDA	Pod replicas	Event-driven (queues, streams)

graph TD
  subgraph STACK["Autoscaling Stack"]
    HPA["HPA (pods)"]
    VPA["VPA (resources)"]
    CA["Cluster Autoscaler (nodes)"]
  end
  HPA --> CLUSTER["Kubernetes Cluster"]
  VPA --> CLUSTER
  CA --> CLUSTER

Horizontal Pod Autoscaler (HPA)

HPA automatically adjusts the number of pod replicas based on observed metrics.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-api-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
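  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60    # consider recommendations from the last 60s before scaling up
      policies:
        - type: Percent
          value: 50                     # add at most 50% of current replicas...
          periodSeconds: 60             # ...within any 60-second period
    scaleDown:
      stabilizationWindowSeconds: 300   # consider recommendations from the last 5 minutes before scaling down
      policies:
        - type: Percent
          value: 25                     # remove at most 25% of current replicas...
          periodSeconds: 120            # ...within any 120-second period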
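
To make the effect of the utilization targets concrete, the replica calculation documented for HPA can be written out as below. The numbers in the worked example (4 running replicas at 105% average CPU utilization) are made up for illustration and do not come from this lesson.

```latex
% HPA replica calculation for a single metric
\[
\text{desiredReplicas} = \left\lceil \text{currentReplicas} \times \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
\]

% Illustrative values: 4 replicas, 105% average CPU utilization, target of 70%
\[
\left\lceil 4 \times \frac{105}{70} \right\rceil = \lceil 6 \rceil = 6
\]
```

With several metrics configured, as in the manifest above, HPA evaluates each one and uses the largest result. In this example the jump from 4 to 6 replicas also fits within the scaleUp policy, which allows at most a 50% increase per 60 seconds. By default HPA skips scaling when the ratio is within about 10% of 1.0.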

HPA with Custom Metrics

# Fragment: goes under spec in a HorizontalPodAutoscaler such as web-api-hpa
metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: 1000
  - type: External
    external:
      metric:
        name: sqs_queue_length
        selector:
          matchLabels:
            queue: processing
      target:
        type: AverageValue
        averageValue: 100

# Monitor HPA status
kubectl get hpa -n production -w

# Describe for details
kubectl describe hpa web-api-hpa -n production

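Tip: HPA needs a metrics source. metrics-server supplies CPU and memory; custom and external metrics need an adapter such as Prometheus Adapter. Utilization targets are calculated against each container's resource requests, so the target pods must set them.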
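
Because Utilization targets are percentages of the container's requests, the scaled Deployment needs requests for every metric the HPA watches. The following is a minimal sketch of what the web-api Deployment might look like. The label, image name, and request sizes are placeholders chosen for illustration, not values from this lesson.

```yaml
# Hypothetical Deployment for the web-api target referenced by web-api-hpa.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-api
  namespace: production
spec:
  # replicas is left unset so the HPA owns the replica count
  selector:
    matchLabels:
      app: web-api                 # assumed label
  template:
    metadata:
      labels:
        app: web-api
    spec:
      containers:
        - name: web-api
          image: example.com/web-api:1.0.0   # placeholder image
          resources:
            requests:
              cpu: 250m       # averageUtilization: 70 means about 175m average CPU per pod
              memory: 256Mi   # averageUtilization: 80 means about 205Mi average memory per pod
```

If `kubectl top pods -n production` returns values, metrics-server is serving the CPU and memory data the HPA depends on.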

Vertical Pod Autoscaler (VPA)

VPA recommends or automatically adjusts resource requests based on actual usage.

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  updatePolicy:
    updateMode: "Off"         # "Off" = recommendations only; "Auto" = VPA applies them, typically by evicting and recreating pods
  resourcePolicy:
    containerPolicies:
      - containerName: web-api
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 2
          memory: 4Gi

# View VPA recommendations
kubectl describe vpa web-api-vpa -n production

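Warning: Do not let VPA and HPA act on the same CPU or memory metrics. VPA changes the requests that HPA's utilization targets are measured against, so the two controllers work against each other. Use VPA to right-size requests and HPA to scale replicas.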
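
One way to keep the two apart is to limit VPA to a resource the HPA does not watch. The sketch below assumes the HPA has been reduced to its CPU metric only, and restricts VPA to memory through the `controlledResources` field of the container policy. The object name is hypothetical.

```yaml
# Sketch: VPA manages memory only, leaving CPU-based scaling to the HPA.
# Assumes the HPA for web-api no longer includes a memory metric.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-api-vpa-memory        # hypothetical name
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  updatePolicy:
    updateMode: "Off"             # start with recommendations only
  resourcePolicy:
    containerPolicies:
      - containerName: web-api
        controlledResources: ["memory"]   # VPA ignores CPU for this container
        minAllowed:
          memory: 128Mi
        maxAllowed:
          memory: 4Gi
```

Starting in "Off" mode lets you compare the recommendations with current requests before allowing VPA to change anything.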

Cluster Autoscaler

The Cluster Autoscaler adds or removes nodes based on pod scheduling needs.

How It Works

1. A pod cannot be scheduled due to insufficient resources.
2. Cluster Autoscaler detects the pending pod.
3. It adds a new node from the configured node group.
4. The scheduler places the pod on the new node.

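When a node's utilisation (the sum of pod requests divided by allocatable capacity) stays below 50% for 10 minutes, both defaults, and its pods can be rescheduled onto other nodes, the autoscaler drains and removes it.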
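
The scale-down defaults described above correspond to two Cluster Autoscaler flags. The fragment below sets them explicitly to their default values, as a sketch of where you would tune them. It shows only part of the container spec.

```yaml
# Fragment of the cluster-autoscaler container spec (not a complete manifest).
command:
  - ./cluster-autoscaler
  - --scale-down-utilization-threshold=0.5   # removal candidate below 50% requested/allocatable
  - --scale-down-unneeded-time=10m           # must stay below the threshold for 10 minutes
```

Lowering the threshold or lengthening the time makes scale-down more conservative, at the price of keeping underused nodes for longer.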

AWS EKS Configuration

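apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      # Assumed: a ServiceAccount with IAM permissions to manage the Auto Scaling groups
      serviceAccountName: cluster-autoscaler
      containers:
        - name: cluster-autoscaler
          # Use the release that matches the cluster's Kubernetes minor version
          image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.0
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=aws
            - --skip-nodes-with-local-storage=false
            - --expander=least-waste
            # Discover ASGs carrying both tags; replace my-cluster with the cluster name
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
            - --balance-similar-node-groups
            - --scale-down-delay-after-add=10m
            - --scale-down-unneeded-time=10m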

Expander Strategies

Strategy	How It Chooses
random	Pick a random node group
most-pods	Choose the group that can schedule the most pods
least-waste	Choose the group with the least wasted resources
priority	Use a priority-based ConfigMap
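
The priority strategy reads its rules from a ConfigMap. The sketch below assumes Cluster Autoscaler runs in kube-system with `--expander=priority`. The node group name patterns are hypothetical and would need to match your own Auto Scaling group names.

```yaml
# Sketch of a priority expander configuration.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander   # name the priority expander looks for
  namespace: kube-system                       # same namespace as the autoscaler
data:
  priorities: |-
    # Higher number = higher priority. Values are lists of regexes matched
    # against node group names (hypothetical patterns below).
    10:
      - .*on-demand.*
    50:
      - .*spot.*
```

With this configuration the autoscaler would prefer groups matching the spot pattern and fall back to on-demand groups when no spot group can satisfy the pending pods.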

Multi-AZ Deployments

Distribute replicas and nodes across availability zones so that the loss of a single zone does not take down the whole workload.
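
One common way to spread pods across zones is a topology spread constraint keyed on the standard zone label. The sketch below applies it to a web-api Deployment. The `app: web-api` label and the image name are placeholders, and it assumes nodes carry the well-known `topology.kubernetes.io/zone` label, which managed cloud node groups normally set.

```yaml
# Sketch: spread web-api pods evenly across availability zones.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-api
  namespace: production
spec:
  # replicas is left unset on the assumption that an HPA manages it
  selector:
    matchLabels:
      app: web-api                 # assumed label
  template:
    metadata:
      labels:
        app: web-api
    spec:
      topologySpreadConstraints:
        - maxSkew: 1                                 # zones may differ by at most one pod
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule           # keep pods pending rather than unbalance zones
          labelSelector:
            matchLabels:
              app: web-api
      containers:
        - name: web-api
          image: example.com/web-api:1.0.0           # placeholder image
```

`DoNotSchedule` is the strict option. `ScheduleAnyway` treats the constraint as a preference, which trades even spread for availability when a zone is short of capacity.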
