One of Kubernetes's most powerful capabilities is the ability to scale applications up or down — manually or automatically — to match demand. Kubernetes provides three complementary autoscaling mechanisms.
The simplest way to change the number of pod replicas is to run the kubectl scale command or to edit the replicas field in the Deployment manifest.
# Scale a deployment to 5 replicas
kubectl scale deployment web-app --replicas=5
# Verify the new replica count
kubectl get deployment web-app
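The declarative equivalent is to set the replicas field in the manifest and re-apply it with kubectl apply -f. A trimmed sketch of the relevant Deployment fields (other required fields such as selector and template are omitted for brevity):

```yaml
# Deployment excerpt — only the fields relevant to scaling are shown
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 5   # desired pod count; re-applying reconciles the cluster to this value
```

The declarative route has the advantage that the desired count lives in version control rather than in someone's shell history.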
Manual scaling is straightforward but requires human intervention. For dynamic workloads, autoscaling is preferable.
The HorizontalPodAutoscaler (HPA) automatically adjusts the number of pod replicas based on observed metrics — typically CPU utilisation, but also memory or custom metrics via the Kubernetes Metrics API.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
The HPA controller checks metrics every 15 seconds (by default) and scales the Deployment to keep average CPU utilisation at 70%. It respects the minReplicas and maxReplicas bounds.
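The controller's core calculation is worth spelling out: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue). A quick worked example, using integer arithmetic to emulate the ceiling:

```shell
# 4 replicas averaging 140% CPU utilisation against a 70% target:
# ceil(4 * 140 / 70) = 8 replicas
echo $(( (4 * 140 + 69) / 70 ))
```

So a workload running at double its target utilisation doubles its replica count (capped at maxReplicas).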
# Create an HPA imperatively
kubectl autoscale deployment web-app --cpu-percent=70 --min=2 --max=20
# Check HPA status
kubectl get hpa
# Describe an HPA to see current and desired replicas
kubectl describe hpa web-app-hpa
For the HPA to work, pods must define resources.requests.cpu and the metrics-server add-on must be installed in the cluster.
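For reference, a container spec satisfying that requirement might look like the following (the image and resource values are illustrative):

```yaml
# Container excerpt: the CPU request is the baseline the HPA measures utilisation against
spec:
  containers:
  - name: web-app
    image: nginx:1.27        # illustrative image
    resources:
      requests:
        cpu: 250m            # "70% utilisation" means 70% of this request
        memory: 128Mi
      limits:
        cpu: 500m
```

Without a CPU request, the HPA has no denominator for the utilisation percentage and will report the target as unknown.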
The VerticalPodAutoscaler (VPA) adjusts the CPU and memory requests and limits of individual containers based on historical usage. Instead of adding more pods, it makes each pod more appropriately sized. VPA requires an additional controller and is not installed by default.
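A minimal VPA object, assuming the VPA controller is installed in the cluster, might look like this (the name and mode are illustrative):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Auto"   # or "Off" (recommendations only) / "Initial" (set on pod creation)
```

Note that running a VPA in Auto mode alongside a CPU-based HPA on the same Deployment is generally discouraged, since both controllers react to the same CPU signal.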
The Cluster Autoscaler adds or removes nodes from the cluster based on pending pods (pods that cannot be scheduled because no node has sufficient capacity) and underutilised nodes. It integrates with cloud provider APIs to provision or terminate node instances. All major managed Kubernetes services support it.
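There is no manifest to write on a managed service, but you can watch the autoscaler at work. These commands assume a cluster with the Cluster Autoscaler running in kube-system; exact resource names vary by provider and install method:

```shell
# Pods stuck in Pending are what trigger a scale-up
kubectl get pods --all-namespaces --field-selector=status.phase=Pending

# The autoscaler records scale-up decisions as events on the pending pods
kubectl get events --all-namespaces --field-selector=reason=TriggeredScaleUp

# Many installs also publish a status ConfigMap (name may vary)
kubectl -n kube-system describe configmap cluster-autoscaler-status
```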
# Watch pods being created during an HPA scale-out event
kubectl get pods -w
# Simulate CPU load to trigger HPA (in a throwaway test pod)
kubectl run load-test --image=busybox --restart=Never -- sh -c "while true; do :; done"
# Clean up the load generator when finished
kubectl delete pod load-test
A mature production setup typically combines all three: the HPA to adjust replica counts with demand, the VPA to right-size each pod's resource requests, and the Cluster Autoscaler to grow or shrink node capacity underneath them.