One of Kubernetes's most powerful capabilities is the ability to scale applications up or down — manually or automatically — to match demand. Kubernetes provides three complementary autoscaling mechanisms.
The simplest way to change the number of pod replicas is to run the kubectl scale command or to edit the replicas field in the Deployment manifest.
# Scale a deployment to 5 replicas
kubectl scale deployment web-app --replicas=5
# Verify the new replica count
kubectl get deployment web-app
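The declarative equivalent is to set the replicas field in the manifest and re-apply it with kubectl apply -f. A trimmed sketch of the relevant Deployment fields (other required fields such as selector and template are omitted for brevity):

```yaml
# Deployment excerpt — only the fields relevant to scaling are shown
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 5   # desired pod count; re-applying reconciles the cluster to this value
```

The declarative route has the advantage that the desired count lives in version control rather than in someone's shell history.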
Manual scaling is straightforward but requires human intervention. For dynamic workloads, autoscaling is preferable.
The HorizontalPodAutoscaler (HPA) automatically adjusts the number of pod replicas based on observed metrics — typically CPU utilisation, but also memory or custom metrics via the Kubernetes Metrics API.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
The HPA controller checks metrics every 15 seconds (by default) and scales the Deployment to keep average CPU utilisation at 70%. It respects the minReplicas and maxReplicas bounds.
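The controller's core calculation is worth spelling out: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue). A quick worked example, using integer arithmetic to emulate the ceiling:

```shell
# 4 replicas averaging 140% CPU utilisation against a 70% target:
# ceil(4 * 140 / 70) = 8 replicas
echo $(( (4 * 140 + 69) / 70 ))
```

So a workload running at double its target utilisation doubles its replica count (capped at maxReplicas).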
# Create an HPA imperatively
kubectl autoscale deployment web-app --cpu-percent=70 --min=2 --max=20
# Check HPA status
kubectl get hpa
# Describe an HPA to see current and desired replicas
kubectl describe hpa web-app-hpa
For the HPA to work, pods must define resources.requests.cpu and the metrics-server add-on must be installed in the cluster.
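For reference, a container spec satisfying that requirement might look like the following (the image and resource values are illustrative):

```yaml
# Container excerpt: the CPU request is the baseline the HPA measures utilisation against
spec:
  containers:
  - name: web-app
    image: nginx:1.27        # illustrative image
    resources:
      requests:
        cpu: 250m            # "70% utilisation" means 70% of this request
        memory: 128Mi
      limits:
        cpu: 500m
```

Without a CPU request, the HPA has no denominator for the utilisation percentage and will report the target as unknown.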
The VerticalPodAutoscaler (VPA) adjusts the CPU and memory requests and limits of individual containers based on historical usage. Instead of adding more pods, it makes each pod more appropriately sized. VPA requires an additional controller and is not installed by default.
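A minimal VPA object, assuming the VPA controller is installed in the cluster, might look like this (the name and mode are illustrative):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Auto"   # or "Off" (recommendations only) / "Initial" (set on pod creation)
```

Note that running a VPA in Auto mode alongside a CPU-based HPA on the same Deployment is generally discouraged, since both controllers react to the same CPU signal.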
The Cluster Autoscaler adds or removes nodes from the cluster based on pending pods (pods that cannot be scheduled because no node has sufficient capacity) and underutilised nodes. It integrates with cloud provider APIs to provision or terminate node instances. All major managed Kubernetes services support it.
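There is no manifest to write on a managed service, but you can watch the autoscaler at work. These commands assume a cluster with the Cluster Autoscaler running in kube-system; exact resource names vary by provider and install method:

```shell
# Pods stuck in Pending are what trigger a scale-up
kubectl get pods --all-namespaces --field-selector=status.phase=Pending

# The autoscaler records scale-up decisions as events on the pending pods
kubectl get events --all-namespaces --field-selector=reason=TriggeredScaleUp

# Many installs also publish a status ConfigMap (name may vary)
kubectl -n kube-system describe configmap cluster-autoscaler-status
```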
# Watch pods being created during an HPA scale-out event
kubectl get pods -w
# Simulate CPU load to trigger HPA (in a throwaway test pod)
kubectl run load-test --image=busybox --restart=Never -- sh -c "while true; do :; done"
# Clean up the load generator when finished
kubectl delete pod load-test
A mature production setup typically combines all three: the HPA to adjust replica counts with demand, the VPA to right-size each pod's resource requests, and the Cluster Autoscaler to grow or shrink node capacity underneath them.