Production Kubernetes clusters must handle variable load, survive infrastructure failures, and control costs. This lesson covers Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), Cluster Autoscaler, multi-AZ deployments, disaster recovery, and cost optimisation strategies.
| Type | What It Scales | Based On |
|---|---|---|
| HPA | Pod replicas | CPU, memory, custom metrics |
| VPA | Pod resource requests | Historical usage patterns |
| Cluster Autoscaler | Nodes | Pending pods with insufficient resources |
| KEDA | Pod replicas | Event-driven (queues, streams) |
┌──────────────────────────────────────────────────────────┐
│                    Autoscaling Stack                     │
│                                                          │
│  ┌───────────┐  ┌───────────┐  ┌──────────────────────┐  │
│  │    HPA    │  │    VPA    │  │  Cluster Autoscaler  │  │
│  │  (pods)   │  │(resources)│  │       (nodes)        │  │
│  └─────┬─────┘  └─────┬─────┘  └──────────┬───────────┘  │
│        │              │                   │              │
│        ▼              ▼                   ▼              │
│  ┌────────────────────────────────────────────────────┐  │
│  │                 Kubernetes Cluster                 │  │
│  └────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────┘
HPA automatically adjusts the number of pod replicas based on observed metrics.
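The adjustment can be sketched with the scaling rule described in the Kubernetes documentation: desired replicas = ceil(current replicas × current metric / target metric). The rule is skipped when the ratio is within a tolerance (10% by default), and the largest proposal across all metrics wins. The Python below is a simplified illustration, not the controller's code. The function names are hypothetical, and the bounds of 3 and 20 mirror the `minReplicas` and `maxReplicas` used for `web-api` in this lesson.

```python
import math


def desired_replicas(current_replicas, current_value, target_value, tolerance=0.10):
    """Apply the basic HPA rule for a single metric."""
    ratio = current_value / target_value
    # Within the tolerance band the HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


def hpa_decision(current_replicas, metrics, min_replicas=3, max_replicas=20):
    """metrics is a list of (current_value, target_value) pairs.

    The largest per-metric proposal wins, then it is clamped to the bounds.
    """
    proposals = [
        desired_replicas(current_replicas, current, target)
        for current, target in metrics
    ]
    return max(min_replicas, min(max_replicas, max(proposals)))


if __name__ == "__main__":
    # 4 replicas, CPU at 90% against a 70% target, memory at 60% against 80%.
    print(hpa_decision(4, [(90, 70), (60, 80)]))
```

In this sketch CPU proposes 6 replicas and memory proposes 3, so the result is 6. Real HPA behaviour also accounts for pods that are not yet ready or have missing metrics, which this sketch ignores.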
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-api-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 25
          periodSeconds: 120
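The `behavior` policies in the manifest cap how fast the replica count may change: at most 50% growth per 60 seconds, and at most 25% shrinkage per 120 seconds. The Python below is a rough sketch of that rate limiting for a single period. The function names are hypothetical, the rounding directions (up when growing, down when shrinking) are an assumption about the controller, and the stabilisation windows are not modelled.

```python
import math


def scale_up_limit(period_start_replicas, percent):
    """Highest replica count allowed after one period of a Percent scale-up policy."""
    return math.ceil(period_start_replicas * (1 + percent / 100))


def scale_down_limit(period_start_replicas, percent):
    """Lowest replica count allowed after one period of a Percent scale-down policy."""
    return math.floor(period_start_replicas * (1 - percent / 100))


def apply_behavior(current, desired, up_percent=50, down_percent=25):
    """Move from current towards desired, but no further than the policy allows."""
    if desired > current:
        return min(desired, scale_up_limit(current, up_percent))
    if desired < current:
        return max(desired, scale_down_limit(current, down_percent))
    return current


if __name__ == "__main__":
    print(apply_behavior(4, 12))   # growth capped by the 50% policy
    print(apply_behavior(20, 3))   # shrinkage capped by the 25% policy
```

Under these assumptions a jump from 4 to a desired 12 replicas is held to 6 in the first period, and a drop from 20 to a desired 3 is held to 15.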
# Custom and external metrics: a separate `metrics:` list for the HPA `spec`
metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: 1000
  - type: External
    external:
      metric:
        name: sqs_queue_length
        selector:
          matchLabels:
            queue: processing
      target:
        type: AverageValue
        averageValue: 100
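With an `AverageValue` target, the metric is compared per pod: for an external metric the reported total is divided by the number of pods, and for a `Pods` metric the per-pod values are averaged. Either way the replica count works out to roughly the total divided by the target. The Python below is a small worked example of that arithmetic. The function name and the sample numbers are hypothetical, and the tolerance band and the `minReplicas`/`maxReplicas` bounds are left out.

```python
import math


def replicas_for_average_value(total_metric_value, target_average_value):
    """Smallest replica count that brings the per-pod average down to the target."""
    return max(1, math.ceil(total_metric_value / target_average_value))


if __name__ == "__main__":
    # Three pods serving 1400, 1300 and 1500 requests/s against a 1000 target.
    print(replicas_for_average_value(1400 + 1300 + 1500, 1000))
    # A queue holding 1250 messages against a target of 100 per pod.
    print(replicas_for_average_value(1250, 100))
```

The first case gives 5 replicas and the second gives 13, before any `behavior` limits or replica bounds are applied.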
# Monitor HPA status
kubectl get hpa -n production -w
# Describe for details
kubectl describe hpa web-api-hpa -n production
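Tip: HPA needs a metrics source. metrics-server provides CPU and memory metrics; custom and external metrics need an adapter that serves the custom/external metrics APIs, such as Prometheus Adapter.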
VPA recommends or automatically adjusts resource requests based on actual usage.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  updatePolicy:
    updateMode: "Off"  # "Off" = recommendation only, "Auto" = apply changes
  resourcePolicy:
    containerPolicies:
      - containerName: web-api
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 2
          memory: 4Gi
# View VPA recommendations
kubectl describe vpa web-api-vpa -n production
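The `minAllowed` and `maxAllowed` values in the VPA manifest bound whatever the recommender proposes. The Python below is a minimal sketch of that clamping for the `web-api` container. The helper names and the sample recommendations are hypothetical, and the quantity parser handles only the suffixes used in this lesson (`m`, `Ki`, `Mi`, `Gi`).

```python
_MEMORY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_cpu(quantity):
    """Convert a CPU quantity such as '100m' or '2' to millicores."""
    quantity = str(quantity)
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def parse_memory(quantity):
    """Convert a memory quantity such as '128Mi' or '4Gi' to bytes."""
    quantity = str(quantity)
    for suffix, factor in _MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def clamp(value, low, high):
    return max(low, min(high, value))


def capped_recommendation(rec_cpu, rec_memory):
    """Clamp a raw recommendation to the bounds from the manifest.

    Returns (millicores, bytes).
    """
    cpu = clamp(parse_cpu(rec_cpu), parse_cpu("100m"), parse_cpu("2"))
    memory = clamp(parse_memory(rec_memory), parse_memory("128Mi"), parse_memory("4Gi"))
    return cpu, memory


if __name__ == "__main__":
    # A raw recommendation below the CPU floor and above the memory ceiling.
    print(capped_recommendation("50m", "6Gi"))
```

Here a raw recommendation of 50m CPU and 6Gi memory would be reported as 100m and 4Gi, because both values fall outside the allowed range.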
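Warning: Do not use VPA with HPA on the same CPU/memory metrics. HPA computes utilisation as a percentage of the pod's resource requests, and VPA changes those requests, so the two controllers work against each other. Use VPA for right-sizing resources and HPA for replica scaling.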
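The Cluster Autoscaler adds or removes nodes based on pod scheduling needs. When pods stay pending because no node has enough free resources, it adds nodes.
When nodes are underutilised (requested CPU and memory below 50% of allocatable for 10 minutes by default) and their pods can be rescheduled elsewhere, the autoscaler removes them.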
# Abridged: selector, pod labels, and service account are omitted
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  template:
    spec:
      containers:
        - name: cluster-autoscaler
          # Use the release that matches the cluster's Kubernetes minor version
          image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.0
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=aws
            - --skip-nodes-with-local-storage=false
            - --expander=least-waste
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
            - --balance-similar-node-groups
            - --scale-down-delay-after-add=10m
            - --scale-down-unneeded-time=10m
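The scale-down rule (utilisation below 50%, sustained for the `--scale-down-unneeded-time` of 10 minutes) can be sketched in Python. This is a simplified illustration with hypothetical function names. Utilisation is taken here as the larger of the CPU and memory request ratios. The real autoscaler also checks that the node's pods can be rescheduled elsewhere and honours settings such as `--scale-down-delay-after-add`, none of which is modelled.

```python
def node_utilisation(cpu_requested_m, cpu_allocatable_m, mem_requested, mem_allocatable):
    """Requested resources as a fraction of allocatable, taking the larger of CPU and memory."""
    return max(cpu_requested_m / cpu_allocatable_m, mem_requested / mem_allocatable)


def is_scale_down_candidate(utilisation, unneeded_seconds,
                            threshold=0.5, unneeded_time_seconds=600):
    """True when the node has stayed below the threshold for the full unneeded time."""
    return utilisation < threshold and unneeded_seconds >= unneeded_time_seconds


if __name__ == "__main__":
    # A node with 4000m CPU and 16 GiB allocatable, running pods that request
    # 1000m CPU and 4 GiB, unneeded for 10 minutes.
    utilisation = node_utilisation(1000, 4000, 4, 16)
    print(utilisation, is_scale_down_candidate(utilisation, 600))
```

A node at 25% utilisation for the full 10 minutes qualifies in this sketch, while the same node after only 5 minutes, or a node at 75%, does not.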