The Performance Optimisation pillar of the Google Cloud Architecture Framework focuses on ensuring your workloads meet their performance requirements while using resources efficiently. It covers compute selection, scaling strategies, caching, database optimisation, and network performance. A well-optimised workload delivers fast response times, handles traffic spikes gracefully, and avoids wasting resources on over-provisioned infrastructure.
| Principle | Description |
|---|---|
| Measure first | Establish baselines and set performance targets before optimising |
| Choose the right compute | Match the compute service to your workload characteristics |
| Scale automatically | Use autoscaling to match capacity to demand |
| Cache aggressively | Reduce latency and backend load with strategic caching |
| Optimise data access | Choose the right database, design efficient schemas, and use connection pooling |
| Minimise distance | Place compute close to users and data close to compute |
The single most impactful performance decision is choosing the correct compute platform:
| Compute Service | Best For | Scaling Model |
|---|---|---|
| Cloud Run | Stateless HTTP services, APIs, event-driven processing | Scale to zero, per-request autoscaling |
| GKE | Complex microservices, stateful applications, GPU workloads | Horizontal pod autoscaling, node autoscaling |
| Cloud Functions | Event handlers, lightweight processing, glue logic | Per-invocation scaling, scale to zero |
| Compute Engine | Custom OS, legacy applications, HPC, specialised hardware | Managed instance groups with autoscaling |
| App Engine | Simple web applications, rapid prototyping | Automatic scaling based on traffic |
For Compute Engine and GKE, choosing the right machine type is critical:
| Series | Optimised For |
|---|---|
| E2 | General-purpose, cost-effective workloads |
| N2/N2D | Balanced performance for most workloads |
| C2/C2D | Compute-intensive workloads (high single-thread performance) |
| M2/M3 | Memory-intensive workloads (SAP HANA, in-memory databases) |
| A2/G2 | GPU workloads (ML training, rendering, HPC) |
| T2D/T2A | ARM-based, cost-efficient for cloud-native workloads |
```bash
# Use the recommender to identify over-provisioned VMs
gcloud recommender recommendations list \
    --project=my-project \
    --location=europe-west2-a \
    --recommender=google.compute.instance.MachineTypeRecommender
```
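Once the table (or the recommender) points you at a series, provisioning is a one-line change. A minimal sketch, assuming a compute-intensive batch worker; the instance name, project, and zone are placeholders:

```bash
# Hypothetical example: provision a C2 instance for a compute-bound workload
gcloud compute instances create batch-worker-1 \
    --project=my-project \
    --zone=europe-west2-a \
    --machine-type=c2-standard-8
```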
GKE provides three autoscaling mechanisms, operating at the pod and node levels:
| Level | Mechanism | What It Scales |
|---|---|---|
| Pod | Horizontal Pod Autoscaler (HPA) | Number of pod replicas based on CPU, memory, or custom metrics |
| Pod | Vertical Pod Autoscaler (VPA) | Pod resource requests and limits based on actual usage |
| Node | Cluster Autoscaler | Number of nodes to accommodate pending pods |
```yaml
# Horizontal Pod Autoscaler configuration
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: order-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-service
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
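The HPA above scales replica counts; the Vertical Pod Autoscaler complements it by right-sizing the pods themselves. A minimal sketch in recommendation-only mode, assuming vertical pod autoscaling is enabled on the cluster and reusing the order-service Deployment from the HPA example:

```yaml
# Vertical Pod Autoscaler in recommendation-only mode
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: order-service-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-service
  updatePolicy:
    updateMode: "Off"  # emit recommendations without evicting pods
```

Running VPA in Auto mode alongside an HPA that targets CPU is generally discouraged, since both would react to the same signal; recommendation-only mode sidesteps that conflict while you gather sizing data.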
```bash
# Configure Cloud Run autoscaling
gcloud run deploy my-service \
    --min-instances=1 \
    --max-instances=100 \
    --concurrency=80 \
    --cpu=2 \
    --memory=1Gi
```
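With these settings, each instance serves up to 80 concurrent requests, so the service tops out at roughly 8,000 concurrent requests at --max-instances=100, while --min-instances=1 keeps one warm instance to avoid cold starts. As a rule of thumb, lower --concurrency for CPU-heavy request handlers and raise it for I/O-bound ones.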
Caching reduces latency and backend load by serving frequently accessed data from faster storage.
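One common pattern is a managed Redis cache in front of the database. A minimal sketch using Memorystore for Redis; the instance name and region are placeholders:

```bash
# Hypothetical sketch: provision a small Memorystore for Redis cache tier
gcloud redis instances create my-cache \
    --region=europe-west2 \
    --size=1 \
    --tier=basic
```

The basic tier is a single standalone node; choose the standard tier when the cache itself needs high availability with a replica.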