The Operational Excellence pillar of the GCP Architecture Framework focuses on your ability to run, monitor, and continuously improve your cloud workloads. It covers the processes, culture, and tooling that enable teams to operate services reliably and efficiently. A workload with strong operational excellence is observable, automated, and continuously improving.
| Principle | Description |
|---|---|
| Automate everything | Manual processes are error-prone and do not scale — automate deployments, testing, and remediation |
| Monitor and observe | You cannot improve what you cannot measure — instrument every layer of your stack |
| Learn from failure | Use incidents as opportunities to improve systems and processes through blameless post-mortems |
| Manage change safely | Deploy frequently in small batches with automated rollback capabilities |
| Codify operations | Treat operational runbooks, configurations, and policies as code |
All infrastructure on GCP should be defined and managed as code. This ensures consistency, repeatability, and auditability.
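```hcl
# Define a GKE cluster with Terraform
resource "google_container_cluster" "primary" {
  name     = "production-cluster"
  location = "europe-west2" # a region, so this is a regional (multi-zone) cluster

  # Manage nodes through the separately defined node pool below
  remove_default_node_pool = true
  initial_node_count       = 1

  # Enable Workload Identity; the pool name has the form "<PROJECT_ID>.svc.id.goog"
  workload_identity_config {
    workload_pool = "my-project.svc.id.goog"
  }
}

resource "google_container_node_pool" "primary_nodes" {
  name       = "primary-node-pool"
  location   = "europe-west2"
  cluster    = google_container_cluster.primary.name
  node_count = 3 # per zone: with the default three zones this pool runs 9 nodes

  node_config {
    machine_type = "e2-standard-4"

    # Expose the GKE metadata server so pods can use Workload Identity
    workload_metadata_config {
      mode = "GKE_METADATA"
    }

    # Broad OAuth scope; effective permissions are controlled by IAM roles
    oauth_scopes = [
      "https://www.googleapis.com/auth/cloud-platform",
    ]
  }
}
```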
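The *Automate everything* principle also covers remediation, and the node pool can be set up to repair and upgrade itself. The fragment below is a sketch that assumes you add it inside the `google_container_node_pool.primary_nodes` resource. The surge values are illustrative choices, not requirements.

```hcl
# Add inside resource "google_container_node_pool" "primary_nodes" { ... }
# (values are illustrative)

# Let GKE recreate nodes that fail health checks and keep node versions current
management {
  auto_repair  = true
  auto_upgrade = true
}

# Surge upgrades: create a new node before draining an old one,
# so serving capacity is never reduced during an upgrade
upgrade_settings {
  max_surge       = 1
  max_unavailable = 0
}
```

With `max_unavailable = 0`, an upgrade is effectively a rolling update: each old node is replaced only after its replacement is ready.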
| Practice | Description |
|---|---|
| Version control | Store all IaC in Git with branch protection and code review |
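| Remote state | Store Terraform state in a GCS backend, which provides state locking, and enable object versioning on the bucket |
| Modular design | Create reusable modules for common patterns (VPC, GKE, Cloud SQL) |
| Plan before apply | Always run `terraform plan` and review the proposed changes before running `terraform apply` |
| Drift detection | Regularly compare actual state against desired state, for example with a scheduled `terraform plan -detailed-exitcode` (exit code 2 means changes are pending) |
| Secret management | Never store secrets in IaC files; reference them from Secret Manager, and remember that any secret value Terraform reads is also written to the state file |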
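Here is a sketch of the *Remote state* practice. The bucket name `my-project-tf-state` and the `prefix` value are hypothetical. The bucket must exist before `terraform init` can use it, so it is usually created in a separate bootstrap configuration.

```hcl
# Bootstrap configuration: the bucket that will hold Terraform state
resource "google_storage_bucket" "tf_state" {
  name     = "my-project-tf-state" # hypothetical; bucket names are globally unique
  location = "EUROPE-WEST2"

  uniform_bucket_level_access = true
  public_access_prevention    = "enforced"

  # Keep previous state versions so a bad apply can be recovered from
  versioning {
    enabled = true
  }
}

# Workload configuration: store state in that bucket.
# The GCS backend locks the state automatically during plan and apply.
terraform {
  backend "gcs" {
    bucket = "my-project-tf-state"
    prefix = "production/gke" # hypothetical path within the bucket
  }
}
```

Backend blocks cannot reference Terraform variables, which is why the bucket name is written as a literal.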
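For *Secret management*, you can keep a secret out of both the code and the Terraform state by having the runtime read it from Secret Manager, instead of having Terraform read the value. The sketch below assumes a Cloud Run service and an existing secret named `db-password` (hypothetical). The service's runtime service account also needs the `roles/secretmanager.secretAccessor` role on that secret.

```hcl
resource "google_cloud_run_v2_service" "app" {
  name     = "my-app"
  location = "europe-west2"

  template {
    containers {
      image = "europe-west2-docker.pkg.dev/my-project/my-repo/my-app:latest"

      # Terraform stores only the secret's name; Cloud Run resolves
      # the value from Secret Manager when an instance starts
      env {
        name = "DB_PASSWORD"
        value_source {
          secret_key_ref {
            secret  = "db-password" # hypothetical secret ID
            version = "latest"
          }
        }
      }
    }
  }
}
```

Pinning `version` to a specific number instead of `"latest"` makes secret rotation an explicit, reviewable change.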
Continuous integration and continuous delivery (CI/CD) automate the build, test, and deployment process.
Cloud Build is GCP's native CI/CD service:
```yaml
# cloudbuild.yaml
steps:
  # Install dependencies from package-lock.json, then run tests
  - name: 'node:18'
    entrypoint: 'npm'
    args: ['ci']
  - name: 'node:18'
    entrypoint: 'npm'
    args: ['test']

  # Build the container image, tagged with the commit SHA so every deployment
  # is traceable and easy to roll back ($COMMIT_SHA is set for triggered builds)
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'europe-west2-docker.pkg.dev/my-project/my-repo/my-app:$COMMIT_SHA', '.']

  # Push to Artifact Registry
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', 'europe-west2-docker.pkg.dev/my-project/my-repo/my-app:$COMMIT_SHA']

  # Deploy to Cloud Run
  - name: 'gcr.io/cloud-builders/gcloud'
    args: ['run', 'deploy', 'my-app',
           '--image', 'europe-west2-docker.pkg.dev/my-project/my-repo/my-app:$COMMIT_SHA',
           '--region', 'europe-west2',
           '--platform', 'managed']
```
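A pipeline definition only runs when something invokes it. In keeping with *Codify operations*, the trigger can also be declared in Terraform. This sketch assumes that the GitHub repository `my-org/my-app` (hypothetical) is already connected to Cloud Build through the Cloud Build GitHub App. It also assumes a dedicated service account `cloud-build-deployer` (hypothetical) that holds the roles the pipeline needs, such as pushing to Artifact Registry and deploying to Cloud Run.

```hcl
# Run cloudbuild.yaml on every push to the main branch
resource "google_cloudbuild_trigger" "main_push" {
  name     = "my-app-main"
  filename = "cloudbuild.yaml"

  github {
    owner = "my-org" # hypothetical GitHub owner
    name  = "my-app" # hypothetical repository
    push {
      branch = "^main$" # regular expression matched against the branch name
    }
  }

  # Hypothetical user-managed service account that the builds run as
  service_account = "projects/my-project/serviceAccounts/cloud-build-deployer@my-project.iam.gserviceaccount.com"
}
```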
| Strategy | Description | GCP Implementation |
|---|---|---|
| Rolling update | Replace instances gradually | GKE rolling deployments, MIG rolling updates |
| Blue-green | Run two environments, switch traffic | Cloud Run traffic splitting, GKE with multiple deployments |
| Canary | Route a small percentage of traffic to the new version | Cloud Run traffic splitting, Istio on GKE |
| Feature flags | Enable features for specific users without redeploying | Firebase Remote Config, custom feature flag service |
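Here is how a canary release with Cloud Run traffic splitting might look in Terraform. The revision names `my-app-v1` and `my-app-v2` and the image tag are hypothetical, and the example assumes `my-app-v1` already exists from an earlier deployment. Cloud Run requires revision names to start with the service name followed by a hyphen. The 90/10 split is only an example.

```hcl
resource "google_cloud_run_v2_service" "app" {
  name     = "my-app"
  location = "europe-west2"

  template {
    # Name the new revision explicitly so traffic can be pinned to it
    revision = "my-app-v2"
    containers {
      image = "europe-west2-docker.pkg.dev/my-project/my-repo/my-app:v2" # hypothetical tag
    }
  }

  # Stable revision keeps 90% of requests
  traffic {
    type     = "TRAFFIC_TARGET_ALLOCATION_TYPE_REVISION"
    revision = "my-app-v1"
    percent  = 90
  }

  # Canary revision receives 10%; the tag also gives it a dedicated URL for testing
  traffic {
    type     = "TRAFFIC_TARGET_ALLOCATION_TYPE_REVISION"
    revision = "my-app-v2"
    percent  = 10
    tag      = "canary"
  }
}
```

To promote the canary, raise its share in steps until it receives 100%. To roll back, return 100% to `my-app-v1`. The percentages across all `traffic` blocks must add up to 100.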
Operational Excellence requires comprehensive observability: