Understanding how to monitor your GCP resources and manage costs is just as important as knowing how to build with them. GCP provides comprehensive observability tools through Google Cloud Observability (formerly Stackdriver) and robust billing management features.
Cloud Monitoring collects metrics, creates dashboards, and triggers alerts across your GCP resources and applications.
Cloud Monitoring automatically collects metrics from GCP services:
| Service | Example Metrics |
|---|---|
| Compute Engine | CPU utilisation, disk I/O, network traffic |
| Cloud SQL | Database connections, query latency, storage usage |
| Cloud Run | Request count, request latency, instance count |
| Cloud Functions | Invocations, execution time, errors |
| Cloud Storage | Total bytes, object count, request count |
You can create your own metrics using the Cloud Monitoring API or OpenTelemetry:
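from google.cloud import monitoring_v3

# Client for reading and writing time series; uses Application Default Credentials
client = monitoring_v3.MetricServiceClient()
# Write custom metric data points with client.create_time_series(...)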
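The snippet above stops before any data point is written. As a rough sketch of what a custom data point looks like, the following builds the JSON body accepted by the Monitoring API's `projects.timeSeries.create` method, using only the standard library so it runs without credentials. The helper `build_time_series`, the project ID `my-project`, the metric name `orders/queue_depth`, and the `region` label are all hypothetical.

```python
import json
from datetime import datetime, timezone


def build_time_series(project_id, metric_name, value, when, labels=None):
    """Return (resource path, request body) for one gauge data point.

    Hypothetical helper: it only assembles the payload, it does not send it.
    """
    point = {
        # A gauge point only needs an end time, in RFC 3339 UTC format.
        "interval": {
            "endTime": when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        },
        "value": {"doubleValue": float(value)},
    }
    series = {
        # Custom metric types live under the custom.googleapis.com/ prefix.
        "metric": {
            "type": f"custom.googleapis.com/{metric_name}",
            "labels": labels or {},
        },
        # "global" is the simplest monitored resource type for a sketch.
        "resource": {"type": "global", "labels": {"project_id": project_id}},
        "points": [point],
    }
    return f"projects/{project_id}/timeSeries", {"timeSeries": [series]}


path, body = build_time_series(
    "my-project",
    "orders/queue_depth",
    42,
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    {"region": "europe-west2"},
)
print(path)
print(json.dumps(body, indent=2))
```

In a real application the same fields would be set on a `monitoring_v3.TimeSeries` object and sent with the client shown earlier, rather than assembled by hand.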
Cloud Monitoring provides:
Alerts notify you when metrics cross a threshold or when specific conditions are met.
| Component | Description |
|---|---|
| Condition | The metric threshold or absence condition to monitor |
| Duration | How long the condition must be met before firing |
| Notification channel | Where to send the alert (email, SMS, Slack, PagerDuty, webhook) |
| Documentation | Instructions for responders included in the alert |
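The four components in the table map onto fields of an alert policy resource. The sketch below assembles such a policy as a plain dictionary for a CPU threshold of 80% held for 5 minutes. The helper name `cpu_alert_policy`, the display names, the notification channel path, and the documentation text are hypothetical; the field names follow the Monitoring API's `AlertPolicy` resource as I understand it, so check them against the API reference before use.

```python
def cpu_alert_policy(threshold=0.8, duration_seconds=300, channels=()):
    """Build an alert policy body (hypothetical helper, nothing is sent)."""
    return {
        "displayName": "High CPU on Compute Engine",
        "combiner": "OR",
        "conditions": [
            {
                "displayName": (
                    f"CPU utilisation > {threshold:.0%} "
                    f"for {duration_seconds // 60} min"
                ),
                # Condition: the metric and the threshold to compare against.
                "conditionThreshold": {
                    "filter": (
                        'resource.type = "gce_instance" AND '
                        'metric.type = "compute.googleapis.com/instance/cpu/utilization"'
                    ),
                    "comparison": "COMPARISON_GT",
                    # CPU utilisation is reported as a fraction between 0 and 1.
                    "thresholdValue": threshold,
                    # Duration: how long the condition must hold before firing.
                    "duration": f"{duration_seconds}s",
                    "aggregations": [
                        {"alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_MEAN"}
                    ],
                },
            }
        ],
        # Notification channel: resource names of previously created channels.
        "notificationChannels": list(channels),
        # Documentation: guidance shown to whoever receives the alert.
        "documentation": {
            "content": "Check for runaway processes, then consider resizing.",
            "mimeType": "text/markdown",
        },
    }


policy = cpu_alert_policy(
    channels=["projects/my-project/notificationChannels/CHANNEL_ID"]
)
print(policy["conditions"][0]["displayName"])
```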
"Alert me if Compute Engine CPU utilisation exceeds 80% for 5 minutes."
Cloud Logging is a centralised log management service for collecting, storing, searching, and analysing logs.
| Type | Description |
|---|---|
| Platform logs | Generated by GCP services (Cloud SQL, GKE, Cloud Run, etc.) |
| User logs | Sent from your applications via client libraries or agents |
| Audit logs | Record who did what, where, and when (Admin Activity, Data Access, System Event, Policy Denied) |
| Network logs | VPC Flow Logs, firewall rule logs |
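One common way to produce the user logs listed above, without a client library, is to write one JSON object per line to standard output. On managed runtimes such as Cloud Run, such lines are typically parsed into structured entries, with `severity` and `message` treated as special fields. The sketch below assumes that behaviour; the helper `log` and the `order_id` field are hypothetical.

```python
import json
import sys


def log(severity, message, **fields):
    """Write one structured log line to stdout and return it (hypothetical helper)."""
    entry = {"severity": severity, "message": message, **fields}
    line = json.dumps(entry)
    # One JSON object per line, flushed so the entry is not held in a buffer.
    print(line, file=sys.stdout, flush=True)
    return line


line = log("ERROR", "connection refused", order_id="A-1001")
```

Extra keyword arguments become searchable fields on the entry, which is usually easier to query than free text.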
The Log Explorer in the Console lets you:
resource.type="cloud_run_revision"
severity>=ERROR
textPayload:"connection refused"
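The three query lines above are combined with an implicit AND, and the `:` operator matches a substring. As a minimal sketch, the same filter can be assembled in code, for example before handing it to a logging client. The helper `build_log_filter` is hypothetical, and it only builds the string.

```python
def build_log_filter(resource_type, min_severity, text):
    """Return a Logging query string; one clause per line means AND."""
    # Escape backslashes and double quotes so the text stays inside its quotes.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    clauses = [
        f'resource.type="{resource_type}"',
        f"severity>={min_severity}",
        f'textPayload:"{escaped}"',
    ]
    return "\n".join(clauses)


log_filter = build_log_filter("cloud_run_revision", "ERROR", "connection refused")
print(log_filter)
```

With the `google-cloud-logging` library installed, a string like this could be passed as the `filter_` argument of `Client.list_entries`; that call is not run here because it needs credentials.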
| Log Type | Default Retention |
|---|---|
| Admin Activity | 400 days (no charge) |
| Data Access | 30 days |
| Platform / User logs | 30 days |
For longer retention, increase the retention period on the log bucket, or create a sink that routes logs to Cloud Storage or BigQuery.
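Routing is configured through a sink, which pairs a filter with a destination. The sketch below builds the body of a sink definition as a plain dictionary. The helper names, the sink name, the project, the dataset, and the bucket are hypothetical; the destination formats follow the Logging API's `LogSink` resource as I understand it.

```python
def bigquery_destination(project_id, dataset):
    """Destination string for a BigQuery dataset."""
    return f"bigquery.googleapis.com/projects/{project_id}/datasets/{dataset}"


def storage_destination(bucket):
    """Destination string for a Cloud Storage bucket."""
    return f"storage.googleapis.com/{bucket}"


def sink_body(name, destination, log_filter):
    """Build a sink definition (hypothetical helper, nothing is created)."""
    return {"name": name, "destination": destination, "filter": log_filter}


# Keep only errors and above, and send them to BigQuery for long-term analysis.
sink = sink_body(
    "errors-to-bigquery",
    bigquery_destination("my-project", "app_logs"),
    "severity>=ERROR",
)
archive = sink_body(
    "all-to-storage",
    storage_destination("my-log-archive"),
    "",  # an empty filter matches every log entry
)
print(sink["destination"])
print(archive["destination"])
```

After a sink is created, its writer identity still needs permission to write to the destination, which this sketch does not cover.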
Audit logs answer the questions: Who did what, where, and when?
| Audit Log Type | What It Records |
|---|---|
| Admin Activity | Configuration changes (create/delete/update resources) — always enabled |
| Data Access | Reads of configuration or metadata, and reads or writes of user-provided data (disabled by default for most services, can be verbose) |
| System Event | Google-initiated administrative actions (live migration, auto-scaling) |
| Policy Denied | Actions denied by IAM or organisation policies |
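Each audit log type in the table is stored under its own log name, so a query can target one type at a time. The sketch below builds the corresponding `logName` filter. The helper `audit_log_filter`, the dictionary keys, and the project ID are hypothetical; the log IDs are the ones I understand Cloud Audit Logs to use.

```python
# Maps a readable key to the log ID suffix used by Cloud Audit Logs.
AUDIT_LOG_IDS = {
    "admin_activity": "activity",
    "data_access": "data_access",
    "system_event": "system_event",
    "policy_denied": "policy",
}


def audit_log_filter(project_id, kind):
    """Return a Logging filter that selects one audit log type."""
    log_id = AUDIT_LOG_IDS[kind]
    # The slash inside the log ID is URL-encoded as %2F in the log name.
    return (
        f'logName="projects/{project_id}/logs/'
        f'cloudaudit.googleapis.com%2F{log_id}"'
    )


print(audit_log_filter("my-project", "admin_activity"))
```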
Error Reporting automatically groups and counts application errors: