Cloud Monitoring Overview
Google Cloud Monitoring is a fully managed service within Google Cloud's operations suite (formerly Stackdriver) that provides visibility into the performance, availability, and health of your cloud-powered applications and infrastructure. It collects metrics, events, and metadata from Google Cloud services, hosted uptime probes, and application instrumentation, giving you a unified view of your entire environment.
Why Monitoring Matters on GCP
Running workloads in the cloud introduces a level of dynamism that traditional on-premises monitoring tools were not designed for. Resources scale up and down, services communicate across regions, and deployments happen continuously. Without effective monitoring you are operating blind — unable to detect degradation, understand capacity trends, or respond to incidents before they affect users.
Key Benefits
| Benefit | Description |
|---|---|
| Proactive detection | Identify issues before users report them through alerting and anomaly detection |
| Root cause analysis | Correlate metrics, logs, and traces to pinpoint the source of problems quickly |
| Capacity planning | Use historical trends to forecast resource needs and avoid over-provisioning |
| Cost visibility | Track resource utilisation to identify waste and right-size workloads |
| Compliance | Maintain audit trails and demonstrate adherence to operational standards |
The Cost of Not Monitoring
Without proper monitoring:
- Outages go undetected for hours, increasing mean time to recovery (MTTR)
- Performance regressions slip into production unnoticed
- Over-provisioned resources waste budget month after month
- Security incidents are discovered after damage has occurred
- Post-incident reviews lack the data needed to prevent recurrence
The Google Cloud Operations Suite
Google Cloud's operations suite provides an integrated set of monitoring, logging, and diagnostics tools:
| Service | Purpose |
|---|---|
| Cloud Monitoring | Collects metrics, creates dashboards, and triggers alerts |
| Cloud Logging | Ingests, stores, and analyses log data at scale |
| Cloud Trace | Distributed tracing for latency analysis across services |
| Cloud Profiler | Continuous CPU and memory profiling of production applications |
| Error Reporting | Aggregates and tracks application errors with stack traces |
How They Work Together
             Google Cloud Operations Suite
              /            |            \
      Metrics            Logs             Traces
         |                 |                 |
 Cloud Monitoring    Cloud Logging      Cloud Trace
         |                 |                 |
    Dashboards       Logs Explorer    Trace Explorer
      Alerts          Log Router     Latency Analysis
Cloud Monitoring sits at the centre of the observability stack. It ingests platform metrics automatically from Google Cloud services, with no agent or configuration required for most of them, and accepts custom metrics written by your applications.
Key Concepts
Metrics
Metrics are numerical measurements collected at regular intervals. Google Cloud services report platform metrics automatically, such as CPU utilisation for Compute Engine instances, request count for Cloud Run services, and query latency for BigQuery jobs.
| Metric Type | Description | Examples |
|---|---|---|
| Platform metrics | Automatically collected by Google Cloud services | compute.googleapis.com/instance/cpu/utilization |
| Custom metrics | Defined and sent by your application code | custom.googleapis.com/orders/processed |
| External metrics | Ingested from third-party systems via integrations | Datadog, Prometheus, or AWS CloudWatch metrics |
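To make the custom metric row concrete, here is a minimal sketch of the request body that the Monitoring API v3 timeSeries.create method accepts for a single data point. The helper name build_point_payload, the project my-project, the value 42, and the choice of the global monitored resource are illustrative assumptions, not part of the lesson. The function only prints JSON; the actual API call is left as a comment because it needs credentials and network access.

```shell
#!/usr/bin/env bash
# Build a timeSeries.create request body for one data point.
# Hypothetical helper: the project ID, value, and the "global" resource
# type are illustrative assumptions.
build_point_payload() {
  local project_id="$1" metric_type="$2" value="$3" end_time="$4"
  cat <<EOF
{
  "timeSeries": [{
    "metric": {"type": "${metric_type}"},
    "resource": {"type": "global", "labels": {"project_id": "${project_id}"}},
    "points": [{
      "interval": {"endTime": "${end_time}"},
      "value": {"int64Value": "${value}"}
    }]
  }]
}
EOF
}

payload="$(build_point_payload my-project custom.googleapis.com/orders/processed 42 2024-01-01T00:00:00Z)"
printf '%s\n' "$payload"

# To send it (requires credentials and network access):
# curl -s -X POST \
#   -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#   -H "Content-Type: application/json" \
#   -d "$payload" \
#   "https://monitoring.googleapis.com/v3/projects/my-project/timeSeries"
```

The fixed timestamp keeps the output reproducible; a real write would use the current time as the end time.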
Monitored Resources
Every metric is associated with a monitored resource, the entity that the metric describes. Examples of monitored resource types include gce_instance, cloud_run_revision, k8s_container (which replaced the legacy gke_container type), and cloudsql_database.
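Because a metric and its monitored resource travel together, queries usually filter on both. The sketch below composes a Monitoring filter string from a metric type, a resource type, and an optional resource label. The helper name build_filter and the zone us-central1-a are hypothetical, chosen only for illustration.

```shell
#!/usr/bin/env bash
# Compose a Monitoring filter that pins both the metric type and the
# monitored resource type, optionally narrowing by one resource label.
# build_filter is a hypothetical helper, not part of any Google tool.
build_filter() {
  local metric_type="$1" resource_type="$2"
  local label_key="${3:-}" label_value="${4:-}"
  local filter="metric.type = \"${metric_type}\" AND resource.type = \"${resource_type}\""
  if [ -n "$label_key" ]; then
    filter="${filter} AND resource.labels.${label_key} = \"${label_value}\""
  fi
  printf '%s\n' "$filter"
}

# Example: CPU utilisation for VMs in one (assumed) zone
build_filter compute.googleapis.com/instance/cpu/utilization gce_instance zone us-central1-a
```

The resulting string can be pasted into the filter field of a time-series query or passed as the filter parameter of an API request.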
Metrics Scopes (Formerly Workspaces)
Cloud Monitoring uses a metrics scope (formerly called a Workspace) to define which Google Cloud projects are monitored together. The metrics scope is hosted by a scoping project, and a single metrics scope can include up to 375 projects, giving you a unified view across a large part of your organisation.
Metric Naming Convention
Google Cloud metrics follow a hierarchical naming convention:
<service>.googleapis.com/<resource>/<metric_name>
For example:
| Metric | Description |
|---|---|
| compute.googleapis.com/instance/cpu/utilization | CPU utilisation of a Compute Engine VM |
| run.googleapis.com/request_count | Number of requests to a Cloud Run service |
| cloudsql.googleapis.com/database/cpu/utilization | CPU utilisation of a Cloud SQL instance |
| loadbalancing.googleapis.com/https/request_count | Request count for an HTTPS load balancer |
Understanding this naming convention is essential for building queries, dashboards, and alert policies.
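As a small illustration of the naming convention, the following sketch splits a metric type into its service prefix and the remaining metric path using shell parameter expansion. The function names metric_service and metric_path are hypothetical helpers introduced here.

```shell
#!/usr/bin/env bash
# Split a metric type into its service prefix and metric path.
# Both helpers are hypothetical and rely only on parameter expansion.
metric_service() {
  # Keep everything before ".googleapis.com/"
  printf '%s\n' "${1%%.googleapis.com/*}"
}

metric_path() {
  # Keep everything after ".googleapis.com/"
  printf '%s\n' "${1#*.googleapis.com/}"
}

for m in \
  compute.googleapis.com/instance/cpu/utilization \
  run.googleapis.com/request_count; do
  printf '%-8s %s\n' "$(metric_service "$m")" "$(metric_path "$m")"
done
```

Note that the path after the service prefix varies in depth: instance/cpu/utilization has a resource segment, while request_count does not.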
Getting Started
Enabling the API
Cloud Monitoring is enabled by default for all Google Cloud projects. To verify or enable it explicitly:
gcloud services enable monitoring.googleapis.com --project=my-project
Viewing Metrics in the Console
1. Navigate to Monitoring in the Google Cloud Console
2. Open Metrics Explorer
3. Select a resource type (e.g., VM Instance)
4. Choose a metric (e.g., CPU Utilisation)
5. Apply filters (e.g., zone, instance name)
6. Adjust the time range and aggregation
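Using the Command Line
# These calls go to the Monitoring API v3 REST endpoints; gcloud supplies the access token.
# List available metric descriptors for Compute Engine CPU metrics
curl -s -G -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://monitoring.googleapis.com/v3/projects/my-project/metricDescriptors" \
  --data-urlencode 'filter=metric.type = starts_with("compute.googleapis.com/instance/cpu")'
# Read time-series data for a specific metric over a fixed one-day interval
curl -s -G -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://monitoring.googleapis.com/v3/projects/my-project/timeSeries" \
  --data-urlencode 'filter=metric.type = "compute.googleapis.com/instance/cpu/utilization"' \
  --data-urlencode 'interval.startTime=2024-01-01T00:00:00Z' \
  --data-urlencode 'interval.endTime=2024-01-02T00:00:00Z'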
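The time-series query above uses fixed timestamps. In practice you often want a rolling window, so here is a sketch that prints RFC 3339 start and end timestamps for the last N minutes. The helper name last_minutes_window is hypothetical, and the code assumes GNU date (the -d "@epoch" form); BSD and macOS date use different options.

```shell
#!/usr/bin/env bash
# Print "START END" as RFC 3339 UTC timestamps covering the last N minutes.
# Assumes GNU date; last_minutes_window is a hypothetical helper.
last_minutes_window() {
  local minutes="$1" now start
  now="$(date -u +%s)"
  start=$(( now - minutes * 60 ))
  printf '%s %s\n' \
    "$(date -u -d "@${start}" +%Y-%m-%dT%H:%M:%SZ)" \
    "$(date -u -d "@${now}" +%Y-%m-%dT%H:%M:%SZ)"
}

# Split the two space-separated timestamps without relying on bash-only syntax
window="$(last_minutes_window 60)"
start_time="${window% *}"
end_time="${window#* }"
echo "start: ${start_time}"
echo "end:   ${end_time}"
```

The two values can then replace the hard-coded start and end times in a time-series query.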
Monitoring Strategy
A well-designed monitoring strategy on GCP covers multiple layers:
1. Infrastructure Monitoring
Monitor the health of your Google Cloud resources:
- Compute Engine CPU, memory, disk, and network
- GKE cluster node health, pod restarts, and container resource usage
- Cloud SQL CPU, memory, connections, and replication lag
- Cloud Storage request rates and error counts
2. Application Monitoring
Monitor application-level behaviour:
- Request latency and error rates
- Custom business metrics (orders processed, payments completed)
- Dependency health (database connections, API call success rates)
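Error rate is typically derived from two counters: total requests and failed requests. The sketch below shows the arithmetic with awk, under the assumption that both counts cover the same time window. The helper name error_rate_percent and the sample numbers are illustrative only.

```shell
#!/usr/bin/env bash
# Compute an error rate as a percentage with two decimal places.
# Prints 0.00 when there were no requests, to avoid dividing by zero.
# error_rate_percent is a hypothetical helper.
error_rate_percent() {
  awk -v total="$1" -v errors="$2" 'BEGIN {
    if (total <= 0) { printf "0.00\n"; exit }
    printf "%.2f\n", (errors / total) * 100
  }'
}

# 25 failed requests out of 1000
error_rate_percent 1000 25   # prints 2.50
```

Alerting on a ratio like this, rather than on a raw error count, keeps the signal meaningful as traffic volume changes.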
3. Network Monitoring
Monitor connectivity and traffic patterns:
- VPC flow logs for traffic analysis
- Load balancer latency and error distribution
- Cloud NAT connection counts and dropped packets
4. Security Monitoring
Monitor for threats and configuration drift:
- IAM policy changes via audit logs
- Firewall rule modifications
- Anomalous API usage patterns
Summary
Google Cloud Monitoring is the foundation of observability on GCP. It automatically collects platform metrics from Google Cloud services, supports custom and external metrics, and provides a unified view across multiple projects through scoping projects. Combined with Cloud Logging, Cloud Trace, Cloud Profiler, and Error Reporting, it forms a comprehensive operations suite for maintaining the health, performance, and security of your cloud workloads. The next lesson covers metrics and dashboards in detail.