Cloud Monitoring Overview

Google Cloud Monitoring is a fully managed service within Google Cloud's operations suite (formerly Stackdriver) that provides visibility into the performance, availability, and health of your cloud-powered applications and infrastructure. It collects metrics, events, and metadata from Google Cloud services, hosted uptime probes, and application instrumentation, giving you a unified view of your entire environment.

Why Monitoring Matters on GCP

Running workloads in the cloud introduces a level of dynamism that traditional on-premises monitoring tools were not designed for. Resources scale up and down, services communicate across regions, and deployments happen continuously. Without effective monitoring you are operating blind — unable to detect degradation, understand capacity trends, or respond to incidents before they affect users.

Key Benefits

Benefit	Description
Proactive detection	Use alerting and anomaly detection to catch issues before users report them
Root cause analysis	Correlate metrics, logs, and traces to pinpoint the source of problems quickly
Capacity planning	Use historical trends to forecast resource needs and avoid over-provisioning
Cost visibility	Track resource utilisation to identify waste and right-size workloads
Compliance	Maintain audit trails and demonstrate adherence to operational standards

The Cost of Not Monitoring

Without proper monitoring:

Outages go undetected for hours, increasing mean time to recovery (MTTR)
Performance regressions slip into production unnoticed
Over-provisioned resources waste budget month after month
Security incidents are discovered after damage has occurred
Post-incident reviews lack the data needed to prevent recurrence

The Google Cloud Operations Suite

Google Cloud's operations suite provides an integrated set of monitoring, logging, and diagnostics tools:

Service	Purpose
Cloud Monitoring	Collects metrics, creates dashboards, and triggers alerts
Cloud Logging	Ingests, stores, and analyses log data at scale
Cloud Trace	Distributed tracing for latency analysis across services
Cloud Profiler	Continuous CPU and memory profiling of production applications
Error Reporting	Aggregates and tracks application errors with stack traces

How They Work Together

graph TD
  A["Google Cloud Operations Suite"] --> B["Metrics"]
  A --> C["Logs"]
  A --> D["Traces"]
  B --> E["Cloud Monitoring"]
  C --> F["Cloud Logging"]
  D --> G["Cloud Trace"]
  E --> H["Dashboards / Alerts"]
  F --> I["Logs Explorer / Log Router"]
  G --> J["Trace Explorer / Latency Analysis"]

Cloud Monitoring sits at the centre of the observability stack. It ingests platform metrics automatically from Google Cloud services and also accepts custom metrics sent by your applications.

Key Concepts

Metrics

Metrics are numerical measurements collected at regular intervals. Google Cloud automatically collects platform metrics for its services, such as CPU utilisation for Compute Engine instances, request count for Cloud Run services, and query latency for BigQuery jobs.

Category	Description	Examples
Platform metrics	Automatically collected by Google Cloud services	`compute.googleapis.com/instance/cpu/utilization`
Custom metrics	Defined and sent by your application code	`custom.googleapis.com/orders/processed`
External metrics	Ingested from open-source or third-party systems via integrations	Prometheus or AWS CloudWatch metrics

Monitored Resources

Every metric is associated with a monitored resource, which is the entity that the metric describes. Examples include gce_instance, cloud_run_revision, k8s_container, and cloudsql_database.
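Because every metric is tied to a monitored resource, a time series is identified by its metric type and metric labels together with its resource type and resource labels. The sketch below illustrates that idea by grouping raw samples into series. The `Point` record, the `labels` helper, and the sample label values are hypothetical and were chosen only for this example. They are not Cloud Monitoring API types.

```python
from collections import defaultdict
from dataclasses import dataclass


def labels(**pairs):
    """Turn keyword arguments into a hashable, order-independent label set."""
    return tuple(sorted(pairs.items()))


@dataclass(frozen=True)
class Point:
    """A hypothetical raw sample (not a Cloud Monitoring API type)."""
    metric_type: str
    metric_labels: tuple
    resource_type: str
    resource_labels: tuple
    timestamp: int  # seconds since the epoch
    value: float


def series_key(p):
    # Two samples belong to the same series only if the metric and the
    # monitored resource match, including every label on each.
    return (p.metric_type, p.metric_labels, p.resource_type, p.resource_labels)


def group_into_series(points):
    """Group samples by series identity, ordering each series by timestamp."""
    grouped = defaultdict(list)
    for p in points:
        grouped[series_key(p)].append((p.timestamp, p.value))
    return {key: sorted(samples) for key, samples in grouped.items()}


cpu = "compute.googleapis.com/instance/cpu/utilization"
vm_a = labels(instance_id="111", zone="us-central1-a")
vm_b = labels(instance_id="222", zone="us-central1-a")
points = [
    Point(cpu, (), "gce_instance", vm_a, 60, 0.5),
    Point(cpu, (), "gce_instance", vm_b, 60, 0.25),
    Point(cpu, (), "gce_instance", vm_a, 0, 0.75),
]
series = group_into_series(points)
print(len(series))  # 2: one series per VM
```

The same metric type therefore produces one series per resource, so two VMs reporting CPU utilisation appear as two separate lines in a chart.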

Workspaces (Scoping Projects)

Cloud Monitoring uses a scoping project (formerly called a workspace) to define which Google Cloud projects are monitored together. A single scoping project can monitor up to 375 projects, giving you a unified view across your entire organisation.

Metric Naming Convention

Most Google Cloud metric types follow a hierarchical naming convention:

<service>.googleapis.com/<resource>/<metric_name>

For example:

Metric	Description
`compute.googleapis.com/instance/cpu/utilization`	CPU utilisation of a Compute Engine VM
`run.googleapis.com/request_count`	Number of requests to a Cloud Run service
`cloudsql.googleapis.com/database/cpu/utilization`	CPU utilisation of a Cloud SQL instance
`loadbalancing.googleapis.com/https/request_count`	Request count for an HTTPS load balancer

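The resource segment is not always present (run.googleapis.com/request_count has none), and some metric types use other domains, such as kubernetes.io/ for GKE system metrics. Understanding this naming convention is essential for building queries, dashboards, and alert policies.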
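To make the convention concrete, the sketch below splits a metric type into its service and the remaining path segments. The `MetricType` tuple and `parse_metric_type` function are hypothetical names. The function deliberately handles only the `<service>.googleapis.com/...` form and rejects anything else, such as `kubernetes.io/...` metric types.

```python
from typing import NamedTuple


class MetricType(NamedTuple):
    service: str  # e.g. "compute" from "compute.googleapis.com"
    path: tuple   # remaining segments, e.g. ("instance", "cpu", "utilization")


SUFFIX = ".googleapis.com"


def parse_metric_type(metric_type):
    """Split a '<service>.googleapis.com/<path>' metric type into its parts."""
    domain, sep, rest = metric_type.partition("/")
    if not sep or not rest or not domain.endswith(SUFFIX) or domain == SUFFIX:
        raise ValueError(f"not a <service>.googleapis.com metric type: {metric_type!r}")
    return MetricType(service=domain[: -len(SUFFIX)], path=tuple(rest.split("/")))


for name in (
    "compute.googleapis.com/instance/cpu/utilization",
    "run.googleapis.com/request_count",
):
    print(parse_metric_type(name))
```

The path is returned as a tuple of segments rather than as fixed "resource" and "metric" fields because, as the Cloud Run example shows, the number of segments varies.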

Getting Started

Enabling the API

The Cloud Monitoring API is enabled by default in new Google Cloud projects. If it has been disabled, or to make the dependency explicit in setup scripts, enable it with:

gcloud services enable monitoring.googleapis.com --project=my-project

Viewing Metrics in the Console

Navigate to Monitoring in the Google Cloud Console
Open Metrics Explorer
Select a resource type (e.g., VM Instance)
Choose a metric (e.g., CPU Utilisation)
Apply filters (e.g., zone, instance name)
Adjust the time range and aggregation
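Aggregation in Cloud Monitoring happens in two stages. Alignment regularises each time series into fixed alignment periods using an aligner such as mean. Reduction then combines the aligned series, optionally grouped by a label, using a reducer. The sketch below mimics a mean aligner followed by a mean reducer grouped by zone, using plain Python data. It is a conceptual model with hypothetical function names and input shapes, not how the service is implemented. For simplicity it labels each window by its start time.

```python
from collections import defaultdict
from statistics import mean


def align_mean(points, period):
    """Mean-align one series: bucket (timestamp, value) pairs into fixed
    windows of `period` seconds and average each bucket."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % period].append(value)
    return {start: mean(vals) for start, vals in sorted(buckets.items())}


def reduce_mean(aligned_series, group_of):
    """Combine aligned series with a mean reducer.

    aligned_series: {series_id: {window_start: value}}
    group_of: maps a series_id to its group-by key (here, its zone).
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for series_id, windows in aligned_series.items():
        for start, value in windows.items():
            grouped[group_of(series_id)][start].append(value)
    return {
        group: {start: mean(vals) for start, vals in sorted(windows.items())}
        for group, windows in grouped.items()
    }


raw = {
    "vm-a": [(0, 0.25), (30, 0.75), (60, 0.5)],
    "vm-b": [(10, 1.0), (70, 0.0)],
}
zone_of = {"vm-a": "us-central1-a", "vm-b": "us-central1-a"}
aligned = {sid: align_mean(pts, period=60) for sid, pts in raw.items()}
print(reduce_mean(aligned, zone_of.get))  # {'us-central1-a': {0: 0.75, 60: 0.25}}
```

Aligning first gives every series points at the same timestamps, which is what makes the cross-series reduction well defined.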

Using the gcloud CLI

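The gcloud monitoring command group manages resources such as dashboards, alert policies, and uptime checks. To list metric descriptors or read time-series data from a shell, call the Cloud Monitoring API directly and let gcloud supply the access token:

# List metric descriptors for Compute Engine CPU metrics
curl -s -G \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  --data-urlencode 'filter=metric.type = starts_with("compute.googleapis.com/instance/cpu")' \
  "https://monitoring.googleapis.com/v3/projects/my-project/metricDescriptors"

# Read time-series data for a specific metric over a 24-hour window
curl -s -G \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  --data-urlencode 'filter=metric.type = "compute.googleapis.com/instance/cpu/utilization"' \
  --data-urlencode 'interval.startTime=2024-01-01T00:00:00Z' \
  --data-urlencode 'interval.endTime=2024-01-02T00:00:00Z' \
  "https://monitoring.googleapis.com/v3/projects/my-project/timeSeries"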
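The commands above pass a Monitoring filter string. Once a filter needs several clauses, it can help to build it programmatically. Below is a minimal sketch that assumes you only need `=` comparisons, `starts_with`, and `AND`, all of which are part of the Monitoring filter syntax. The function name and its parameters are hypothetical, and values are inserted without escaping.

```python
def monitoring_filter(metric_type=None, metric_prefix=None,
                      resource_type=None, resource_labels=None):
    """Build a Cloud Monitoring filter string from a few common clauses.

    Values are inserted verbatim, so they must not contain double quotes.
    """
    if metric_type and metric_prefix:
        raise ValueError("pass metric_type or metric_prefix, not both")
    clauses = []
    if metric_type:
        clauses.append(f'metric.type = "{metric_type}"')
    elif metric_prefix:
        clauses.append(f'metric.type = starts_with("{metric_prefix}")')
    if resource_type:
        clauses.append(f'resource.type = "{resource_type}"')
    # Sort labels so the same inputs always yield the same filter string.
    for key, value in sorted((resource_labels or {}).items()):
        clauses.append(f'resource.labels.{key} = "{value}"')
    if not clauses:
        raise ValueError("at least one clause is required")
    return " AND ".join(clauses)


print(monitoring_filter(
    metric_type="compute.googleapis.com/instance/cpu/utilization",
    resource_type="gce_instance",
    resource_labels={"zone": "us-central1-a"},
))
```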

Monitoring Strategy

A well-designed monitoring strategy on GCP covers multiple layers:

1. Infrastructure Monitoring

Monitor the health of your Google Cloud resources:

Compute Engine CPU, memory, disk, and network (memory and in-guest disk usage metrics require the Ops Agent)
GKE cluster node health, pod restarts, and container resource usage
Cloud SQL CPU, memory, connections, and replication lag
Cloud Storage request rates and error counts
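A common use of these infrastructure metrics is a threshold alert, for example CPU utilisation above 80% for five minutes. The sketch below is a simplified, hypothetical model of such a condition evaluated over values that have already been aligned. Real alerting policies offer more options, such as how missing data is treated. This sketch simply treats any gap in the recent windows as "not breached".

```python
def threshold_breached(aligned_points, threshold, duration, period):
    """Return True if every aligned value in the most recent `duration`
    seconds exceeds `threshold`.

    aligned_points: list of (window_start, value), one per alignment
    period, sorted by time.
    """
    if duration % period != 0:
        raise ValueError("duration must be a multiple of the alignment period")
    needed = duration // period
    if needed < 1 or len(aligned_points) < needed:
        return False
    recent = aligned_points[-needed:]
    # Require consecutive windows: a missing period counts as not breached.
    for (t1, _), (t2, _) in zip(recent, recent[1:]):
        if t2 - t1 != period:
            return False
    return all(value > threshold for _, value in recent)


cpu = [(0, 0.5), (60, 0.9), (120, 0.95), (180, 0.92), (240, 0.85), (300, 0.99)]
print(threshold_breached(cpu, threshold=0.8, duration=300, period=60))  # True
```

Requiring the condition to hold for a whole duration, rather than firing on a single sample, reduces alerts caused by brief spikes.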

2. Application Monitoring

Monitor application-level behaviour:

Request latency and error rates
Custom business metrics (orders processed, payments completed)
Dependency health (database connections, API call success rates)
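Request latency is usually tracked as a percentile rather than an average, and error rate as the ratio of failed requests to total requests. The sketch below computes both from raw samples. Two choices are assumptions made for this example: only HTTP 5xx responses count as errors, and percentiles use the nearest-rank method. The function names are hypothetical.

```python
import math


def error_rate(status_codes):
    """Fraction of responses that are server errors (HTTP 5xx)."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if 500 <= code <= 599)
    return errors / len(status_codes)


def percentile(values, pct):
    """Nearest-rank percentile, e.g. pct=95 for p95 latency."""
    if not values:
        raise ValueError("no values")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


latencies_ms = [120, 80, 300, 95]
print(error_rate([200, 200, 500, 503, 404]))  # 0.4
print(percentile(latencies_ms, 50))           # 95
```

A percentile such as p95 exposes the slow tail that an average hides, which is usually what users notice first.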

3. Network Monitoring

Monitor connectivity and traffic patterns:

VPC flow logs for traffic analysis
Load balancer latency and error distribution
Cloud NAT connection counts and dropped packets

4. Security Monitoring

Monitor for threats and configuration drift:

IAM policy changes via audit logs
Firewall rule modifications
Anomalous API usage patterns
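IAM policy and firewall changes are recorded as Admin Activity audit log entries, and each entry's protoPayload.methodName names the API method that was called. The function below is one way to spot such changes. It scans entries that have already been read into Python dictionaries, for example from a log export. The function name is hypothetical. The firewall method names are assumptions, so verify them against your own audit logs before relying on them.

```python
# Method name used for IAM policy changes; matched as a suffix in case a
# service prefixes the method name.
IAM_SUFFIX = "SetIamPolicy"

# Assumed audit log method names for Compute Engine firewall changes;
# verify them against entries in your own project.
FIREWALL_METHODS = {
    "v1.compute.firewalls.insert",
    "v1.compute.firewalls.patch",
    "v1.compute.firewalls.update",
    "v1.compute.firewalls.delete",
}


def flag_sensitive_changes(entries):
    """Return (timestamp, principal, method) for IAM and firewall changes.

    Each entry is a dict shaped like a LogEntry in JSON form, with the
    audit details under "protoPayload".
    """
    flagged = []
    for entry in entries:
        payload = entry.get("protoPayload", {})
        method = payload.get("methodName", "")
        if method.endswith(IAM_SUFFIX) or method in FIREWALL_METHODS:
            principal = payload.get("authenticationInfo", {}).get(
                "principalEmail", "unknown"
            )
            flagged.append((entry.get("timestamp"), principal, method))
    return flagged


entries = [
    {"timestamp": "2024-01-01T00:00:00Z",
     "protoPayload": {"methodName": "SetIamPolicy",
                      "authenticationInfo": {"principalEmail": "alice@example.com"}}},
    {"timestamp": "2024-01-01T00:05:00Z",
     "protoPayload": {"methodName": "v1.compute.instances.start"}},
]
print(flag_sensitive_changes(entries))
```

In practice the same selection is often expressed as a log query or a log-based metric, so that matches can drive an alert rather than a batch scan.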

Summary

Google Cloud Monitoring is the foundation of observability on GCP. It automatically collects platform metrics from every resource, supports custom and external metrics, and provides a unified view across multiple projects through scoping projects. Combined with Cloud Logging, Cloud Trace, Cloud Profiler, and Error Reporting, it forms a comprehensive operations suite for maintaining the health, performance, and security of your cloud workloads. In the next lesson, we will dive deep into metrics and dashboards.