Cloud Monitoring Overview

Google Cloud Monitoring is a fully managed service within Google Cloud's operations suite (formerly Stackdriver) that provides visibility into the performance, availability, and health of your cloud-powered applications and infrastructure. It collects metrics, events, and metadata from Google Cloud services, hosted uptime probes, and application instrumentation, giving you a unified view of your entire environment.


Why Monitoring Matters on GCP

Running workloads in the cloud introduces a level of dynamism that traditional on-premises monitoring tools were not designed for. Resources scale up and down, services communicate across regions, and deployments happen continuously. Without effective monitoring you are operating blind — unable to detect degradation, understand capacity trends, or respond to incidents before they affect users.

Key Benefits

| Benefit | Description |
|---|---|
| Proactive detection | Identify issues before users report them through alerting and anomaly detection |
| Root cause analysis | Correlate metrics, logs, and traces to pinpoint the source of problems quickly |
| Capacity planning | Use historical trends to forecast resource needs and avoid over-provisioning |
| Cost visibility | Track resource utilisation to identify waste and right-size workloads |
| Compliance | Maintain audit trails and demonstrate adherence to operational standards |

The Cost of Not Monitoring

Without proper monitoring:

  • Outages go undetected for hours, increasing mean time to recovery (MTTR)
  • Performance regressions slip into production unnoticed
  • Over-provisioned resources waste budget month after month
  • Security incidents are discovered after damage has occurred
  • Post-incident reviews lack the data needed to prevent recurrence

The Google Cloud Operations Suite

Google Cloud's operations suite provides an integrated set of monitoring, logging, and diagnostics tools:

| Service | Purpose |
|---|---|
| Cloud Monitoring | Collects metrics, creates dashboards, and triggers alerts |
| Cloud Logging | Ingests, stores, and analyses log data at scale |
| Cloud Trace | Distributed tracing for latency analysis across services |
| Cloud Profiler | Continuous CPU and memory profiling of production applications |
| Error Reporting | Aggregates and tracks application errors with stack traces |

How They Work Together

                  Google Cloud Operations Suite
                  /           |            \
            Metrics         Logs         Traces
              |              |              |
      Cloud Monitoring  Cloud Logging  Cloud Trace
              |              |              |
          Dashboards     Log Explorer    Trace Explorer
          Alerts         Log Router      Latency Analysis

Cloud Monitoring sits at the centre of the observability stack. It ingests platform metrics automatically from Google Cloud services, with no agent or instrumentation required for those metrics, and supports custom metrics from your applications.


Key Concepts

Metrics

Metrics are numerical measurements collected at regular intervals. Google Cloud automatically collects platform metrics for the services you use, such as CPU utilisation for Compute Engine instances, request count for Cloud Run services, and query latency for BigQuery jobs.

| Metric Type | Description | Examples |
|---|---|---|
| Platform metrics | Automatically collected by Google Cloud services | compute.googleapis.com/instance/cpu/utilization |
| Custom metrics | Defined and sent by your application code | custom.googleapis.com/orders/processed |
| External metrics | Ingested from third-party systems via integrations | Datadog, Prometheus, or AWS CloudWatch metrics |
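As a minimal sketch of what sending a custom metric involves, the function below prints a JSON request body for the Monitoring API's timeSeries.create method. The helper name custom_metric_payload, the project ID my-project, and the choice of the global monitored resource are assumptions made for illustration; the metric type is the example from the table above.

```shell
# custom_metric_payload is a hypothetical helper, not part of gcloud.
# It prints a timeSeries.create request body carrying one integer data point.
custom_metric_payload() {
  local project_id="$1" value="$2" end_time="$3"
  cat <<EOF
{
  "timeSeries": [{
    "metric": {"type": "custom.googleapis.com/orders/processed"},
    "resource": {"type": "global", "labels": {"project_id": "${project_id}"}},
    "points": [{
      "interval": {"endTime": "${end_time}"},
      "value": {"int64Value": "${value}"}
    }]
  }]
}
EOF
}

# Print the body for a single point (no network call is made here).
custom_metric_payload "my-project" 42 "2024-01-01T00:00:00Z"
```

To actually write the point, the body would be sent with an authenticated POST to https://monitoring.googleapis.com/v3/projects/my-project/timeSeries. In application code, a client library is usually the more convenient route.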

Monitored Resources

Every metric is associated with a monitored resource, which is the entity that the metric describes. Examples include gce_instance, cloud_run_revision, k8s_container (which supersedes the legacy gke_container type), and cloudsql_database.
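Because every time series pairs a metric type with a monitored resource type, queries usually filter on both. The sketch below builds such a filter expression in the Monitoring filter syntax; build_filter is a hypothetical helper name introduced here for illustration.

```shell
# build_filter is a hypothetical helper, not part of gcloud.
# It joins a metric type and a monitored resource type into one filter.
build_filter() {
  local metric_type="$1" resource_type="$2"
  printf 'metric.type = "%s" AND resource.type = "%s"\n' \
    "$metric_type" "$resource_type"
}

build_filter "compute.googleapis.com/instance/cpu/utilization" "gce_instance"
```

The resulting string can be passed as the filter parameter of a time-series query. Resource labels can narrow it further, for example by appending AND resource.labels.zone = "us-central1-a".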

Metrics Scopes (formerly Workspaces)

Cloud Monitoring uses a metrics scope (the concept formerly called a Workspace) to define which Google Cloud projects are monitored together. The metrics scope is hosted by a scoping project. A single scoping project can monitor up to 375 projects, giving you a unified view across a large part of your organisation.
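For larger organisations, the 375-project figure translates into a simple capacity calculation. The sketch below rounds up to the number of scoping projects needed, treating the limit as 375 monitored projects per scoping project as stated above; scopes_needed is a hypothetical helper name.

```shell
# scopes_needed is a hypothetical helper for capacity planning.
# It performs ceiling division by the 375-project limit quoted above.
scopes_needed() {
  local projects="$1" limit=375
  echo $(( (projects + limit - 1) / limit ))
}

scopes_needed 375    # prints 1
scopes_needed 376    # prints 2
scopes_needed 1000   # prints 3
```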


Metric Naming Convention

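Google Cloud metric types follow a hierarchical naming convention: a service domain followed by a path. The resource segment is common but not universal; some metrics, such as run.googleapis.com/request_count, sit directly under the service domain: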

<service>.googleapis.com/<resource>/<metric_name>

For example:

| Metric | Description |
|---|---|
| compute.googleapis.com/instance/cpu/utilization | CPU utilisation of a Compute Engine VM |
| run.googleapis.com/request_count | Number of requests to a Cloud Run service |
| cloudsql.googleapis.com/database/cpu/utilization | CPU utilisation of a Cloud SQL instance |
| loadbalancing.googleapis.com/https/request_count | Request count for an HTTPS load balancer |

Understanding this naming convention is essential for building queries, dashboards, and alert policies.
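To make the convention concrete, here is a small sketch that splits a metric type into its service domain and the path that follows it, using only shell parameter expansion. The helper name split_metric_type is an assumption introduced for illustration.

```shell
# split_metric_type is a hypothetical helper, not part of gcloud.
# It prints the service domain and the remaining metric path.
split_metric_type() {
  local metric_type="$1"
  local service="${metric_type%%/*}"   # text before the first "/"
  local path="${metric_type#*/}"       # text after the first "/"
  printf '%s %s\n' "$service" "$path"
}

split_metric_type "compute.googleapis.com/instance/cpu/utilization"
# prints: compute.googleapis.com instance/cpu/utilization
```

Splitting on the first slash only keeps the helper correct for metrics with and without a resource segment.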


Getting Started

Enabling the API

Cloud Monitoring is enabled by default for all Google Cloud projects. To verify or enable it explicitly:

gcloud services enable monitoring.googleapis.com --project=my-project
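The command above enables the API; to only verify its state, one option is to check the list of enabled services. The sketch below wraps that check in a function. The name monitoring_api_enabled and the project ID my-project are assumptions, and the function requires an authenticated gcloud installation.

```shell
# monitoring_api_enabled is a hypothetical helper around gcloud.
# It returns 0 when monitoring.googleapis.com appears among the
# enabled services of the given project, and non-zero otherwise.
monitoring_api_enabled() {
  local project_id="$1"
  gcloud services list --enabled --project="$project_id" \
    --filter="config.name=monitoring.googleapis.com" \
    --format="value(config.name)" | grep -qx "monitoring.googleapis.com"
}

# Usage (needs gcloud credentials, so it is left commented out):
# if monitoring_api_enabled "my-project"; then echo "enabled"; fi
```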

Viewing Metrics in the Console

  1. Navigate to Monitoring in the Google Cloud Console
  2. Open Metrics Explorer
  3. Select a resource type (e.g., VM Instance)
  4. Choose a metric (e.g., CPU Utilisation)
  5. Apply filters (e.g., zone, instance name)
  6. Adjust the time range and aggregation

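Querying from the Command Line

Metric descriptors and time-series data are served by the Cloud Monitoring REST API. The examples below call it with curl and use gcloud only to obtain an access token:

# List available metric descriptors for Compute Engine CPU metrics
curl -s -G \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  --data-urlencode 'filter=metric.type = starts_with("compute.googleapis.com/instance/cpu")' \
  "https://monitoring.googleapis.com/v3/projects/my-project/metricDescriptors"

# Read time-series data for a specific metric over a fixed interval
curl -s -G \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  --data-urlencode 'filter=metric.type = "compute.googleapis.com/instance/cpu/utilization"' \
  --data-urlencode 'interval.startTime=2024-01-01T00:00:00Z' \
  --data-urlencode 'interval.endTime=2024-01-02T00:00:00Z' \
  "https://monitoring.googleapis.com/v3/projects/my-project/timeSeries"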
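Hard-coded timestamps go stale quickly, so a time-series query is usually given a rolling window instead. The sketch below computes RFC 3339 start and end times for the last N hours. The helper name last_hours_window is an assumption, and the code relies on the date command accepting -d "@epoch", as GNU date does.

```shell
# last_hours_window is a hypothetical helper; it assumes GNU date (-d flag).
# It prints "<start> <end>" in UTC, with end set to the current time.
last_hours_window() {
  local hours="$1"
  local end_epoch start_epoch
  end_epoch="$(date -u +%s)"
  start_epoch=$(( end_epoch - hours * 3600 ))
  printf '%s %s\n' \
    "$(date -u -d "@${start_epoch}" +%Y-%m-%dT%H:%M:%SZ)" \
    "$(date -u -d "@${end_epoch}" +%Y-%m-%dT%H:%M:%SZ)"
}

# Print the window covering the last six hours.
last_hours_window 6
```

The two values can be substituted for the fixed start and end timestamps in a time-series query.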

Monitoring Strategy

A well-designed monitoring strategy on GCP covers multiple layers:

1. Infrastructure Monitoring

Monitor the health of your Google Cloud resources:

  • Compute Engine CPU, memory, disk, and network
  • GKE cluster node health, pod restarts, and container resource usage
  • Cloud SQL CPU, memory, connections, and replication lag
  • Cloud Storage request rates and error counts

2. Application Monitoring

Monitor application-level behaviour:

  • Request latency and error rates
  • Custom business metrics (orders processed, payments completed)
  • Dependency health (database connections, API call success rates)

3. Network Monitoring

Monitor connectivity and traffic patterns:

  • VPC flow logs for traffic analysis
  • Load balancer latency and error distribution
  • Cloud NAT connection counts and dropped packets

4. Security Monitoring

Monitor for threats and configuration drift:

  • IAM policy changes via audit logs
  • Firewall rule modifications
  • Anomalous API usage patterns
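The first item in this list, IAM policy changes via audit logs, can be sketched as a Cloud Logging filter over the Admin Activity audit log. The helper name iam_change_filter and the project ID my-project are assumptions; the filter matches SetIamPolicy calls, which is one common way to spot policy changes rather than an exhaustive rule.

```shell
# iam_change_filter is a hypothetical helper, not part of gcloud.
# It prints a Logging filter that matches SetIamPolicy calls recorded
# in the project's Admin Activity audit log.
iam_change_filter() {
  local project_id="$1"
  printf 'logName="projects/%s/logs/cloudaudit.googleapis.com%%2Factivity" AND protoPayload.methodName="SetIamPolicy"\n' \
    "$project_id"
}

iam_change_filter "my-project"

# Usage (needs gcloud credentials, so it is left commented out):
# gcloud logging read "$(iam_change_filter my-project)" \
#   --project=my-project --freshness=1d --limit=10
```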

Summary

Google Cloud Monitoring is the foundation of observability on GCP. It automatically collects platform metrics from Google Cloud services, supports custom and external metrics, and provides a unified view across multiple projects through scoping projects. Combined with Cloud Logging, Cloud Trace, Cloud Profiler, and Error Reporting, it forms a comprehensive operations suite for maintaining the health, performance, and security of your cloud workloads. The next lesson covers metrics and dashboards in depth.