Monitoring is only valuable if it leads to action. Alerting policies in Google Cloud Monitoring automatically evaluate metric conditions and notify the right people when thresholds are breached. A well-designed alerting strategy reduces mean time to detection (MTTD), routes incidents to the correct teams, and avoids alert fatigue from noisy, low-value notifications.
An alerting policy consists of three components:
| Component | Description |
|---|---|
| Conditions | The metric-based rules that define when an alert should fire |
| Notification channels | Where to send the alert (email, Slack, PagerDuty, SMS, etc.) |
| Documentation | Context and runbook information attached to the alert |
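In the Monitoring API, these components map directly onto fields of the `AlertPolicy` resource. A minimal sketch of a policy file (illustrative values throughout; the project ID, channel ID, and runbook URL are placeholders):

```bash
# policy.json: the three components as AlertPolicy fields (illustrative values)
cat > policy.json <<'EOF'
{
  "displayName": "High CPU Alert",
  "combiner": "OR",
  "conditions": [
    {
      "displayName": "CPU > 80%",
      "conditionThreshold": {
        "filter": "metric.type = \"compute.googleapis.com/instance/cpu/utilization\" AND resource.type = \"gce_instance\"",
        "comparison": "COMPARISON_GT",
        "thresholdValue": 0.8,
        "duration": "300s"
      }
    }
  ],
  "notificationChannels": ["projects/my-project/notificationChannels/12345"],
  "documentation": {
    "content": "CPU is high. Runbook: https://example.com/runbooks/high-cpu",
    "mimeType": "text/markdown"
  }
}
EOF

# Apply the policy file
gcloud alpha monitoring policies create --policy-from-file=policy.json
```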
The lifecycle of an alert, from first data point to resolution:

```
Metric data collected
        |
Condition evaluated (every alignment period)
        |
Threshold breached for duration window
        |
Incident opened --> Notification sent
        |
Condition no longer met
        |
Incident closed --> Resolution notification sent
```
Metric-threshold conditions are the most common condition type. A threshold condition fires when a metric crosses a threshold and stays there for a specified duration:
```bash
# Create an alert when CPU > 80% for 5 minutes.
# The policies surface currently lives under the alpha release track.
gcloud alpha monitoring policies create \
  --display-name="High CPU Alert" \
  --combiner=OR \
  --condition-display-name="CPU > 80%" \
  --condition-filter='metric.type = "compute.googleapis.com/instance/cpu/utilization" AND resource.type = "gce_instance"' \
  --if='> 0.8' \
  --duration=300s \
  --aggregation='{"alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_MEAN"}' \
  --notification-channels="projects/my-project/notificationChannels/12345"
# --aggregation averages each series over 60-second windows (ALIGN_MEAN);
# --duration requires the breach to persist for 300s before the alert fires.
```
Metric-absence conditions fire when expected metric data stops arriving, which is useful for detecting crashed processes or broken pipelines:
| Parameter | Description |
|---|---|
| Duration | How long the metric must be absent before alerting |
| Filter | Which time series to watch for absence |
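The same alpha gcloud surface shown above can express an absence condition by passing `--if=absent`. A sketch, where the heartbeat metric name and channel ID are placeholders:

```bash
# Alert when a heartbeat metric goes silent for 10 minutes
gcloud alpha monitoring policies create \
  --display-name="Heartbeat Missing" \
  --combiner=OR \
  --condition-display-name="No heartbeat data for 10m" \
  --condition-filter='metric.type = "custom.googleapis.com/heartbeat" AND resource.type = "gce_instance"' \
  --if=absent \
  --duration=600s \
  --notification-channels="projects/my-project/notificationChannels/12345"
```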
Forecast conditions use linear regression to predict when a metric will breach a threshold in the future, which makes them ideal for capacity planning:
| Use Case | Metric | Forecast Window |
|---|---|---|
| Disk full | agent.googleapis.com/disk/percent_used (Ops Agent) | 24 hours |
| Certificate expiry | Custom metric for days until expiry | 30 days |
| Budget exhaustion | Billing metric for project spend | 7 days |
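In the API, a forecast condition is a `conditionThreshold` carrying `forecastOptions`; the per-condition gcloud flags don't expose this, so a policy file is the practical route. A sketch for the disk-full row above, assuming the Ops Agent's `agent.googleapis.com/disk/percent_used` metric; the 86400s horizon is the 24-hour window from the table:

```bash
cat > forecast-policy.json <<'EOF'
{
  "displayName": "Disk predicted full within 24h",
  "combiner": "OR",
  "conditions": [
    {
      "displayName": "Disk usage forecast to cross 90%",
      "conditionThreshold": {
        "filter": "metric.type = \"agent.googleapis.com/disk/percent_used\" AND resource.type = \"gce_instance\"",
        "comparison": "COMPARISON_GT",
        "thresholdValue": 90,
        "duration": "300s",
        "forecastOptions": { "forecastHorizon": "86400s" }
      }
    }
  ]
}
EOF
gcloud alpha monitoring policies create --policy-from-file=forecast-policy.json
```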
Notification channels define where alerts are delivered. Cloud Monitoring supports many channel types:
| Channel Type | Use Case |
|---|---|
| Email | General notifications, low-urgency alerts |
| Slack | Team-wide visibility, real-time collaboration |
| PagerDuty | On-call escalation for critical incidents |
| SMS | Urgent alerts when team members may not be online |
| Pub/Sub | Programmatic handling — trigger Cloud Functions, log to BigQuery |
| Webhooks | Integration with custom systems or third-party tools |
| Mobile push | Google Cloud Console mobile app notifications |
```bash
# Create a Slack notification channel.
# The channels surface currently lives under the beta release track.
gcloud beta monitoring channels create \
  --display-name="SRE Slack Channel" \
  --type=slack \
  --channel-labels=channel_name="#sre-alerts"
```
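Other channel types follow the same pattern; only `--type` and its labels change. For example, a Pub/Sub channel, assuming the topic already exists and that `topic` is the label key for the pubsub channel type:

```bash
# Route alerts to a Pub/Sub topic for programmatic handling
gcloud beta monitoring channels create \
  --display-name="Alerts to Pub/Sub" \
  --type=pubsub \
  --channel-labels=topic=projects/my-project/topics/monitoring-alerts
```

Monitoring publishes via a Google-managed service account, which needs the `pubsub.publisher` role on the topic before messages will flow.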
Email and SMS channels require verification before they can receive notifications. Unverified channels are skipped when alerts fire, so always verify channels immediately after creation.
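Verification is driven through the NotificationChannel resource's `sendVerificationCode` and `verify` methods. A rough curl sketch, with the channel ID and the received code as placeholders:

```bash
CHANNEL="projects/my-project/notificationChannels/12345"
TOKEN="$(gcloud auth print-access-token)"

# Ask Monitoring to send a verification code to the channel's target
curl -X POST -H "Authorization: Bearer ${TOKEN}" \
  "https://monitoring.googleapis.com/v3/${CHANNEL}:sendVerificationCode"

# Submit the code the recipient received to mark the channel verified
curl -X POST -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"code": "123456"}' \
  "https://monitoring.googleapis.com/v3/${CHANNEL}:verify"
```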
Alerting policies can contain multiple conditions combined with AND or OR logic:
| Combiner | Behaviour |
|---|---|
| AND | All conditions must be true simultaneously — reduces false positives |
| OR | Any condition being true triggers the alert — casts a wider net |
Consider an AND policy that fires only when both CPU and memory are high.
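Because the per-condition gcloud flags describe only a single condition, multi-condition policies go through a policy file. A sketch, assuming the Ops Agent's memory metric and placeholder project and channel IDs:

```bash
cat > and-policy.json <<'EOF'
{
  "displayName": "High CPU AND High Memory",
  "combiner": "AND",
  "conditions": [
    {
      "displayName": "CPU > 80%",
      "conditionThreshold": {
        "filter": "metric.type = \"compute.googleapis.com/instance/cpu/utilization\" AND resource.type = \"gce_instance\"",
        "comparison": "COMPARISON_GT",
        "thresholdValue": 0.8,
        "duration": "300s"
      }
    },
    {
      "displayName": "Memory > 90%",
      "conditionThreshold": {
        "filter": "metric.type = \"agent.googleapis.com/memory/percent_used\" AND resource.type = \"gce_instance\"",
        "comparison": "COMPARISON_GT",
        "thresholdValue": 90,
        "duration": "300s"
      }
    }
  ],
  "notificationChannels": ["projects/my-project/notificationChannels/12345"]
}
EOF
gcloud alpha monitoring policies create --policy-from-file=and-policy.json
```

With `"combiner": "AND"`, an incident opens only while both conditions are breached simultaneously, which is exactly the false-positive reduction described in the table above.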