Monitoring is only valuable if it leads to action. Alerting policies in Google Cloud Monitoring automatically evaluate metric conditions and notify the right people when thresholds are breached. A well-designed alerting strategy reduces mean time to detection (MTTD), routes incidents to the correct teams, and avoids alert fatigue from noisy, low-value notifications.
An alerting policy consists of three components:
| Component | Description |
|---|---|
| Conditions | The metric-based rules that define when an alert should fire |
| Notification channels | Where to send the alert (email, Slack, PagerDuty, SMS, etc.) |
| Documentation | Context and runbook information attached to the alert |
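In the Monitoring API, these components map directly onto fields of the `AlertPolicy` resource. A minimal sketch of a policy file (illustrative values throughout; the project ID, channel ID, and runbook URL are placeholders):

```bash
# policy.json: the three components as AlertPolicy fields (illustrative values)
cat > policy.json <<'EOF'
{
  "displayName": "High CPU Alert",
  "combiner": "OR",
  "conditions": [
    {
      "displayName": "CPU > 80%",
      "conditionThreshold": {
        "filter": "metric.type = \"compute.googleapis.com/instance/cpu/utilization\" AND resource.type = \"gce_instance\"",
        "comparison": "COMPARISON_GT",
        "thresholdValue": 0.8,
        "duration": "300s"
      }
    }
  ],
  "notificationChannels": ["projects/my-project/notificationChannels/12345"],
  "documentation": {
    "content": "CPU is high. Runbook: https://example.com/runbooks/high-cpu",
    "mimeType": "text/markdown"
  }
}
EOF

# Apply the policy file
gcloud alpha monitoring policies create --policy-from-file=policy.json
```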
The lifecycle of an alert, from first data point to resolution:

```
Metric data collected
        |
Condition evaluated (every alignment period)
        |
Threshold breached for duration window
        |
Incident opened --> Notification sent
        |
Condition no longer met
        |
Incident closed --> Resolution notification sent
```
Metric-threshold conditions are the most common condition type. A threshold condition fires when a metric crosses a threshold and stays there for a specified duration:
```bash
# Create an alert when CPU > 80% for 5 minutes.
# The policies surface currently lives under the alpha release track.
gcloud alpha monitoring policies create \
  --display-name="High CPU Alert" \
  --combiner=OR \
  --condition-display-name="CPU > 80%" \
  --condition-filter='metric.type = "compute.googleapis.com/instance/cpu/utilization" AND resource.type = "gce_instance"' \
  --if='> 0.8' \
  --duration=300s \
  --aggregation='{"alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_MEAN"}' \
  --notification-channels="projects/my-project/notificationChannels/12345"
# --aggregation averages each series over 60-second windows (ALIGN_MEAN);
# --duration requires the breach to persist for 300s before the alert fires.
```
Metric-absence conditions fire when expected metric data stops arriving, which is useful for detecting crashed processes or broken pipelines:
| Parameter | Description |
|---|---|
| Duration | How long the metric must be absent before alerting |
| Filter | Which time series to watch for absence |
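The same alpha gcloud surface shown above can express an absence condition by passing `--if=absent`. A sketch, where the heartbeat metric name and channel ID are placeholders:

```bash
# Alert when a heartbeat metric goes silent for 10 minutes
gcloud alpha monitoring policies create \
  --display-name="Heartbeat Missing" \
  --combiner=OR \
  --condition-display-name="No heartbeat data for 10m" \
  --condition-filter='metric.type = "custom.googleapis.com/heartbeat" AND resource.type = "gce_instance"' \
  --if=absent \
  --duration=600s \
  --notification-channels="projects/my-project/notificationChannels/12345"
```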
Forecast conditions use linear regression to predict when a metric will breach a threshold in the future, which makes them ideal for capacity planning:
| Use Case | Metric | Forecast Window |
|---|---|---|
| Disk full | agent.googleapis.com/disk/percent_used (Ops Agent) | 24 hours |
| Certificate expiry | Custom metric for days until expiry | 30 days |
| Budget exhaustion | Billing metric for project spend | 7 days |
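In the API, a forecast condition is a `conditionThreshold` carrying `forecastOptions`; the per-condition gcloud flags don't expose this, so a policy file is the practical route. A sketch for the disk-full row above, assuming the Ops Agent's `agent.googleapis.com/disk/percent_used` metric; the 86400s horizon is the 24-hour window from the table:

```bash
cat > forecast-policy.json <<'EOF'
{
  "displayName": "Disk predicted full within 24h",
  "combiner": "OR",
  "conditions": [
    {
      "displayName": "Disk usage forecast to cross 90%",
      "conditionThreshold": {
        "filter": "metric.type = \"agent.googleapis.com/disk/percent_used\" AND resource.type = \"gce_instance\"",
        "comparison": "COMPARISON_GT",
        "thresholdValue": 90,
        "duration": "300s",
        "forecastOptions": { "forecastHorizon": "86400s" }
      }
    }
  ]
}
EOF
gcloud alpha monitoring policies create --policy-from-file=forecast-policy.json
```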
Notification channels define where alerts are delivered. Cloud Monitoring supports many channel types:
| Channel Type | Use Case |
|---|---|
| Email | General notifications, low-urgency alerts |
| Slack | Team-wide visibility, real-time collaboration |
| PagerDuty | On-call escalation for critical incidents |
| SMS | Urgent alerts when team members may not be online |
| Pub/Sub | Programmatic handling — trigger Cloud Functions, log to BigQuery |
| Webhooks | Integration with custom systems or third-party tools |
| Mobile push | Google Cloud Console mobile app notifications |
```bash
# Create a Slack notification channel.
# The channels surface currently lives under the beta release track.
gcloud beta monitoring channels create \
  --display-name="SRE Slack Channel" \
  --type=slack \
  --channel-labels=channel_name="#sre-alerts"
```
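Other channel types follow the same pattern; only `--type` and its labels change. For example, a Pub/Sub channel, assuming the topic already exists and that `topic` is the label key for the pubsub channel type:

```bash
# Route alerts to a Pub/Sub topic for programmatic handling
gcloud beta monitoring channels create \
  --display-name="Alerts to Pub/Sub" \
  --type=pubsub \
  --channel-labels=topic=projects/my-project/topics/monitoring-alerts
```

Monitoring publishes via a Google-managed service account, which needs the `pubsub.publisher` role on the topic before messages will flow.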
Email and SMS channels require verification before they can receive notifications. Unverified channels are skipped when alerts fire, so always verify channels immediately after creation.
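Verification is driven through the NotificationChannel resource's `sendVerificationCode` and `verify` methods. A rough curl sketch, with the channel ID and the received code as placeholders:

```bash
CHANNEL="projects/my-project/notificationChannels/12345"
TOKEN="$(gcloud auth print-access-token)"

# Ask Monitoring to send a verification code to the channel's target
curl -X POST -H "Authorization: Bearer ${TOKEN}" \
  "https://monitoring.googleapis.com/v3/${CHANNEL}:sendVerificationCode"

# Submit the code the recipient received to mark the channel verified
curl -X POST -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"code": "123456"}' \
  "https://monitoring.googleapis.com/v3/${CHANNEL}:verify"
```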
Alerting policies can contain multiple conditions combined with AND or OR logic:
| Combiner | Behaviour |
|---|---|
| AND | All conditions must be true simultaneously — reduces false positives |
| OR | Any condition being true triggers the alert — casts a wider net |
Consider an AND policy that fires only when both CPU and memory are high.
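Because the per-condition gcloud flags describe only a single condition, multi-condition policies go through a policy file. A sketch, assuming the Ops Agent's memory metric and placeholder project and channel IDs:

```bash
cat > and-policy.json <<'EOF'
{
  "displayName": "High CPU AND High Memory",
  "combiner": "AND",
  "conditions": [
    {
      "displayName": "CPU > 80%",
      "conditionThreshold": {
        "filter": "metric.type = \"compute.googleapis.com/instance/cpu/utilization\" AND resource.type = \"gce_instance\"",
        "comparison": "COMPARISON_GT",
        "thresholdValue": 0.8,
        "duration": "300s"
      }
    },
    {
      "displayName": "Memory > 90%",
      "conditionThreshold": {
        "filter": "metric.type = \"agent.googleapis.com/memory/percent_used\" AND resource.type = \"gce_instance\"",
        "comparison": "COMPARISON_GT",
        "thresholdValue": 90,
        "duration": "300s"
      }
    }
  ],
  "notificationChannels": ["projects/my-project/notificationChannels/12345"]
}
EOF
gcloud alpha monitoring policies create --policy-from-file=and-policy.json
```

With `"combiner": "AND"`, an incident opens only while both conditions are breached simultaneously, which is exactly the false-positive reduction described in the table above.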