Metrics alone only give you visibility. Alarms add the crucial ability to detect when something goes wrong and respond automatically. A CloudWatch Alarm watches a single metric (or a metric math expression) over a period of time and changes state when the value crosses a threshold you define.
Every CloudWatch Alarm is always in exactly one of three states:
| State | Meaning |
|---|---|
| OK | The metric is within the acceptable range |
| ALARM | The metric has breached the threshold |
| INSUFFICIENT_DATA | Not enough data points to evaluate the alarm |
When you first create an alarm, it starts in INSUFFICIENT_DATA until enough data points arrive to make a determination.
An alarm is defined by five key components:
Suppose you want to be alerted when an EC2 instance's CPU stays above 80 % for 10 minutes. The configuration would be:
| Parameter | Value |
|---|---|
| Namespace | AWS/EC2 |
| Metric name | CPUUtilization |
| Dimension | InstanceId = i-0abcdef1234567890 |
| Statistic | Average |
| Period | 300 seconds (5 minutes) |
| Evaluation periods | 2 |
| Threshold | > 80 |
The alarm enters the ALARM state only after two consecutive 5-minute periods average above 80 %. This prevents a brief spike from triggering a false alert.
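As a rough illustration, the table above could be expressed as keyword arguments for boto3's `put_metric_alarm` call. This is a minimal sketch, not the lesson's own code: the alarm name and the helper names `cpu_alarm_params` and `create_alarm` are hypothetical, and the actual API call assumes boto3 is installed and AWS credentials are configured.

```python
def cpu_alarm_params(instance_id: str) -> dict:
    """Build keyword arguments describing the CPU alarm from the table."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,             # seconds per evaluation period
        "EvaluationPeriods": 2,    # two consecutive periods must breach
        "Threshold": 80.0,
        "ComparisonOperator": "GreaterThanThreshold",
    }


def create_alarm(params: dict) -> None:
    """Send the definition to CloudWatch (requires boto3 and credentials)."""
    import boto3  # imported lazily so the builder above works without it

    boto3.client("cloudwatch").put_metric_alarm(**params)


params = cpu_alarm_params("i-0abcdef1234567890")
# Period multiplied by evaluation periods gives the sustained-breach window.
window_minutes = params["Period"] * params["EvaluationPeriods"] // 60
print(f"Alarm requires {window_minutes} minutes above {params['Threshold']}%")
```

Keeping the parameter builder separate from the API call makes the definition easy to inspect or unit-test without touching AWS.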
For noisier metrics, you can use the "M of N" evaluation model. Instead of requiring every evaluation period to breach, you can say "3 out of 5 periods must breach." This further reduces false positives:
Evaluation periods: 5
Datapoints to alarm: 3
If 3 of the last 5 periods exceed the threshold, the alarm fires.
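The "M of N" rule can be sketched in a few lines of Python. This is a simplified model for illustration only: the function name `evaluate_alarm` is hypothetical, it assumes one datapoint per period and a "greater than" comparison, and it ignores the extra rules CloudWatch applies when datapoints are missing.

```python
from typing import Sequence


def evaluate_alarm(
    datapoints: Sequence[float],
    threshold: float,
    evaluation_periods: int,
    datapoints_to_alarm: int,
) -> str:
    """Return the alarm state for the most recent evaluation window."""
    if len(datapoints) < evaluation_periods:
        # Not enough periods have been observed yet.
        return "INSUFFICIENT_DATA"

    # Only the last N periods take part in the evaluation.
    window = datapoints[-evaluation_periods:]
    breaches = sum(1 for value in window if value > threshold)

    # M of N: at least `datapoints_to_alarm` periods must breach.
    return "ALARM" if breaches >= datapoints_to_alarm else "OK"


cpu = [50, 85, 90, 70, 95]  # made-up per-period averages
print(evaluate_alarm(cpu, threshold=80, evaluation_periods=5, datapoints_to_alarm=3))
```

Setting `datapoints_to_alarm` equal to `evaluation_periods` reproduces the stricter "every period must breach" behaviour from the earlier CPU example.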
The real power of alarms is in actions: what happens when the state changes. You can attach actions to three transitions:
| Transition | Typical Use |
|---|---|
| OK → ALARM | Notify on-call, scale out, trigger remediation |
| ALARM → OK | Send "all clear" notification |
| Any → INSUFFICIENT_DATA | Investigate missing data |
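One way to picture the table above is as a lookup from the state being entered to a list of action targets. The sketch below borrows the parameter names `AlarmActions`, `OKActions`, and `InsufficientDataActions` from the `put_metric_alarm` API; the helper `actions_for_transition` and the SNS topic ARN are hypothetical placeholders.

```python
# Maps the state an alarm is entering to the parameter holding its actions.
ACTION_KEY_BY_STATE = {
    "ALARM": "AlarmActions",
    "OK": "OKActions",
    "INSUFFICIENT_DATA": "InsufficientDataActions",
}


def actions_for_transition(alarm: dict, new_state: str) -> list:
    """Return the action ARNs configured for the state being entered."""
    return list(alarm.get(ACTION_KEY_BY_STATE[new_state], []))


# Hypothetical SNS topic ARN, used for both the alert and the "all clear".
topic = "arn:aws:sns:us-east-1:123456789012:ops-alerts"
alarm = {"AlarmActions": [topic], "OKActions": [topic]}

for state in ("ALARM", "OK", "INSUFFICIENT_DATA"):
    print(state, "->", actions_for_transition(alarm, state))
```

Leaving a state without any configured actions, as with `INSUFFICIENT_DATA` here, simply means nothing happens on that transition.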
Sending an SNS notification is the most common action. The alarm publishes a message to an SNS topic, which can fan out to:
An alarm can also trigger an Auto Scaling scale-out or scale-in policy. For example:
This is the foundation of reactive auto scaling on AWS.
An alarm can also act directly on an EC2 instance: