Amazon CloudWatch is the central monitoring and observability service in AWS. It collects metrics, logs, and events from virtually every AWS resource and allows you to visualise them in dashboards, set alarms, and take automated actions. In this lesson we focus on the two foundational building blocks: metrics and dashboards.
A metric is a time-ordered set of data points that represents a variable you want to monitor — for example, the CPU utilisation of an EC2 instance or the number of messages in an SQS queue.
Every metric is uniquely identified by a namespace, a metric name, and zero or more dimensions:
| Component | Description | Example |
|---|---|---|
| Namespace | A container that groups related metrics | AWS/EC2, AWS/RDS, Custom/MyApp |
| Metric name | The measurement being tracked | CPUUtilization, RequestCount |
| Dimensions | Key-value pairs that identify the specific resource | InstanceId=i-0abcdef1234567890 |
Standard metrics are published automatically by AWS services at no extra cost. For example, every EC2 instance emits CPUUtilization, NetworkIn, NetworkOut, DiskReadOps, and more. By default (basic monitoring) these arrive every five minutes; enabling detailed monitoring publishes them every minute.
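Standard metrics can also be read back programmatically. The sketch below assumes the boto3 SDK for Python and builds the arguments for CloudWatch's `get_metric_statistics` call for one instance's CPUUtilization. `cpu_query` and `fetch_cpu` are hypothetical helper names; only `fetch_cpu` contacts AWS, and it needs boto3 installed and credentials configured.

```python
from datetime import datetime, timedelta, timezone


def cpu_query(instance_id: str, hours: int = 3, period_seconds: int = 300) -> dict:
    """Build keyword arguments for boto3's CloudWatch get_metric_statistics.

    The 300-second period matches the five-minute interval of EC2 basic monitoring.
    """
    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        # The dimension selects which instance's CPU metric we want.
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": start,
        "EndTime": end,
        "Period": period_seconds,
        "Statistics": ["Average", "Maximum"],
    }


def fetch_cpu(instance_id: str) -> list:
    """Query CloudWatch for real (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so cpu_query works without boto3 installed

    client = boto3.client("cloudwatch")
    response = client.get_metric_statistics(**cpu_query(instance_id))
    # Datapoints are not guaranteed to come back in time order, so sort them.
    return sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
```

Building the request separately from sending it keeps the query easy to inspect and test without touching AWS.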
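Custom metrics are metrics you publish yourself using the CloudWatch API, the AWS CLI, or an SDK, typically to track application-level measurements that AWS cannot observe for you, such as the number of orders your application has processed.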
Publishing a custom metric is straightforward with the AWS CLI:
```bash
aws cloudwatch put-metric-data \
  --namespace "Custom/MyApp" \
  --metric-name "OrdersProcessed" \
  --value 42 \
  --unit Count
```
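The same data point can be published from code. This is a minimal sketch assuming the boto3 SDK for Python, whose CloudWatch client exposes `put_metric_data(Namespace=..., MetricData=[...])`; `build_put_request` and `publish` are hypothetical helper names, and only `publish` actually calls AWS.

```python
def build_put_request(namespace: str, metric_name: str, value: float,
                      unit: str = "Count") -> dict:
    """Keyword arguments for boto3's put_metric_data, mirroring the CLI call above."""
    return {
        "Namespace": namespace,
        "MetricData": [
            {"MetricName": metric_name, "Value": value, "Unit": unit},
        ],
    }


def publish(request: dict) -> None:
    """Send the data point (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the request can be built without boto3

    boto3.client("cloudwatch").put_metric_data(**request)


request = build_put_request("Custom/MyApp", "OrdersProcessed", 42)
# publish(request)  # uncomment to send the data point for real
```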
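CloudWatch stores each metric at one of two resolutions:
| Resolution | Data Point Granularity | Cost |
|---|---|---|
| Standard | 1 minute | Free for AWS-published metrics; charged for custom metrics |
| High resolution | 1 second | Custom metrics only; charged like other custom metrics |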
High-resolution metrics are useful for latency-sensitive workloads where one-minute or five-minute averages hide short spikes. You enable high resolution by setting the StorageResolution parameter to 1 when publishing a custom metric; the default value, 60, gives standard resolution.
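Here is a minimal sketch of a high-resolution data point in the shape boto3's `put_metric_data` expects in its `MetricData` list (on the CLI, the equivalent is adding `--storage-resolution 1` to `put-metric-data`). The metric name `CheckoutLatency` and the helper `latency_datum` are hypothetical; the check reflects that StorageResolution accepts only 1 or 60.

```python
from datetime import datetime, timezone


def latency_datum(latency_ms: float, storage_resolution: int = 1) -> dict:
    """Build one MetricData entry; 1 = high resolution, 60 = standard."""
    if storage_resolution not in (1, 60):
        raise ValueError("StorageResolution must be 1 or 60")
    return {
        "MetricName": "CheckoutLatency",  # hypothetical metric name
        "Value": latency_ms,
        "Unit": "Milliseconds",
        "Timestamp": datetime.now(timezone.utc),
        "StorageResolution": storage_resolution,
    }


datum = latency_datum(12.5)  # a high-resolution data point
```

The entry would go into the `MetricData` list of a `put_metric_data` call alongside a custom namespace.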
When you query a metric, CloudWatch aggregates the raw data points over a period using a statistic:
| Statistic | What It Returns |
|---|---|
| Average | Mean of data points in the period |
| Sum | Total of all data points in the period |
| Minimum | Lowest value in the period |
| Maximum | Highest value in the period |
| SampleCount | Number of data points in the period |
| pN (percentile) | The value below which N% of data points in the period fall, e.g. p99 |
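To make period aggregation concrete, here is a local sketch (an illustration, not CloudWatch's internal implementation) that groups raw data points into fixed periods and computes the five basic statistics from the table. It assumes each point is a `(timestamp_in_seconds, value)` pair; `aggregate` is a hypothetical helper, and percentiles are left out.

```python
from collections import defaultdict


def aggregate(points, period_s):
    """Group (timestamp_seconds, value) points into fixed periods and summarise each one."""
    buckets = defaultdict(list)
    for ts, value in points:
        period_start = ts - ts % period_s  # align the point to the start of its period
        buckets[period_start].append(value)

    summary = {}
    for period_start in sorted(buckets):
        values = buckets[period_start]
        summary[period_start] = {
            "Average": sum(values) / len(values),
            "Sum": sum(values),
            "Minimum": min(values),
            "Maximum": max(values),
            "SampleCount": len(values),
        }
    return summary


raw = [(0, 10), (20, 30), (40, 20), (65, 5)]
for start, stats in aggregate(raw, 60).items():
    print(start, stats)
```

With 60-second periods, the points at 0, 20, and 40 seconds fall into one period (Average 20.0, SampleCount 3), and the point at 65 seconds lands in the next.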
Choosing the right statistic matters. For latency metrics, Average can be misleading because a few very slow requests get hidden by thousands of fast ones. Use p99 or p95 to understand the experience of your slowest users.
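A small worked example shows how Average hides a slow tail. The latencies are hypothetical (980 fast requests and 20 slow ones), and the percentile uses the nearest-rank method, an assumption chosen for simplicity; CloudWatch's own percentile calculation may differ in detail.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of values at or below it."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in milliseconds: 98% fast, 2% very slow.
latencies = [20] * 980 + [2000] * 20

average = sum(latencies) / len(latencies)
print(f"Average: {average:.1f} ms")            # Average: 59.6 ms
print(f"p95: {percentile(latencies, 95)} ms")  # p95: 20 ms
print(f"p99: {percentile(latencies, 99)} ms")  # p99: 2000 ms
```

The average of about 60 ms describes no real request: most users see 20 ms, while the slowest 2% wait two seconds, which only p99 reveals.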
CloudWatch retains metric data according to this schedule:
| Data Point Interval | Retained For |
|---|---|
| < 60 seconds (high-resolution) | 3 hours |
| 60 seconds | 15 days |
| 5 minutes | 63 days |
| 1 hour | 455 days (15 months) |
Older data is automatically rolled up into coarser granularity. If you need to keep high-resolution data longer, export it to S3 or a time-series database.
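The retention schedule can be expressed as a lookup from data age to the finest period still available. This sketch encodes the table above and assumes the metric was originally published every minute (or every second, for high-resolution metrics); `finest_period` is a hypothetical helper, and boundary handling is approximate.

```python
from datetime import timedelta
from typing import Optional

# (maximum age, finest period in seconds still available), from the retention table.
RETENTION = [
    (timedelta(hours=3), 1),  # sub-minute data exists only for high-resolution metrics
    (timedelta(days=15), 60),
    (timedelta(days=63), 300),
    (timedelta(days=455), 3600),
]


def finest_period(age: timedelta, high_resolution: bool = False) -> Optional[int]:
    """Return the finest period (seconds) available for data this old, or None if expired."""
    for max_age, period in RETENTION:
        if period < 60 and not high_resolution:
            continue  # standard-resolution metrics never had sub-minute data points
        if age <= max_age:
            return period
    return None
```

For example, one-minute data that is 30 days old is only available as five-minute aggregates, and anything older than 455 days is gone.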