Introduction to AWS Monitoring
Monitoring is the practice of collecting, analysing, and acting on data about your systems so that you can detect problems before they affect users, understand how your infrastructure behaves under load, and make informed decisions about capacity and cost. On AWS, monitoring is not an afterthought: it is a first-class concern built into the platform.
Why Monitoring Matters
Imagine you deploy a web application on a fleet of EC2 instances behind an Application Load Balancer. Traffic is light at first, but a marketing campaign drives a sudden spike. Without monitoring you might not notice that:
- CPU utilisation has hit 98 % on every instance
- Response latency has jumped from 200 ms to 3 seconds
- Error rates have climbed because the database connection pool is exhausted
By the time customers complain, you have already lost revenue and trust. Monitoring closes the feedback loop between what your infrastructure is doing and what you think it is doing.
The Four Golden Signals
Google's Site Reliability Engineering book popularised four golden signals that apply just as well to AWS workloads:
| Signal | What It Measures | AWS Example |
|---|---|---|
| Latency | Time to serve a request | ALB target response time |
| Traffic | Demand on the system | Requests per second to API Gateway |
| Errors | Rate of failed requests | HTTP 5xx count on CloudFront |
| Saturation | How "full" a resource is | RDS CPU utilisation at 90 % |
If you instrument these four signals for every workload, you will catch the vast majority of production issues.
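As a rough illustration of what instrumenting the four signals means, the sketch below summarises one time window of request records into latency, traffic, errors, and saturation. The record shape, the function name `golden_signals`, and the choice of a nearest-rank 95th percentile are assumptions made for this example, not something AWS prescribes.

```python
import math


def golden_signals(requests, window_seconds, cpu_percent):
    """Summarise one time window into the four golden signals.

    `requests` is a list of dicts with hypothetical keys
    "latency_ms" and "status" (an HTTP status code).
    """
    latencies = sorted(r["latency_ms"] for r in requests)
    if latencies:
        # Nearest-rank 95th percentile: the value at position ceil(0.95 * n).
        rank = math.ceil(0.95 * len(latencies))
        p95 = latencies[rank - 1]
        error_rate = sum(r["status"] >= 500 for r in requests) / len(requests)
    else:
        p95, error_rate = None, 0.0
    return {
        "latency_p95_ms": p95,                            # Latency
        "traffic_rps": len(requests) / window_seconds,    # Traffic
        "error_rate": error_rate,                         # Errors
        "saturation_percent": cpu_percent,                # Saturation
    }
```

In practice CloudWatch computes these statistics for you from the metrics each service publishes; the point here is only to make the four definitions concrete.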
The Monitoring Spectrum
Monitoring on AWS is not a single tool — it is a spectrum of complementary capabilities:
Metrics
Numeric time-series data points. Examples: CPU utilisation, request count, queue depth. Amazon CloudWatch is the central metrics service on AWS, and most AWS services publish metrics to it automatically.
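To show what a metric looks like from code, here is a minimal sketch that reads EC2 CPU utilisation with boto3's `get_metric_statistics`. The helper names `cpu_query` and `fetch_cpu` are hypothetical, and any instance ID you pass in is your own; running `fetch_cpu` assumes boto3 is installed and AWS credentials are configured.

```python
from datetime import datetime, timedelta, timezone


def cpu_query(instance_id, end, minutes=60, period_seconds=300):
    """Build keyword arguments for CloudWatch GetMetricStatistics."""
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": period_seconds,          # one data point per 5 minutes
        "Statistics": ["Average", "Maximum"],
    }


def fetch_cpu(instance_id):
    """Run the query (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the sketch loads without boto3

    cloudwatch = boto3.client("cloudwatch")
    now = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(**cpu_query(instance_id, now))
    # Data points are not guaranteed to arrive in time order.
    return sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
```

A metric is identified by its namespace, name, and dimensions; the period and statistics decide how the raw data points are aggregated when you read them.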
Logs
Detailed textual records of events. Examples: application log lines, VPC Flow Logs, Lambda invocation logs. CloudWatch Logs is the managed log aggregation and query service.
Traces
End-to-end request paths through distributed systems. AWS X-Ray captures traces so you can see how a request flows from API Gateway through Lambda to DynamoDB and back.
Events and Alarms
Real-time notifications when something changes or crosses a threshold. CloudWatch Alarms trigger SNS topics, Auto Scaling actions, or Lambda functions when a metric breaches a limit.
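The idea of a metric breaching a limit can be sketched as a small function. This is a deliberately simplified model of an alarm that fires when at least M of the last N data points exceed a threshold; the function name and defaults are assumptions, and real CloudWatch alarms also handle missing data and other comparison operators.

```python
def alarm_state(datapoints, threshold, evaluation_periods=3, datapoints_to_alarm=2):
    """Return a simplified alarm state for a list of metric values.

    The alarm fires when at least `datapoints_to_alarm` of the most recent
    `evaluation_periods` values are above `threshold`.
    """
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        # Not enough history yet to make a decision.
        return "INSUFFICIENT_DATA"
    breaching = sum(1 for value in recent if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"
```

Requiring more than one breaching data point is a common way to avoid alerting on a single short spike.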
Auditing
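A record of who did what and when. AWS CloudTrail records API activity in your account, giving you an audit trail for security and compliance. Management events are recorded by default; data events, such as S3 object-level operations, must be enabled explicitly.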
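The "who did what and when" idea maps directly onto fields in a CloudTrail event record. The sketch below pulls those fields out of a single record; the sample values (account ID, user name, IP address) are made up for illustration, and `who_did_what` is a hypothetical helper.

```python
# Illustrative record: field names follow CloudTrail's record format,
# but every value here is invented for the example.
SAMPLE_RECORD = {
    "eventTime": "2024-01-01T12:00:00Z",
    "eventSource": "ec2.amazonaws.com",
    "eventName": "TerminateInstances",
    "awsRegion": "eu-west-1",
    "sourceIPAddress": "203.0.113.10",
    "userIdentity": {
        "type": "IAMUser",
        "arn": "arn:aws:iam::123456789012:user/alice",
    },
}


def who_did_what(record):
    """Reduce a CloudTrail record to an audit-friendly summary."""
    identity = record.get("userIdentity", {})
    return {
        "who": identity.get("arn", identity.get("type", "unknown")),
        "what": f'{record["eventSource"]}:{record["eventName"]}',
        "when": record["eventTime"],
        "from": record.get("sourceIPAddress", "unknown"),
    }
```

Calling `who_did_what(SAMPLE_RECORD)` yields a compact summary you could feed into a report or an alert.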
Key AWS Monitoring Services
| Service | Primary Purpose | Key Feature |
|---|---|---|
| Amazon CloudWatch | Metrics, logs, alarms, dashboards | Unified operational data from 70+ AWS services |
| AWS CloudTrail | API activity auditing | Records account API activity for governance and compliance |
| AWS X-Ray | Distributed tracing | Visualises request paths across microservices |
| Amazon EventBridge | Event-driven automation | Routes events from AWS services, SaaS, and custom apps |
| AWS Config | Resource configuration tracking | Continuous evaluation of resource compliance |
| VPC Flow Logs | Network traffic logging | Captures IP traffic metadata for analysis |
You do not need to master every service on day one. In this course we will focus on CloudWatch (metrics, logs, alarms, dashboards), CloudTrail, and X-Ray because they form the monitoring backbone of almost every AWS workload.
Observability vs Monitoring
You will often hear the term "observability" alongside monitoring. The distinction is subtle but important:
- Monitoring answers known questions: "Is CPU above 80 %?" or "Are there any 5xx errors?"
- Observability lets you ask new questions you did not anticipate: "Why is this specific user's request slow even though aggregate latency looks normal?"
Observability requires rich, high-cardinality data — detailed logs, distributed traces, and custom metrics. AWS provides the building blocks; your job is to instrument your applications to emit the right data.
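A small sketch makes the high-cardinality point concrete: the aggregate can look healthy while one user has a very different experience. The record shape and the names `mean_latency_by` and `slow_groups` are assumptions for this example.

```python
from collections import defaultdict
from statistics import mean, median


def mean_latency_by(records, key):
    """Group request records by `key` and return the mean latency per group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["latency_ms"])
    return {group: mean(values) for group, values in groups.items()}


def slow_groups(records, key, limit_ms):
    """Return the groups whose mean latency exceeds `limit_ms`."""
    per_group = mean_latency_by(records, key)
    return sorted(g for g, latency in per_group.items() if latency > limit_ms)


# Nine users with two fast requests each, and one user with two slow ones.
records = [
    {"user_id": f"user-{i}", "latency_ms": 200}
    for i in range(1, 10)
    for _ in range(2)
]
records += [{"user_id": "user-10", "latency_ms": 3000}] * 2

overall_median = median(r["latency_ms"] for r in records)
affected = slow_groups(records, "user_id", limit_ms=1000)
```

Here the overall median stays at 200 ms, yet grouping by `user_id` singles out `user-10`. That per-user breakdown is only possible if the user ID was recorded with each request in the first place.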
The Three Pillars of Observability
- Metrics — aggregated numeric measurements (CloudWatch Metrics)
- Logs — discrete event records (CloudWatch Logs)
- Traces — end-to-end request journeys (AWS X-Ray)
When you combine all three pillars you can move from reactive firefighting to proactive, data-driven operations.
The Shared Responsibility Model for Monitoring
AWS publishes platform-level metrics for managed services automatically. For example, RDS exposes CPU utilisation, free storage space, and read/write IOPS without any configuration. However, AWS cannot see inside your application. You are responsible for:
- Application-level metrics — request latency, business transactions per second, cache hit ratio
- Custom log formats — structured JSON logs with correlation IDs
- Trace instrumentation — adding the X-Ray SDK to your code
Think of it as two layers:
| Layer | Responsibility | Example |
|---|---|---|
| Infrastructure | AWS provides built-in metrics | EC2 CPUUtilization, RDS FreeableMemory |
| Application | You instrument your code | Order processing time, payment success rate |
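
For the application layer, one possible starting point is a structured JSON log line that carries a correlation ID and a timing field. The sketch below uses only Python's standard `logging` module; the field names (`correlation_id`, `duration_ms`) and the logger name `orders` are assumptions, not a required schema.

```python
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S%z"),
            "level": record.levelname,
            "message": record.getMessage(),
            # Values passed via `extra=` become attributes on the record.
            "correlation_id": getattr(record, "correlation_id", None),
            "duration_ms": getattr(record, "duration_ms", None),
        }
        return json.dumps(payload)


def build_logger(name="orders"):
    """Create a logger that writes JSON lines to standard error."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger()
    log.info(
        "order processed",
        extra={"correlation_id": str(uuid.uuid4()), "duration_ms": 183},
    )
```

Because every line is valid JSON with the same keys, a log query can filter on a single correlation ID or aggregate `duration_ms` without fragile text parsing.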
Cost Awareness
Monitoring has a cost. CloudWatch charges for custom metrics, for alarms, per GB of log data ingested and stored, and for the data scanned by log queries. Before you instrument everything, plan a monitoring strategy that balances visibility against expense:
- Use standard (free) metrics where possible — AWS provides many at no extra charge.
- Aggregate before you ship — send percentiles and averages rather than every raw data point.
- Set log retention policies — do not keep debug logs forever.
- Use metric filters instead of querying raw logs for simple counts.
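
As one way to aggregate before you ship, the sketch below collapses a batch of raw samples into a single statistic set, which is the shape boto3's `put_metric_data` accepts under `StatisticValues`. The helper names and the metric name `OrderProcessingTime` are hypothetical.

```python
def statistic_set(samples):
    """Collapse raw samples into the four values of a statistic set."""
    if not samples:
        raise ValueError("need at least one sample")
    return {
        "SampleCount": float(len(samples)),
        "Sum": float(sum(samples)),
        "Minimum": float(min(samples)),
        "Maximum": float(max(samples)),
    }


def metric_datum(name, samples, unit="Milliseconds"):
    """Build one MetricData entry that carries the aggregated batch."""
    return {
        "MetricName": name,
        "StatisticValues": statistic_set(samples),
        "Unit": unit,
    }


# One entry stands in for the whole batch of raw measurements.
datum = metric_datum("OrderProcessingTime", [120, 80, 100])
```

The trade-off is that the individual values are gone once they are summarised, so decide which statistics you need before choosing this approach.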
We will cover cost-effective monitoring patterns throughout this course.
Summary
Monitoring is the foundation of reliable cloud operations. AWS provides a rich suite of services — CloudWatch for metrics, logs, and alarms; CloudTrail for API auditing; X-Ray for distributed tracing — that together give you deep visibility into your workloads. Understanding the four golden signals, the three pillars of observability, and the shared responsibility model for monitoring will prepare you for the hands-on lessons that follow.
In the next lesson we will dive into Amazon CloudWatch Metrics and Dashboards — the starting point for every AWS monitoring journey.