The AWS Well-Architected Framework is a set of best practices and guidelines developed by AWS Solutions Architects. It provides a consistent approach to evaluating architectures and implementing designs that will scale over time. Whether you are building a startup MVP or a global enterprise platform, the framework helps you make informed design decisions.
The framework is built around six pillars, each representing a fundamental area of cloud architecture:
┌──────────────────────────────────────────────────┐
│          AWS Well-Architected Framework          │
├────────────────┬────────────────┬────────────────┤
│  Operational   │    Security    │  Reliability   │
│   Excellence   │                │                │
├────────────────┼────────────────┼────────────────┤
│  Performance   │      Cost      │ Sustainability │
│   Efficiency   │  Optimisation  │                │
└────────────────┴────────────────┴────────────────┘
Operational Excellence focuses on running and monitoring systems to deliver business value, and continually improving supporting processes and procedures.
| Principle | Description |
|---|---|
| Perform operations as code | Define your infrastructure and operations as code (CloudFormation, Terraform, CDK) |
| Make frequent, small, reversible changes | Small deployments reduce risk and make rollbacks easier |
| Refine operations procedures frequently | Regularly update runbooks and playbooks |
| Anticipate failure | Run game days, chaos engineering experiments, and failure injection tests |
| Learn from operational events | Conduct post-incident reviews and share learnings |
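To make "perform operations as code" concrete, here is a minimal sketch in Python that generates a CloudFormation template instead of creating resources by hand. The `build_template` helper and the logical ID `AppLogsBucket` are hypothetical; the `AWS::S3::Bucket` type and its `VersioningConfiguration` property come from CloudFormation's resource schema. Because the output is deterministic text, it can be committed to version control, reviewed as a small diff, and reverted like any other change.

```python
import json


def build_template(bucket_logical_id: str = "AppLogsBucket") -> str:
    """Return a minimal CloudFormation template as a JSON string.

    The default logical ID is a hypothetical placeholder; the resource type
    and property names follow CloudFormation's AWS::S3::Bucket schema.
    """
    template = {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Versioned bucket defined as code (illustrative sketch)",
        "Resources": {
            bucket_logical_id: {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    # Versioning keeps earlier object versions recoverable.
                    "VersioningConfiguration": {"Status": "Enabled"},
                },
            }
        },
    }
    # sort_keys gives stable output, so version-control diffs stay small.
    return json.dumps(template, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(build_template())
```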
| Service | How It Helps |
|---|---|
| CloudFormation / CDK | Infrastructure as Code |
| AWS Config | Track configuration changes |
| CloudWatch | Monitoring and alerting |
| Systems Manager | Automate operational tasks |
| X-Ray | Distributed tracing |
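CloudWatch alarms are commonly configured to fire only when M of the last N datapoints breach a threshold (the "Datapoints to Alarm" and "Evaluation Periods" settings), which filters out one-off spikes. The sketch below models that rule in plain Python; the function name and parameters are illustrative, and it deliberately ignores missing-data handling and the `INSUFFICIENT_DATA` state. It treats "breaching" as strictly greater than the threshold, similar to a `GreaterThanThreshold` comparison.

```python
from typing import Sequence


def alarm_state(datapoints: Sequence[float], threshold: float,
                evaluation_periods: int, datapoints_to_alarm: int) -> str:
    """Simplified 'M out of N' alarm check modelled on CloudWatch settings.

    Looks at the most recent `evaluation_periods` datapoints and returns
    "ALARM" if at least `datapoints_to_alarm` of them exceed `threshold`,
    otherwise "OK". Missing data is not modelled.
    """
    if not 1 <= datapoints_to_alarm <= evaluation_periods:
        raise ValueError("need 1 <= datapoints_to_alarm <= evaluation_periods")
    window = list(datapoints)[-evaluation_periods:]  # last N datapoints only
    breaching = sum(1 for value in window if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"
```

With CPU readings `[40, 85, 90, 50, 95]`, a threshold of 80 and a "2 out of 3" rule, the last three readings contain two breaches, so the function returns `"ALARM"`.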
Security focuses on protecting information, systems, and assets while delivering business value through risk assessments and mitigation strategies.
| Principle | Description |
|---|---|
| Implement a strong identity foundation | Least privilege, MFA, centralised identity management |
| Enable traceability | Monitor and audit all actions and changes |
| Apply security at all layers | Network, OS, application, data — not just the perimeter |
| Automate security best practices | Use automated tools to detect and respond to threats |
| Protect data in transit and at rest | Classify data by sensitivity, encrypt it in transit and at rest, and manage keys with a dedicated service such as KMS |
| Keep people away from data | Reduce manual access and use automation |
| Prepare for security events | Have an incident response plan and practice it |
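A least-privilege identity foundation and automated checks can both be expressed as code. This sketch builds an IAM policy that only allows `s3:GetObject` on a single bucket, plus a small hypothetical linter that flags `Allow` statements using a bare `"*"` for `Action` or `Resource`. The policy grammar (`Version`, `Statement`, `Effect`, `Action`, `Resource`) is standard IAM JSON; the bucket name and both helper functions are placeholders, and the check is intentionally far narrower than real tooling such as IAM Access Analyzer policy validation.

```python
def read_only_policy(bucket: str) -> dict:
    """Least-privilege IAM policy: read objects from one bucket only.

    `bucket` is a placeholder name supplied by the caller.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            }
        ],
    }


def wildcard_findings(policy: dict) -> list[str]:
    """Flag Allow statements whose Action or Resource is a bare '*'.

    Hypothetical check only: it does not catch partial wildcards such as
    's3:*', which real policy validators would also report.
    """
    findings = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM also accepts a single statement object
        statements = [statements]
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        for key in ("Action", "Resource"):
            values = stmt.get(key, [])
            if isinstance(values, str):  # a single string is also valid IAM JSON
                values = [values]
            if "*" in values:
                findings.append(f"Statement {index}: {key} is '*'")
    return findings
```

Running `wildcard_findings` in a CI pipeline before deployment is one way to apply "automate security best practices" to the identity layer.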
| Service | How It Helps |
|---|---|
| IAM | Identity and access management |
| KMS | Encryption key management |
| CloudTrail | API call auditing |
| GuardDuty | Threat detection |
| Security Hub | Centralised security findings |
| WAF & Shield | Web request filtering (WAF) and DDoS protection (Shield) |
| Macie | Sensitive data discovery |
Reliability focuses on the ability of a system to recover from failures, dynamically acquire resources to meet demand, and mitigate disruptions.
| Principle | Description |
|---|---|
| Automatically recover from failure | Use health checks, auto-scaling, and multi-AZ deployments |
| Test recovery procedures | Regularly simulate failures to validate your recovery processes |
| Scale horizontally | Distribute load across multiple small resources instead of one large one |
| Stop guessing capacity | Use auto-scaling to match demand automatically |
| Manage change in automation | Use Infrastructure as Code to make changes predictable |
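"Stop guessing capacity" means letting a metric, rather than a forecast, decide how many instances run. The sketch below shows a simplified proportional rule in the spirit of target-tracking scaling: resize the group by the ratio of the observed metric to its target, then clamp the result to the group's minimum and maximum. It assumes the metric (for example, average CPU %) scales roughly linearly with load per instance; the `desired_capacity` function is hypothetical, and the real Auto Scaling service adds cooldowns, instance warm-up and other safeguards not modelled here.

```python
import math


def desired_capacity(current: int, metric_value: float, target_value: float,
                     min_size: int, max_size: int) -> int:
    """Proportional scaling estimate, clamped to [min_size, max_size].

    Assumes load spreads evenly, so N instances at metric M would sit near
    the target T with roughly N * M / T instances.
    """
    if current < 1 or target_value <= 0:
        raise ValueError("current must be >= 1 and target_value > 0")
    # Round up so the group never ends up below what the estimate requires.
    proposed = math.ceil(current * metric_value / target_value)
    return max(min_size, min(max_size, proposed))
```

For example, 4 instances averaging 90% CPU against a 60% target give `desired_capacity(4, 90, 60, 2, 10) == 6`: two more small instances are added rather than one machine being replaced by a larger one, which is the horizontal scaling the table recommends.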
| Service | How It Helps |
|---|---|
| ELB | Distribute traffic across healthy targets |
| Auto Scaling | Scale capacity up and down automatically |
| RDS Multi-AZ | Automatic database failover |
| S3 | Designed for 99.999999999% ("eleven nines") object durability |
| Route 53 | DNS health checks and failover routing |
| AWS Backup | Centralised backup management |
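ELB sends traffic only to targets that pass health checks, and a target typically changes state only after several consecutive check results agree. The sketch below models that idea in plain Python with round-robin routing over the healthy targets. The `Target` class, the `route` helper and the threshold defaults are illustrative assumptions, not ELB's actual implementation or default settings.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Target:
    name: str
    healthy: bool = True
    streak: int = 0  # consecutive check results that disagree with `healthy`

    def record_check(self, passed: bool, healthy_threshold: int = 3,
                     unhealthy_threshold: int = 2) -> None:
        """Flip state only after enough consecutive contradicting results."""
        if passed == self.healthy:
            self.streak = 0  # result agrees with the current state; reset
            return
        self.streak += 1
        needed = healthy_threshold if passed else unhealthy_threshold
        if self.streak >= needed:
            self.healthy = passed
            self.streak = 0


def route(targets: list[Target], requests: int) -> list[str]:
    """Assign `requests` round-robin across targets currently marked healthy."""
    healthy = [t.name for t in targets if t.healthy]
    if not healthy:
        raise RuntimeError("no healthy targets available")
    rotation = cycle(healthy)
    return [next(rotation) for _ in range(requests)]
```

If target `b` fails two checks in a row it is marked unhealthy and four requests are spread as `a, c, a, c`; three consecutive passes bring it back into rotation, which is the "automatically recover from failure" principle in miniature.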
Availability targets (maximum downtime per year, assuming an average year of 365.25 days):

| Availability | Nickname | Maximum downtime per year |
|---|---|---|
| 99% | "two nines" | 3.65 days |
| 99.9% | "three nines" | 8.77 hours |
| 99.99% | "four nines" | 52.6 minutes |
| 99.999% | "five nines" | 5.26 minutes |
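The downtime figures follow from simple arithmetic: allowed downtime = (100 - availability %) / 100 × length of a year. This sketch assumes an average year of 365.25 days (525,960 minutes), which is what yields 8.77 hours rather than 8.76 hours for three nines; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year length, including leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Maximum downtime per year (in minutes) permitted by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return MINUTES_PER_YEAR * (100 - availability_percent) / 100


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(target)
        print(f"{target}%: {minutes / 60:.2f} hours ({minutes:.1f} minutes)")
```

Each additional nine divides the allowed downtime by ten.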
Performance Efficiency focuses on using computing resources efficiently and maintaining that efficiency as demand changes and technologies evolve.