The Reliability pillar focuses on ensuring that a workload performs its intended function correctly and consistently when it is expected to. This includes the ability to operate and test the workload through its total lifecycle, detect failures, and automatically recover.
The Reliability pillar is guided by five design principles:
Automatically recover from failure: Monitor key performance indicators (KPIs) of your workload and trigger automated recovery when a threshold is breached. This lets you detect and repair many failures before they affect your users.
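The following is a minimal Python sketch of this principle. It assumes a hypothetical `KpiMonitor` class and a hypothetical `restart_service` action, neither of which is an AWS API. Recovery runs only after a KPI stays above its threshold for several consecutive samples, which is a common guard against reacting to a single noisy data point. In AWS, this role is typically played by a CloudWatch alarm wired to an automated action.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class KpiMonitor:
    """Hypothetical KPI monitor that runs a recovery action on a sustained breach."""

    threshold: float               # KPI value above which a sample counts as breaching
    periods: int                   # consecutive breaching samples required before acting
    recover: Callable[[], None]    # automated recovery action to run
    _streak: int = 0               # current run of consecutive breaching samples

    def observe(self, value: float) -> bool:
        """Record one KPI sample; return True if it triggered recovery."""
        self._streak = self._streak + 1 if value > self.threshold else 0
        if self._streak >= self.periods:
            self._streak = 0       # reset so recovery fires at most once per `periods` breaching samples
            self.recover()
            return True
        return False


recovery_log: List[str] = []


def restart_service() -> None:
    # Hypothetical stand-in for a real action, e.g. replacing an unhealthy instance.
    recovery_log.append("restart")


monitor = KpiMonitor(threshold=500.0, periods=2, recover=restart_service)
for latency_ms in [120.0, 640.0, 700.0, 180.0, 650.0, 720.0, 810.0]:
    monitor.observe(latency_ms)

print(recovery_log)  # ['restart', 'restart']
```

The lone spike pattern is filtered out: recovery fires only for the two sustained breaches (640 then 700, and 650 then 720).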
Test recovery procedures: In the cloud, you can test how your workload fails and validate your recovery procedures. Use automation to simulate different failures or to recreate scenarios that caused failures in the past. This exposes failure pathways that you can test and fix before a real failure occurs.
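The idea can be sketched as a small automated fault-injection test. The `Cluster` model and the `run_failure_test` function are hypothetical stand-ins for a real workload and its recovery runbook. The test injects a random replica failure and checks whether the workload kept serving. It then runs the recovery procedure and checks that every replica was restored. Managed tools such as AWS Fault Injection Service apply the same pattern to real resources.

```python
import random
from typing import Dict


class Cluster:
    """Toy model of a workload made of interchangeable replicas (hypothetical)."""

    def __init__(self, replicas: int) -> None:
        self.healthy: Dict[str, bool] = {f"replica-{i}": True for i in range(replicas)}

    def inject_failure(self, name: str) -> None:
        # Simulated fault: mark one replica as failed.
        self.healthy[name] = False

    def recover(self) -> None:
        # Recovery procedure under test: replace every failed replica.
        for name in self.healthy:
            self.healthy[name] = True

    def serving(self) -> bool:
        # The workload is up as long as at least one replica is healthy.
        return any(self.healthy.values())


def run_failure_test(cluster: Cluster, rng: random.Random) -> bool:
    """Inject one random replica failure, then verify availability and recovery."""
    victim = rng.choice(sorted(cluster.healthy))
    cluster.inject_failure(victim)
    stayed_up = cluster.serving()                    # did the workload survive the fault?
    cluster.recover()
    fully_recovered = all(cluster.healthy.values())  # did recovery restore every replica?
    return stayed_up and fully_recovered


rng = random.Random(7)
print(run_failure_test(Cluster(replicas=3), rng))  # True
print(run_failure_test(Cluster(replicas=1), rng))  # False: one replica is a single point of failure
```

With a single replica, the test fails. It surfaces that failure pathway in a controlled run rather than in production.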
Scale horizontally to increase aggregate workload availability: Replace one large resource with multiple small resources to reduce the impact of a single failure on the overall workload. Distribute requests across multiple smaller resources so that they don't share a common point of failure.
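The following minimal sketch shows why this helps. It uses hypothetical helper names and assumes equal-sized resources. With N such resources, losing one removes only 1/N of capacity instead of all of it. A simple round-robin dispatcher can then keep sending requests to the survivors. A real deployment would rely on a load balancer with health checks rather than this toy function.

```python
from fractions import Fraction
from itertools import cycle
from typing import Dict, Iterable, List, Set


def capacity_lost_on_single_failure(num_resources: int) -> Fraction:
    """Share of total capacity lost when one of N equal-sized resources fails."""
    if num_resources < 1:
        raise ValueError("need at least one resource")
    return Fraction(1, num_resources)


def distribute(requests: Iterable[str], nodes: List[str], failed: Set[str]) -> Dict[str, List[str]]:
    """Round-robin requests across the nodes that have not failed."""
    live = [n for n in nodes if n not in failed]
    if not live:
        raise RuntimeError("no healthy nodes left")
    assignment: Dict[str, List[str]] = {n: [] for n in live}
    for request, node in zip(requests, cycle(live)):
        assignment[node].append(request)
    return assignment


for n in (1, 4, 10):
    print(n, capacity_lost_on_single_failure(n))
# 1 1
# 4 1/4
# 10 1/10

print(distribute([f"req-{i}" for i in range(6)], ["a", "b", "c", "d"], failed={"b"}))
# {'a': ['req-0', 'req-3'], 'c': ['req-1', 'req-4'], 'd': ['req-2', 'req-5']}
```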