The Operational Excellence pillar focuses on running and monitoring systems to deliver business value and continually improving supporting processes and procedures. It is about how you manage your workloads, respond to events, and learn from experience to drive continuous improvement.
The Operational Excellence pillar is guided by five design principles:
Define your entire workload (infrastructure, configuration, and operational procedures) as code. This reduces human error, puts every change under version control, and lets you trigger operations automatically in response to events.
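As a rough illustration of operations as code, the sketch below expresses a procedure as an ordered list of functions and runs it when a matching event arrives. Everything here is hypothetical and not taken from the lesson: the `Runbook` class, the `disk_almost_full` event type, and the step functions stand in for whatever tooling a real workload uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Runbook:
    """An operational procedure expressed as an ordered list of steps."""
    name: str
    steps: List[Callable[[dict], str]] = field(default_factory=list)

    def run(self, event: dict) -> List[str]:
        # Execute every step in order and collect one log line per step.
        return [step(event) for step in self.steps]


def snapshot_volume(event: dict) -> str:
    # Hypothetical step: a real one would call the platform's API.
    return f"snapshot requested for {event['resource']}"


def notify_on_call(event: dict) -> str:
    # Hypothetical step: a real one would page the on-call engineer.
    return f"on-call notified about {event['type']} on {event['resource']}"


# Assumed mapping from event type to the runbook that handles it.
RUNBOOKS: Dict[str, Runbook] = {
    "disk_almost_full": Runbook("disk-pressure", [snapshot_volume, notify_on_call]),
}


def handle_event(event: dict) -> List[str]:
    """Look up the runbook for an event and run it without manual steps."""
    runbook = RUNBOOKS.get(event["type"])
    if runbook is None:
        return [f"no runbook registered for {event['type']}"]
    return runbook.run(event)
```

Because the procedure lives in a source file, it can be reviewed, versioned, and tested like any other code, which is the point the principle makes.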
Design workloads to allow components to be updated regularly in small increments. Small changes are easier to understand, test, and roll back if something goes wrong. This reduces the blast radius of any single change.
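One way to picture small, reversible changes is a rollout that applies one change at a time and stops at the first change that fails a health check. The function name `apply_incrementally` and the dictionary-based configuration are assumptions made for this sketch, not part of any framework.

```python
from typing import Callable, Dict, List


def apply_incrementally(
    config: Dict[str, str],
    changes: List[Dict[str, str]],
    is_healthy: Callable[[Dict[str, str]], bool],
) -> Dict[str, str]:
    """Apply changes one by one and return the last known-good configuration."""
    current = dict(config)  # work on a copy so the caller's config is untouched
    for change in changes:
        candidate = {**current, **change}
        if not is_healthy(candidate):
            # Roll back: discard only this change and stop the rollout.
            break
        current = candidate  # the small change passed, so keep it
    return current
```

Only the failing change is discarded, so the impact of a bad change is limited to that single increment and earlier changes stay in place.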
As you evolve your workload, evolve your procedures alongside it. Schedule regular reviews to identify which procedures are effective and which need updating. Procedures that are never reviewed become stale and unreliable.
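A regular review can itself be automated. The sketch below flags procedures whose last review is older than a threshold; the 90-day default and the procedure names in the usage note are illustrative assumptions, not a recommendation from the lesson.

```python
from datetime import date, timedelta
from typing import Dict, List


def stale_procedures(
    last_reviewed: Dict[str, date],
    today: date,
    max_age_days: int = 90,  # assumed review interval; pick what suits your team
) -> List[str]:
    """Return the names of procedures last reviewed before the cutoff date."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(name for name, reviewed in last_reviewed.items() if reviewed < cutoff)
```

For example, calling `stale_procedures({"failover": date(2024, 1, 10), "deploy": date(2024, 5, 1)}, today=date(2024, 6, 1))` would list only `failover`, since `deploy` was reviewed within the last 90 days.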
Perform pre-mortem exercises to identify potential sources of failure so you can remove or mitigate them. Test your failure scenarios and validate your understanding of their impact. This builds resilience before incidents occur.
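Testing a failure scenario can be as simple as injecting a fault and checking that the system degrades the way you expect. The sketch below is a minimal example under assumed names: `DependencyDown`, `fetch_with_fallback`, and the two stand-in functions are hypothetical and only illustrate the idea.

```python
from typing import Callable


class DependencyDown(Exception):
    """Raised when a dependency cannot be reached."""


def fetch_with_fallback(primary: Callable[[], str], fallback: Callable[[], str]) -> str:
    """Use the primary source, falling back if the dependency is down."""
    try:
        return primary()
    except DependencyDown:
        return fallback()


def failing_primary() -> str:
    # Injected fault: simulate the primary dependency being unavailable.
    raise DependencyDown("primary store unreachable")


def cached_copy() -> str:
    # Degraded but usable answer served while the primary is down.
    return "stale-but-usable"
```

Running `fetch_with_fallback(failing_primary, cached_copy)` in a test confirms, before any real incident, that the fallback path is actually taken when the primary fails.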
Drive improvement through lessons learned from all operational events and failures. Share what is learned across teams and throughout the organisation. Create a culture where failure is seen as an opportunity to improve.
The Operational Excellence pillar organises its guidance into four best practice areas:
Your teams need a shared understanding of the entire workload, their role in it, and shared business goals. This includes:
Effective preparation is essential for operational excellence. Key practices include: