A CI/CD pipeline does not end at deployment. Monitoring production, tracking pipeline health, and fostering a CI/CD culture are essential for long-term success. This lesson covers observability, pipeline metrics, and best practices for mature CI/CD adoption.
Deploying code is only half the job. You need to know if the deployment is working correctly in production.
Google's SRE book defines four "golden signals" for monitoring any user-facing system:
| Signal | What It Measures | Example |
|---|---|---|
| Latency | How long requests take | p50: 50ms, p99: 200ms |
| Traffic | How much demand is on the system | 1,000 requests/sec |
| Errors | Rate of failed requests | 0.1% error rate |
| Saturation | How full the system is | CPU at 70%, memory at 85% |
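As a rough illustration of how the four signals could be derived from raw data, the sketch below summarises one observation window. The `Request` record, the `golden_signals` function, and the sample numbers are all hypothetical; in practice a tool such as Prometheus would normally compute these for you. It assumes a non-empty list of requests and treats any 5xx status as an error.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float  # how long the request took
    status: int         # HTTP status code returned


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def golden_signals(requests, window_seconds, cpu_utilisation):
    """Summarise one observation window into the four golden signals."""
    if not requests:
        raise ValueError("need at least one request to summarise")
    durations = [r.duration_ms for r in requests]
    failed = sum(1 for r in requests if r.status >= 500)
    return {
        "latency_p50_ms": percentile(durations, 50),
        "latency_p99_ms": percentile(durations, 99),
        "traffic_rps": len(requests) / window_seconds,
        "error_rate": failed / len(requests),
        "saturation_cpu": cpu_utilisation,  # supplied by the host, not derived here
    }


# Hypothetical sample: 100 requests in a 10-second window, one of which failed.
sample = [Request(duration_ms=float(i), status=200) for i in range(1, 100)]
sample.append(Request(duration_ms=100.0, status=500))
signals = golden_signals(sample, window_seconds=10, cpu_utilisation=0.70)
print(signals)
```

Percentiles are used for latency rather than an average because a small number of very slow requests can hide behind a healthy-looking mean.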

Common tools for monitoring and alerting:

| Tool | Purpose |
|---|---|
| Prometheus | Metrics collection and alerting |
| Grafana | Dashboards and visualisation |
| Datadog | All-in-one monitoring SaaS |
| New Relic | Application performance monitoring |
| PagerDuty / Opsgenie | Incident alerting and on-call management |
| Sentry | Error tracking and crash reporting |
| OpenTelemetry | Vendor-neutral telemetry collection |
After every deployment, verify that the new version is healthy:
```yaml
# Run smoke tests after deployment
deploy:
  steps:
    - name: Deploy to production
      run: ./deploy.sh
    - name: Smoke test
      run: |
        # Wait for deployment to stabilise
        sleep 30
        # Check health endpoint
        STATUS=$(curl -s -o /dev/null -w "%{http_code}" https://myapp.example.com/health)
        if [ "$STATUS" != "200" ]; then
          echo "Health check failed with status $STATUS"
          exit 1
        fi
        # Check critical endpoint
        STATUS=$(curl -s -o /dev/null -w "%{http_code}" https://myapp.example.com/api/status)
        if [ "$STATUS" != "200" ]; then
          echo "API status check failed"
          exit 1
        fi
        echo "All smoke tests passed"
```
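A fixed `sleep 30` can be either too short or needlessly long. One alternative is to poll the health endpoint until it answers, which the following sketch illustrates. The function names, retry count, and delay are assumptions chosen for the example, not part of the pipeline above; `fetch` and `sleep` are injectable only so the logic can be exercised without a network.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=5):
    """Return the HTTP status code for url, or None if the request could not complete."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with a 4xx/5xx status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, DNS failure, timeout, and similar


def wait_until_healthy(url, attempts=10, delay_seconds=3,
                       fetch=http_status, sleep=time.sleep):
    """Poll url until it returns 200; give up after `attempts` tries."""
    for attempt in range(1, attempts + 1):
        status = fetch(url)
        if status == 200:
            return True
        print(f"Attempt {attempt}/{attempts}: got {status}, retrying")
        if attempt < attempts:
            sleep(delay_seconds)
    return False
```

In a pipeline step, a script built on this could call `wait_until_healthy("https://myapp.example.com/health")` and exit with a non-zero status when it returns `False`, so the job fails just as the shell version does.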
If post-deployment checks fail, automatically roll back:
```yaml
- name: Rollback on failure
  if: failure()
  run: |
    echo "Deployment verification failed: rolling back"
    kubectl rollout undo deployment/myapp
```
Track the health of your CI/CD pipeline itself:
| Metric | What It Measures | Target |
|---|---|---|
| Build duration | Time from push to artefact | < 5 minutes |
| Deploy frequency | How often you deploy | Multiple times per day |
| Lead time for changes | Commit to production | < 1 hour |
| Change failure rate | Deploys that cause incidents | < 5% |
| Mean time to recover (MTTR) | Time to fix a production failure | < 1 hour |
| Pipeline success rate | Percentage of passing builds | > 95% |
| Flaky test rate | Tests that pass/fail intermittently | < 1% |
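Two of these metrics are simple to compute once build and test results are exported from the CI system. The sketch below assumes a hypothetical export format: a list of pass/fail booleans for builds, and `(commit_sha, test_name, passed)` tuples for test runs. It also assumes one particular definition of "flaky": a test that both passed and failed on the same commit.

```python
from collections import defaultdict


def pipeline_success_rate(build_results):
    """Fraction of builds that passed; build_results is a list of booleans."""
    return sum(build_results) / len(build_results)


def flaky_tests(test_runs):
    """Return the names of tests that both passed and failed on the same commit."""
    outcomes = defaultdict(set)
    for commit, test, passed in test_runs:
        outcomes[(commit, test)].add(passed)
    return sorted({test for (_, test), seen in outcomes.items() if len(seen) == 2})


def flaky_test_rate(test_runs):
    """Fraction of distinct tests that were flaky at least once."""
    all_tests = {test for _, test, _ in test_runs}
    return len(flaky_tests(test_runs)) / len(all_tests)


# Hypothetical data: 19 passing builds and 1 failing build.
builds = [True] * 19 + [False]

# test_login changed outcome on the same commit (flaky);
# test_cart failed only on a different commit (a real change, not flaky).
runs = [
    ("abc123", "test_login", True),
    ("abc123", "test_login", False),
    ("abc123", "test_cart", True),
    ("def456", "test_cart", False),
]

print(pipeline_success_rate(builds))
print(flaky_tests(runs))
print(flaky_test_rate(runs))
```

Comparing outcomes only within a single commit avoids counting a test as flaky when it started failing because the code actually changed.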
The DORA (DevOps Research and Assessment) team identified four key metrics that correlate with high-performing teams:
| DORA Metric | Elite | High | Medium | Low |
|---|---|---|---|---|
| Deploy frequency | On-demand (multiple/day) | Weekly-monthly | Monthly-6 monthly | 6 months+ |
| Lead time for changes | < 1 hour | 1 day-1 week | 1-6 months | 6 months+ |
| Change failure rate | 0-15% | 16-30% | 16-30% | 46-60% |
| Time to restore | < 1 hour | < 1 day | 1 day-1 week | 6 months+ |
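The four DORA metrics can be estimated from a plain record of deployments. The following is a minimal sketch under stated assumptions: the `Deployment` record and its field names are hypothetical, medians are used as the summary statistic, and time to restore is measured from the moment of deployment to the moment service was restored.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    caused_incident: bool = False           # did this deploy cause a failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deployments, period_days):
    """Summarise a non-empty list of deployments covering period_days days."""
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.caused_incident]
    restore_times = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deploys_per_day": len(deployments) / period_days,
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        "median_time_to_restore": median(restore_times) if restore_times else None,
    }


# Hypothetical data: one deployment per day for five days.
start = datetime(2024, 1, 1, 9, 0)
lead_minutes = [20, 30, 45, 50, 90]
deployments = [
    Deployment(committed_at=start + timedelta(days=day),
               deployed_at=start + timedelta(days=day, minutes=minutes))
    for day, minutes in enumerate(lead_minutes)
]
# Pretend the last deployment caused an incident that took 40 minutes to resolve.
deployments[-1].caused_incident = True
deployments[-1].restored_at = deployments[-1].deployed_at + timedelta(minutes=40)

metrics = dora_metrics(deployments, period_days=5)
print(metrics)
```

With this sample the team deploys once per day, has a median lead time of 45 minutes, and a change failure rate of 20%.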
Target: < 5 minutes for the feedback loop
Strategies:
├── Cache dependencies (node_modules, pip, Maven)
├── Run tests in parallel
├── Use incremental builds
├── Split large test suites
└── Use faster runners (larger machines)
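To illustrate the "split large test suites" strategy, the sketch below distributes tests across parallel runners using recorded durations. It uses a greedy longest-first heuristic, which is one common approach and not necessarily what any particular CI system does; the test names and timings are made up for the example.

```python
import heapq


def split_into_shards(test_durations, shard_count):
    """Greedy longest-first assignment: give each test to the currently lightest shard."""
    # Heap entries are (total_seconds, shard_index), so the lightest shard pops first.
    heap = [(0.0, index) for index in range(shard_count)]
    shards = [[] for _ in range(shard_count)]
    by_duration = sorted(test_durations.items(), key=lambda item: item[1], reverse=True)
    for name, seconds in by_duration:
        total, index = heapq.heappop(heap)
        shards[index].append(name)
        heapq.heappush(heap, (total + seconds, index))
    return shards


# Hypothetical timings in seconds, e.g. taken from a previous run's test report.
durations = {
    "test_checkout": 120,
    "test_search": 90,
    "test_login": 60,
    "test_profile": 30,
}
shards = split_into_shards(durations, shard_count=2)
print(shards)
```

Here each shard ends up with 150 seconds of work, so two runners would finish in roughly half the 300 seconds a single runner needs, ignoring start-up overhead.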
Order your pipeline stages from fastest to slowest: