Monitoring and DevOps Best Practices

Monitoring and DevOps are deeply interconnected. The best CI/CD pipelines include monitoring at every stage, and the best monitoring strategies are informed by the deployment process. This final lesson brings together everything covered in the course and outlines best practices for building observable, automated Azure workloads.

The DevOps Monitoring Loop

Effective DevOps teams treat monitoring as a core part of the delivery pipeline, not an afterthought:

Plan → Code → Build → Test → Deploy → Monitor → Feedback → Plan

Each stage generates telemetry that feeds back into the next iteration:

Stage	Monitoring Activity
Plan	Review SLO dashboards, error budgets, and incident retrospectives
Code	Static analysis, security scanning, dependency vulnerability checks
Build	Build duration metrics, test coverage trends, artifact size tracking
Test	Test pass/fail rates, performance regression detection
Deploy	Deployment frequency, lead time, change failure rate
Monitor	Application performance, infrastructure health, user experience
Feedback	Incident analysis, user feedback, cost reports

DORA Metrics

The DevOps Research and Assessment (DORA) team identified four key metrics for measuring software delivery performance:

Metric	Description	Elite Target
Deployment Frequency	How often code is deployed to production	On demand (multiple deploys per day)
Lead Time for Changes	Time from code commit to production deployment	Less than 1 hour
Change Failure Rate	Percentage of deployments causing a failure	Less than 5%
Time to Restore Service	Time to recover from a failure in production	Less than 1 hour

Measuring DORA Metrics in Azure

Deployment Frequency = Count of production deployments / Time period
Lead Time for Changes = Average(deployment timestamp - commit timestamp)
Change Failure Rate = Failed deployments / Total deployments × 100%
Time to Restore Service (MTTR) = Average(recovery timestamp - incident start timestamp)

Track these metrics by querying Azure DevOps or GitHub APIs and visualising them in Azure Dashboards or Grafana.
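As a minimal sketch of those formulas, the Python below assumes deployment and incident records have already been exported from Azure DevOps or GitHub. The `Deployment` and `Incident` shapes are hypothetical and do not match either API's response format; mapping the real fields onto them is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Deployment:
    """Hypothetical record for one production deployment."""
    commit_time: datetime  # when the shipped commit was made
    deploy_time: datetime  # when it reached production
    failed: bool           # True if it caused a production failure


@dataclass
class Incident:
    """Hypothetical record for one production incident."""
    start: datetime
    recovered: datetime


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def dora_metrics(deployments: list[Deployment], incidents: list[Incident],
                 period_days: float) -> dict[str, float]:
    """Apply the four DORA formulas to raw deployment and incident records."""
    if not deployments:
        raise ValueError("no deployments in the period")
    return {
        "deployments_per_day": len(deployments) / period_days,
        "lead_time_hours": mean(hours(d.deploy_time - d.commit_time) for d in deployments),
        "change_failure_rate_pct": 100 * sum(d.failed for d in deployments) / len(deployments),
        # No incidents in the period is reported as 0 hours to restore
        "mttr_hours": mean(hours(i.recovered - i.start) for i in incidents) if incidents else 0.0,
    }


# Example: four deployments and one incident over a 7-day period
t0 = datetime(2024, 1, 1, 9, 0)
deployments = [
    Deployment(t0, t0 + timedelta(minutes=30), failed=False),
    Deployment(t0 + timedelta(hours=2), t0 + timedelta(hours=3), failed=True),
    Deployment(t0 + timedelta(days=1), t0 + timedelta(days=1, minutes=45), failed=False),
    Deployment(t0 + timedelta(days=2), t0 + timedelta(days=2, minutes=15), failed=False),
]
incidents = [Incident(t0 + timedelta(hours=3), t0 + timedelta(hours=3, minutes=40))]
print(dora_metrics(deployments, incidents, period_days=7))
```

Here lead time averages 0.625 hours and the change failure rate is 25%. The median is a common alternative to the mean for lead time and time to restore, because one unusually slow change cannot skew it; swapping `mean` for `statistics.median` is a one-line change.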

Observability-Driven Deployment

Pre-Deployment Checks

Before deploying to production, validate:

All CI tests pass (unit, integration, security scans)
No critical alerts are currently firing in the target environment
Error budget is healthy (not exhausted or near exhaustion)
Dependencies are healthy (database, external APIs, caches)
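These checks can be combined into a single go/no-go gate. The Python below is a sketch under stated assumptions: the `EnvironmentStatus` snapshot is hypothetical (in practice its fields would be filled from CI results, Azure Monitor alert state, an SLO query and dependency health endpoints), and the 99.9% SLO and the "near exhaustion" cut-off of 20% remaining budget are illustrative values, not recommendations from the course.

```python
from dataclasses import dataclass, field


@dataclass
class EnvironmentStatus:
    """Hypothetical snapshot of the target environment, gathered before deploying."""
    ci_passed: bool
    critical_alerts_firing: int
    total_requests: int    # requests in the SLO window, e.g. the last 30 days
    failed_requests: int
    dependencies_healthy: dict[str, bool] = field(default_factory=dict)


def error_budget_remaining(total: int, failed: int, slo: float) -> float:
    """Fraction of the error budget left: 1.0 = untouched, 0 or less = exhausted."""
    allowed_failures = total * (1 - slo)
    if allowed_failures <= 0:
        return 1.0 if failed == 0 else 0.0
    return 1 - failed / allowed_failures


def predeploy_gate(status: EnvironmentStatus, slo: float = 0.999,
                   min_budget: float = 0.2) -> list[str]:
    """Return the reasons to block the deployment; an empty list means go."""
    reasons = []
    if not status.ci_passed:
        reasons.append("CI tests or scans failed")
    if status.critical_alerts_firing:
        reasons.append(f"{status.critical_alerts_firing} critical alert(s) firing")
    remaining = error_budget_remaining(status.total_requests, status.failed_requests, slo)
    if remaining < min_budget:
        reasons.append(f"error budget low: {remaining:.0%} remaining")
    for name, healthy in status.dependencies_healthy.items():
        if not healthy:
            reasons.append(f"dependency unhealthy: {name}")
    return reasons


status = EnvironmentStatus(
    ci_passed=True,
    critical_alerts_firing=0,
    total_requests=1_000_000,
    failed_requests=900,  # 900 of roughly 1,000 allowed failures already used
    dependencies_healthy={"sql-database": True, "redis-cache": True},
)
print(predeploy_gate(status))  # blocked: only about 10% of the budget is left
```

Returning a list of reasons rather than a bare boolean makes the pipeline log explain why a release was held back.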

Deployment Strategies

Strategy	Description	Monitoring Need
Blue-green	Two identical environments; switch traffic instantly	Compare metrics between blue and green
Canary	Route a small percentage of traffic to the new version	Compare canary metrics against baseline
Rolling	Gradually replace instances with the new version	Monitor each instance as it updates
Feature flags	Deploy code but control feature activation separately	Monitor feature flag impact on metrics
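For the canary row, "compare canary metrics against baseline" can be made concrete with a simple rule. The sketch below uses a ratio-plus-absolute-difference heuristic on error rates rather than a statistical significance test; the function name and all thresholds (`max_ratio`, `min_delta`, `min_requests`) are hypothetical and would need tuning against real traffic.

```python
def canary_verdict(baseline_failed: int, baseline_total: int,
                   canary_failed: int, canary_total: int,
                   max_ratio: float = 1.5, min_delta: float = 0.005,
                   min_requests: int = 500) -> str:
    """Compare the canary error rate with the baseline: 'wait', 'promote' or 'roll back'."""
    if canary_total < min_requests or baseline_total == 0:
        return "wait"  # not enough traffic yet for a meaningful comparison
    baseline_rate = baseline_failed / baseline_total
    canary_rate = canary_failed / canary_total
    # Require both a relative and an absolute increase, so very small rates
    # (e.g. 0.1% vs 0.2%) do not trigger a rollback on noise alone
    if canary_rate > baseline_rate * max_ratio and canary_rate - baseline_rate > min_delta:
        return "roll back"
    return "promote"


print(canary_verdict(100, 10_000, 10, 1_000))  # promote: both at 1%
print(canary_verdict(100, 10_000, 40, 1_000))  # roll back: 4% vs 1%
print(canary_verdict(100, 10_000, 1, 100))     # wait: too little canary traffic
```

The same comparison applies to blue-green releases, with the green environment in the "canary" position.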

Post-Deployment Validation

After every deployment:

Check Live Metrics in Application Insights — are error rates normal?
Review the Application Map — are all dependencies healthy?
Monitor key SLIs for the first 15-30 minutes
Check deployment-specific alerts — did any fire?
Verify synthetic availability tests are passing from all regions
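The observation window and the availability check can be automated together. This Python sketch assumes per-minute error percentages and per-region availability test results have already been fetched (for example from Application Insights); the function name and the thresholds (a 15-minute window, 1% errors, up to two bad minutes) are illustrative assumptions.

```python
def validate_post_deploy(error_pct_per_minute: list[float],
                         availability_by_region: dict[str, bool],
                         window_minutes: int = 15,
                         max_error_pct: float = 1.0,
                         max_bad_minutes: int = 2) -> list[str]:
    """Return problems found in the post-deployment window; empty means healthy."""
    problems = []
    observed = error_pct_per_minute[:window_minutes]
    if len(observed) < window_minutes:
        problems.append(f"only {len(observed)} of {window_minutes} minutes observed")
    # Tolerate a couple of bad minutes so a single spike does not fail the release
    bad = sum(1 for pct in observed if pct > max_error_pct)
    if bad > max_bad_minutes:
        problems.append(f"{bad} minutes above {max_error_pct}% errors")
    failing = sorted(region for region, ok in availability_by_region.items() if not ok)
    if failing:
        problems.append("availability tests failing in: " + ", ".join(failing))
    return problems


minutes = [0.2] * 12 + [3.0, 0.3, 0.2]  # one short spike in minute 13
regions = {"westeurope": True, "eastus": True, "southeastasia": False}
print(validate_post_deploy(minutes, regions))
```

Here the single spike is tolerated, but the failing availability test in one region is still reported.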

Automated Rollback

Configure automated rollback based on monitoring signals:

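# Example: Azure DevOps pipeline with a post-deployment health check
- stage: DeployProduction
  jobs:
    - deployment: Deploy
      environment: production
      strategy:
        runOnce:
          deploy:
            steps:
              - task: AzureWebApp@1
                inputs:
                  azureSubscription: '<service-connection>'
                  appType: webAppLinux  # or webApp for Windows
                  appName: 'webapp-production'
                  package: '$(Pipeline.Workspace)/**/*.zip'
          postRouteTraffic:
            steps:
              - task: AzureCLI@2
                inputs:
                  azureSubscription: '<service-connection>'
                  scriptType: bash
                  scriptLocation: inlineScript
                  inlineScript: |
                    # Error rate (%) over the last 10 minutes. The Application Insights
                    # query API uses the classic schema: requests, timestamp, success.
                    ERROR_RATE=$(az monitor app-insights query \
                      --app <app-id> \
                      --analytics-query "requests | where timestamp > ago(10m) | summarize ErrorRate = countif(success == false) * 100.0 / count()" \
                      --query 'tables[0].rows[0][0]' -o tsv)
                    # Missing telemetry fails the check instead of passing silently
                    if [ -z "$ERROR_RATE" ]; then
                      echo "No request telemetry returned. Failing the health check."
                      exit 1
                    fi
                    if awk -v rate="$ERROR_RATE" 'BEGIN { exit !(rate > 5) }'; then
                      echo "Error rate too high: $ERROR_RATE%. Rolling back."
                      exit 1
                    fi
          on:
            failure:
              steps:
                # Runs when a lifecycle hook above fails, including this health check.
                # Placeholder: redeploy the last known-good package here.
                - script: echo "Rolling back to the previous release"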
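The rollback decision itself is small enough to unit-test outside the pipeline. This hypothetical Python function mirrors the query's `countif(success == false) * 100.0 / count()` and the `> 5` threshold; it also treats an empty window as a reason to roll back, a fail-safe choice of this sketch, so that missing telemetry is never mistaken for a healthy release.

```python
from typing import Optional


def rollback_decision(request_succeeded: list[bool],
                      threshold_pct: float = 5.0) -> tuple[bool, Optional[float]]:
    """Return (roll_back, error_rate_pct) for the requests in the check window."""
    if not request_succeeded:
        # No telemetry: fail safe instead of treating silence as healthy
        return True, None
    failed = sum(1 for ok in request_succeeded if not ok)
    error_rate = failed * 100.0 / len(request_succeeded)
    return error_rate > threshold_pct, error_rate


print(rollback_decision([True] * 95 + [False] * 5))  # (False, 5.0)
print(rollback_decision([True] * 94 + [False] * 6))  # (True, 6.0)
print(rollback_decision([]))                         # (True, None)
```

Note that the comparison is strict: an error rate of exactly 5% passes the gate.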

Infrastructure as Code and Monitoring

Deploy Monitoring with Your Infrastructure

Monitoring configuration should be version-controlled and deployed alongside your infrastructure:

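// main.bicep: deploy the app and its monitoring together
param appName string
param location string = resourceGroup().location
param appServicePlanName string
param workspaceName string
param actionGroupName string

// Shared resources assumed to exist already; deploy them here instead if they do not
resource appServicePlan 'Microsoft.Web/serverfarms@2023-01-01' existing = {
  name: appServicePlanName
}

resource logAnalyticsWorkspace 'Microsoft.OperationalInsights/workspaces@2022-10-01' existing = {
  name: workspaceName
}

resource actionGroup 'Microsoft.Insights/actionGroups@2023-01-01' existing = {
  name: actionGroupName
}
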
resource appService 'Microsoft.Web/sites@2023-01-01' = {
  name: appName
  location: location
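  properties: {
    serverFarmId: appServicePlan.id
    siteConfig: {
      appSettings: [
        {
          // Tells Application Insights instrumentation in the app where to send telemetry
          name: 'APPLICATIONINSIGHTS_CONNECTION_STRING'
          value: appInsights.properties.ConnectionString
        }
      ]
    }
  }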
}

resource appInsights 'Microsoft.Insights/components@2020-02-02' = {
  name: '${appName}-insights'
  location: location
  kind: 'web'
  properties: {
    Application_Type: 'web'
    WorkspaceResourceId: logAnalyticsWorkspace.id
  }
}

resource diagnosticSettings 'Microsoft.Insights/diagnosticSettings@2021-05-01-preview' = {
  name: '${appName}-diagnostics'
  scope: appService
  properties: {
    workspaceId: logAnalyticsWorkspace.id
    logs: [
      { category: 'AppServiceHTTPLogs', enabled: true }
      { category: 'AppServiceAppLogs', enabled: true }
    ]
    metrics: [
      { category: 'AllMetrics', enabled: true }
    ]
  }
}

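// CpuPercentage is an App Service plan metric, so the alert is scoped to the plan
resource cpuAlert 'Microsoft.Insights/metricAlerts@2018-03-01' = {
  name: '${appName}-high-cpu'
  location: 'global'
  properties: {
    severity: 2
    enabled: true
    scopes: [appServicePlan.id]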
    evaluationFrequency: 'PT1M'
    windowSize: 'PT5M'
    criteria: {
      'odata.type': 'Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriteria'
      allOf: [
        {
          criterionType: 'StaticThresholdCriterion'
          name: 'HighCPU'
          metricName: 'CpuPercentage'
          operator: 'GreaterThan'
          threshold: 85
          timeAggregation: 'Average'
        }
      ]
    }
    actions: [{ actionGroupId: actionGroup.id }]
  }
}
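To see what `evaluationFrequency: 'PT1M'` and `windowSize: 'PT5M'` mean together, the Python below models the CPU alert condition in a simplified way. Every minute, it averages the last five one-minute samples and checks whether that average exceeds 85. This is an illustrative model only: Azure Monitor's real evaluation (metric granularity, alert state and auto-resolution) is not reproduced, and the function name is hypothetical.

```python
def evaluate_metric_alert(samples: list[float], window: int = 5,
                          threshold: float = 85.0) -> list[int]:
    """Return the minutes (0-based) at which the simplified alert condition is true.

    samples holds one CPU percentage value per minute; the condition is
    evaluated once per minute over the trailing `window` minutes (Average > threshold).
    """
    firing = []
    for minute in range(window - 1, len(samples)):
        average = sum(samples[minute - window + 1:minute + 1]) / window
        if average > threshold:
            firing.append(minute)
    return firing


cpu = [50, 60, 95, 95, 95, 95, 95, 40, 40, 40]
print(evaluate_metric_alert(cpu))  # [5, 6]
```

Averaging over a five-minute window means a brief spike does not fire the alert on its own; CPU must stay high for most of the window, which trades a little detection latency for fewer noisy notifications.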
