In this final lesson, we bring together everything covered in the course to design and build a complete observability stack. We will walk through architecture decisions, tool selection, implementation, and the organisational practices that make observability effective.
A production observability stack has four layers:
```text
┌──────────────────────────────────────────────────────────┐
│ Visualisation Layer                                      │
│ Grafana (dashboards, explore, alerts)                    │
└──────────────────────┬───────────────────────────────────┘
                       │
┌──────────────────────┼───────────────────────────────────┐
│ Storage Layer                                            │
│ Prometheus/Mimir │ Loki                 │ Tempo          │
│ (metrics)        │ (logs)               │ (traces)       │
└──────────────────────┬───────────────────────────────────┘
                       │
┌──────────────────────┼───────────────────────────────────┐
│ Collection Layer                                         │
│ OTel Collector   │ Promtail/Fluent Bit  │ Prometheus     │
└──────────────────────┬───────────────────────────────────┘
                       │
┌──────────────────────┼───────────────────────────────────┐
│ Instrumentation Layer                                    │
│ Application code with OTel SDK, client libraries         │
└──────────────────────────────────────────────────────────┘
```
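The flow between the layers can be made concrete with an OpenTelemetry Collector pipeline: receivers accept data from instrumented applications, processors batch it, and exporters hand it to the storage layer. The following is a minimal sketch, not a definitive configuration. The host name `tempo`, the exporter name `otlp/tempo`, and port `8889` for the Prometheus exporter are assumptions chosen for illustration.

```yaml
# Sketch of an OTel Collector (contrib) config; names and ports are assumptions.
receivers:
  otlp:                        # applications send OTLP here
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch: {}                    # group telemetry before export

exporters:
  otlp/tempo:                  # traces go to Tempo over OTLP gRPC
    endpoint: tempo:4317       # assumed host name of the Tempo service
    tls:
      insecure: true           # plain text, acceptable only for local use
  prometheus:                  # metrics are exposed for Prometheus to scrape
    endpoint: 0.0.0.0:8889     # assumed port

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
```

Logs are left out of this sketch because the diagram routes them through Promtail or Fluent Bit rather than the Collector.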

Open-source tools for each concern:

| Concern | Tool | Purpose |
|---|---|---|
| Metrics | Prometheus + Mimir | Collection + long-term storage |
| Logs | Loki + Promtail | Aggregation + collection |
| Traces | Tempo + OTel Collector | Storage + collection |
| Visualisation | Grafana | Dashboards, alerting, exploration |
| Alerting | Alertmanager | Routing and notification |
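
To make "routing and notification" concrete, here is a minimal sketch of an Alertmanager configuration that groups alerts and sends critical ones to a separate receiver. The receiver names, the `severity` and `service` labels, and the webhook URLs are hypothetical placeholders, not values from this course.

```yaml
# Sketch of alertmanager.yml; receivers, labels, and URLs are hypothetical.
route:
  receiver: default            # fallback receiver for every alert
  group_by: ['alertname', 'service']
  group_wait: 30s              # wait before sending the first notification
  group_interval: 5m           # wait before notifying about new alerts in a group
  repeat_interval: 4h          # resend interval for alerts that keep firing
  routes:
    - matchers:
        - severity="critical"  # assumes alert rules set a severity label
      receiver: pager

receivers:
  - name: default
    webhook_configs:
      - url: http://example.internal/hooks/alerts   # placeholder URL
  - name: pager
    webhook_configs:
      - url: http://example.internal/hooks/pager    # placeholder URL
```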

Commercial and managed alternatives:

| Vendor | Strengths |
|---|---|
| Datadog | All-in-one platform, excellent UX, strong APM |
| New Relic | Generous free tier, full-stack observability |
| Dynatrace | AI-powered root cause analysis, enterprise focus |
| Splunk | Industry-leading log search, compliance features |
| Grafana Cloud | Managed LGTM stack with a free tier |

Open source versus commercial trade-offs:

| Factor | Open Source | Commercial |
|---|---|---|
| Cost | Infrastructure and operator time; no licence fees | Per-host or per-GB pricing |
| Control | Full control over data and configuration | Vendor manages infrastructure |
| Complexity | You operate the stack | Vendor operates the stack |
| Features | Community-driven, requires integration | Integrated, polished UX |
| Compliance | Data stays in your infrastructure | Data in vendor's cloud |
A complete local observability stack, defined with Docker Compose:
```yaml
version: '3.8'

services:
  # Metrics
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - ./prometheus/rules/:/etc/prometheus/rules/
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.retention.time=15d'
      - '--web.enable-lifecycle'

  # Logs
  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"
    volumes:
      - ./loki/loki.yml:/etc/loki/local-config.yaml

  promtail:
    image: grafana/promtail:latest
    volumes:
      - /var/log:/var/log
      - ./promtail/promtail.yml:/etc/promtail/config.yml

  # Traces
  tempo:
    image: grafana/tempo:latest
    # Load the mounted config explicitly
    command: ["-config.file=/etc/tempo/tempo.yml"]
    ports:
      - "3200:3200"
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
    volumes:
      - ./tempo/tempo.yml:/etc/tempo/tempo.yml

  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    ports:
      - "4327:4317"   # OTLP gRPC (apps send here)
      - "4328:4318"   # OTLP HTTP
    volumes:
      - ./otel/otel-collector.yml:/etc/otelcol-contrib/config.yaml

  # Alerting
  alertmanager:
    image: prom/alertmanager:latest
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml

  # Visualisation
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin   # local use only
    volumes:
      - ./grafana/provisioning:/etc/grafana/provisioning
      - grafana-data:/var/lib/grafana

volumes:
  grafana-data:
```
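
The Compose file mounts `./prometheus/prometheus.yml` and a rules directory but does not show their contents. As a rough sketch, a matching Prometheus configuration could look like the following. The `otel-collector:8889` target assumes the Collector exposes a Prometheus exporter on that port, and the `*.yml` rule file pattern is likewise an assumption; neither appears in the Compose file.

```yaml
# Sketch of prometheus/prometheus.yml; targets marked below are assumptions.
global:
  scrape_interval: 15s         # how often targets are scraped
  evaluation_interval: 15s     # how often rules are evaluated

rule_files:
  - /etc/prometheus/rules/*.yml   # matches the mounted rules directory

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']   # Compose service name and port

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']        # Prometheus scraping itself

  - job_name: otel-collector
    static_configs:
      - targets: ['otel-collector:8889']   # assumed Prometheus exporter port
```

Because the Compose file starts Prometheus with `--web.enable-lifecycle`, an edited configuration can be reloaded with an HTTP POST to `/-/reload` instead of restarting the container.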
Auto-configure data sources on startup:
```yaml
# grafana/provisioning/datasources/datasources.yml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    uid: prometheus          # referenced by Tempo's serviceMap below
    access: proxy
    url: http://prometheus:9090
    isDefault: true

  - name: Loki
    type: loki
    uid: loki                # referenced by Tempo's tracesToLogs below
    access: proxy
    url: http://loki:3100
    jsonData:
      derivedFields:
        # Turn trace_id=<id> in a log line into a link to the trace in Tempo
        - datasourceUid: tempo
          matcherRegex: "trace_id=(\\w+)"
          name: TraceID
          # $$ escapes $ so provisioning does not expand it as an env variable
          url: "$${__value.raw}"

  - name: Tempo
    type: tempo
    uid: tempo
    access: proxy
    url: http://tempo:3200
    jsonData:
      tracesToLogs:
        datasourceUid: loki
        filterByTraceID: true
      serviceMap:
        datasourceUid: prometheus
```
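This configuration enables cross-pillar correlation: the Loki derived field links a `trace_id` in a log line to the trace in Tempo, `tracesToLogs` jumps from a trace to its logs in Loki, and the Tempo service map reads its metrics from Prometheus. Jumping from a metric to a trace also requires exemplars, which this file does not configure.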
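
The metrics-to-traces jump relies on exemplars. As a minimal sketch, the Prometheus data source entry could carry an `exemplarTraceIdDestinations` block like the one below. The label name `trace_id` is an assumption about how the application attaches trace IDs to exemplars, and Prometheus itself would need exemplar storage enabled (`--enable-feature=exemplar-storage`), which the Compose file does not set.

```yaml
# Sketch of the Prometheus entry in datasources.yml with exemplar links added.
datasources:
  - name: Prometheus
    type: prometheus
    uid: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    jsonData:
      exemplarTraceIdDestinations:
        # name is the exemplar label holding the trace ID (assumed: trace_id)
        - name: trace_id
          datasourceUid: tempo   # must match the uid of the Tempo data source
```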
When adding observability to an application, follow this checklist: