You cannot manage what you cannot see. Observability in Kubernetes rests on three pillars: metrics, logs, and traces. This lesson covers the Prometheus/Grafana monitoring stack, logging with Fluentd and Loki, distributed tracing with Jaeger, and alerting strategies.
| Pillar | What It Answers | Tools |
|---|---|---|
| Metrics | How is the system performing? | Prometheus, Grafana, metrics-server |
| Logs | What happened and why? | Fluentd, Loki, Elasticsearch |
| Traces | How does a request flow through services? | Jaeger, Zipkin, Tempo |
┌─────────────────────────────────────────────────────────────┐
│                           Grafana                           │
│                     (Unified Dashboard)                     │
└──────┬──────────────────┬──────────────────┬────────────────┘
       │                  │                  │
       ▼                  ▼                  ▼
┌────────────┐     ┌────────────┐     ┌────────────┐
│ Prometheus │     │    Loki    │     │   Jaeger   │
│ (Metrics)  │     │   (Logs)   │     │  (Traces)  │
└──────┬─────┘     └──────┬─────┘     └──────┬─────┘
       │                  │                  │
       ▼                  ▼                  ▼
┌────────────┐     ┌────────────┐     ┌───────────────┐
│ Exporters  │     │ Fluentd /  │     │ OpenTelemetry │
│ /metrics   │     │ Promtail   │     │      SDK      │
└────────────┘     └────────────┘     └───────────────┘
Prometheus is the de facto standard for Kubernetes metrics. It scrapes metrics over HTTP from each target's /metrics endpoint and stores them as time series.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  -f monitoring-values.yaml
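# CPU usage per pod (cores), summed across the pod's containers
sum by (pod) (rate(container_cpu_usage_seconds_total{namespace="production", container!=""}[5m]))
# Memory usage as a percentage of the memory limit, per pod
sum by (namespace, pod) (container_memory_working_set_bytes{namespace="production", container!=""})
/ sum by (namespace, pod) (kube_pod_container_resource_limits{namespace="production", resource="memory"}) * 100
# Request rate per service
sum by (service) (rate(http_requests_total{namespace="production"}[5m]))
# 99th percentile latency, aggregated across all instances
histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
# Error rate (percentage of requests answered with a 5xx status)
sum(rate(http_requests_total{status=~"5.."}[5m]))
/ sum(rate(http_requests_total[5m])) * 100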
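The 99th-percentile query relies on histogram_quantile, which estimates a quantile from cumulative bucket counters rather than from raw observations. The following is a minimal Python sketch of that estimate, assuming linear interpolation inside the matching bucket as described in the Prometheus documentation. The function name and the sample bucket values are illustrative, not part of Prometheus itself.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs, one per
    `le` label value. It must include the +Inf bucket and at least one
    finite bucket.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]  # the +Inf bucket counts every observation
    if total == 0:
        return math.nan
    rank = q * total  # how many observations lie at or below the quantile
    prev_bound, prev_count = 0.0, 0.0
    for index, (bound, count) in enumerate(buckets):
        if count >= rank:
            if math.isinf(bound):
                # The quantile falls in the open-ended bucket, so the best
                # available answer is the highest finite upper bound.
                return buckets[index - 1][0]
            # Assume observations are spread evenly inside this bucket.
            width = count - prev_count
            fraction = (rank - prev_count) / width if width else 1.0
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Illustrative per-second bucket rates for le = 0.1, 0.5, 1.0, +Inf
sample_buckets = [(0.1, 50), (0.5, 90), (1.0, 99), (float("inf"), 100)]
p95 = histogram_quantile(0.95, sample_buckets)
```

Because the result is interpolated, its accuracy depends on how closely the bucket boundaries bracket the latencies you care about. A quantile that lands in the +Inf bucket can only be reported as the largest finite boundary.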
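apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: web-api-monitor
  namespace: monitoring
  labels:
    # By default, kube-prometheus-stack only selects ServiceMonitors labelled
    # with its Helm release name ("monitoring" in the install command above).
    release: monitoring
spec:
  # Match Services labelled app=web-api ...
  selector:
    matchLabels:
      app: web-api
  # ... in the production namespace
  namespaceSelector:
    matchNames:
      - production
  endpoints:
    - port: metrics        # name of the Service port, not its number
      interval: 15s
      path: /metrics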
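The ServiceMonitor only tells Prometheus where to scrape. The application still has to serve its metrics at /metrics in the Prometheus text exposition format. As a rough sketch of the application side, the Python below uses only the standard library. The counter store, label names, and port 8080 are assumptions made for illustration, and a real service would more likely use an official Prometheus client library.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory request counter keyed by (method, status).
REQUEST_COUNTS = {("GET", "200"): 0}


def render_metrics(counts):
    """Render counters in the Prometheus text exposition format."""
    lines = [
        "# HELP http_requests_total Total HTTP requests handled.",
        "# TYPE http_requests_total counter",
    ]
    for (method, status), value in sorted(counts.items()):
        lines.append(
            f'http_requests_total{{method="{method}",status="{status}"}} {value}'
        )
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUEST_COUNTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Serve /metrics until interrupted (port 8080 is an assumption)."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling serve() starts the endpoint. For the ServiceMonitor to find it, the Service in front of the pod would need the app: web-api label and a port named metrics that targets the chosen container port.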
Grafana visualises metrics from Prometheus (and other data sources).
| Dashboard | What It Shows |
|---|---|
| Kubernetes Cluster Overview | Node CPU, memory, pod counts |
| Namespace Resources | Per-namespace resource usage |
| Pod Details | Individual pod metrics |
| Node Exporter | Host-level metrics (disk, network) |
| CoreDNS | DNS query rates, latency, errors |
{
  "title": "Request Rate by Service",
  "type": "timeseries",
  "datasource": "Prometheus",
  "targets": [
    {
      "expr": "sum(rate(http_requests_total{namespace=\"production\"}[5m])) by (service)",
      "legendFormat": "{{ service }}"
    }
  ]
}
metrics-server provides near-real-time CPU and memory metrics for pods and nodes. The Horizontal Pod Autoscaler (HPA) and kubectl top both depend on it.
# Install metrics-server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
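To make the HPA dependency concrete, here is a small Python sketch of the scaling rule given in the Kubernetes documentation: the desired replica count is the current count multiplied by the ratio of observed to target metric value, rounded up, and no scaling happens while that ratio stays within a tolerance (0.1 by default). The function and argument names are illustrative, and the real controller also handles missing metrics, not-yet-ready pods, and stabilisation windows, which this sketch leaves out.

```python
import math


def desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """Simplified HPA rule: ceil(currentReplicas * current / target)."""
    ratio = current_value / target_value
    # Within the tolerance band the HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Four replicas averaging 90% CPU utilisation against a 60% target
scaled_up = desired_replicas(4, 90, 60)
```

The current_value here is what metrics-server supplies for resource metrics. Without metrics-server the HPA has no CPU or memory readings to compare against the target.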