Distributed Tracing

Distributed tracing follows a single request as it travels through multiple services in a distributed system. It reveals the path, timing, and dependencies of each operation, making it indispensable for debugging latency issues and understanding service interactions.

Why Tracing Matters

In a monolithic application, a stack trace shows you exactly where an error occurred. In a microservices architecture, a single user request might traverse 10 or more services:

User → API Gateway → Auth Service → Order Service → Payment Service → Notification Service
                                          ↓
                                   Inventory Service → Database

Without tracing, answering "why was this request slow?" requires correlating logs from every service manually. Tracing solves this by connecting all operations into a single, visual timeline.

Core Concepts

Trace

A trace represents the entire journey of a request through the system. It is identified by a unique trace ID.

Span

A span represents a single operation within a trace — an HTTP request, a database query, a function call. Each span has:

Field	Description
Trace ID	Shared across all spans in the trace
Span ID	Unique identifier for this span
Parent Span ID	ID of the span that initiated this operation (absent on the root span)
Operation name	What this span represents (e.g., `GET /api/orders`)
Start time	When the operation began
Duration	How long the operation took
Tags / Attributes	Key-value metadata (e.g., `http.status_code=200`)
Events / Logs	Timestamped annotations within the span
Status	OK, ERROR, or UNSET
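
To make the relationship between these fields concrete, here is a minimal sketch in plain Python (not the OpenTelemetry SDK) that models spans as records and rebuilds the call tree from the parent span IDs. The `Span` class, the helper functions, and the sample operations and timings are all hypothetical, chosen only for illustration.

```python
from __future__ import annotations

import secrets
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """Illustrative span record mirroring the fields in the table above."""
    trace_id: str                   # shared by every span in the trace
    span_id: str                    # unique to this span
    parent_span_id: Optional[str]   # None for the root span
    name: str                       # operation name
    start_ms: float                 # offset from the start of the trace
    duration_ms: float
    attributes: dict = field(default_factory=dict)
    status: str = "UNSET"


def new_id(nbytes: int) -> str:
    """Random lowercase hex ID: 16 bytes for trace IDs, 8 for span IDs."""
    return secrets.token_hex(nbytes)


def make_span(trace_id, name, start_ms, duration_ms, parent=None, attributes=None):
    """Create a span, linking it to its parent when one is given."""
    parent_id = parent.span_id if parent else None
    return Span(trace_id, new_id(8), parent_id, name,
                start_ms, duration_ms, attributes or {})


def render_tree(spans):
    """Rebuild the call tree from parent IDs and return indented lines."""
    children = {}
    for s in spans:
        children.setdefault(s.parent_span_id, []).append(s)

    lines = []

    def walk(parent_id, depth):
        for s in sorted(children.get(parent_id, []), key=lambda s: s.start_ms):
            lines.append(f"{'  ' * depth}{s.name} ({s.duration_ms:.0f} ms)")
            walk(s.span_id, depth + 1)

    walk(None, 0)
    return lines


trace_id = new_id(16)
root = make_span(trace_id, "GET /api/orders", 0, 120,
                 attributes={"http.status_code": 200})
auth = make_span(trace_id, "auth.verify", 5, 15, parent=root)
query = make_span(trace_id, "SELECT orders", 25, 80, parent=root)

# Spans typically arrive out of order; the parent IDs are enough to rebuild the tree.
tree = render_tree([query, root, auth])
print("\n".join(tree))
```

Tracing backends perform essentially this reconstruction when they draw a waterfall view: spans are reported independently by each service, and the shared trace ID plus the parent span IDs are what tie them back together.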

Context Propagation

For tracing to work across services, the trace context (trace ID, span ID) must be propagated between services — typically via HTTP headers:

traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01

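This uses the W3C Trace Context standard, which is now widely adopted. The header has four hyphen-separated fields: the version (`00`), the 32-hex-character trace ID, the 16-hex-character ID of the calling span (the parent ID), and the trace flags (`01` means the request was sampled).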
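
As a rough illustration of what a service does with this header, the sketch below parses an incoming `traceparent` value and builds the value to send on an outgoing call. It handles version `00` only and uses just the standard library; the names `TraceParent`, `parse_traceparent`, and `child_traceparent` are hypothetical, and in practice an instrumentation library would normally do this for you.

```python
import re
import secrets
from typing import NamedTuple


class TraceParent(NamedTuple):
    version: str
    trace_id: str    # 32 lowercase hex characters
    parent_id: str   # 16 lowercase hex characters (the caller's span ID)
    sampled: bool    # lowest bit of the trace-flags field


# Layout for version 00: version-traceid-parentid-flags
_TRACEPARENT = re.compile(
    r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})"
)


def parse_traceparent(header: str) -> TraceParent:
    """Parse a traceparent header value, rejecting malformed input."""
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    version, trace_id, parent_id, flags = match.groups()
    # The spec forbids version ff and all-zero trace or parent IDs.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        raise ValueError(f"invalid traceparent: {header!r}")
    return TraceParent(version, trace_id, parent_id, bool(int(flags, 16) & 0x01))


def child_traceparent(incoming: TraceParent) -> str:
    """Build the header for an outgoing call: same trace ID, fresh span ID."""
    new_span_id = secrets.token_hex(8)
    flags = "01" if incoming.sampled else "00"
    return f"00-{incoming.trace_id}-{new_span_id}-{flags}"


ctx = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
outgoing = child_traceparent(ctx)
print(ctx.trace_id, ctx.sampled)
print(outgoing)
```

The trace ID and the sampling decision carry over unchanged, while the span ID is replaced on every hop. That is what lets the downstream service record its own span as a child of the caller's span.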

OpenTelemetry

OpenTelemetry (OTel) is a CNCF project that has become the de facto vendor-neutral standard for generating, collecting, and exporting telemetry data (traces, metrics, and logs). It provides:

Component	Purpose
API	Vendor-neutral interfaces for instrumentation
SDK	Implementation with processing, batching, and export
Collector	A standalone service for receiving, processing, and exporting telemetry
Auto-instrumentation	Automatic tracing of common libraries (HTTP, gRPC, databases)

Instrumenting with OpenTelemetry

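from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Set up the tracer provider, which creates tracers and owns the export pipeline
provider = TracerProvider()
# Batch finished spans and send them to the collector over OTLP/gRPC.
# insecure=True assumes the collector accepts plaintext gRPC on port 4317;
# drop it if the collector is configured for TLS.
processor = BatchSpanProcessor(
    OTLPSpanExporter(endpoint="otel-collector:4317", insecure=True)
)
provider.add_span_processor(processor)
# Register the provider globally so trace.get_tracer() uses it
trace.set_tracer_provider(provider)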
