41.3 Distributed Tracing with OpenTelemetry
Right, so you’ve got your services talking to each other. Fantastic. Now, when a request fails or performance goes sideways, you’re left staring at a dozen different logs, trying to play detective across a distributed crime scene. It’s a nightmare. This is why we don’t just build microservices; we make them observable. And the first, most powerful tool in that box is distributed tracing. It’s the single best way to see the life of a request as it bounces around your system, and OpenTelemetry (OTel for short) is the de facto standard for getting it done. It’s the CNCF’s attempt to unify this chaos, and for the most part, it’s succeeding brilliantly.
Let’s be clear: tracing isn’t magic. It works by propagating a unique trace ID, plus the ID of the current span, across every service boundary (for HTTP, typically in the W3C traceparent header). Every unit of work (a span) records that trace ID, its own span ID, and its parent’s span ID. When you send all these spans to a backend like Jaeger, Tempo, or Honeycomb, it groups them by trace ID and follows the parent links to stitch them into a single, visual story: a trace.
Instrumenting Your First Service
You don’t need to change all your services at once. Start with one. The beauty of OTel is that it’s designed for incremental adoption. Let’s instrument a simple HTTP handler in Go. First, you’ll need the core OTel Go SDK packages.
go get go.opentelemetry.io/otel \
    go.opentelemetry.io/otel/trace \
    go.opentelemetry.io/otel/sdk \
    go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc
Now, let’s set up the crucial plumbing. This boilerplate is a bit verbose, but you’ll typically write it once, in your main.go. Notice we’re using the OTLP gRPC exporter here: OTLP is OpenTelemetry’s native protocol, and the OpenTelemetry Collector, Jaeger, Tempo, and Honeycomb all accept it.
package main

import (
	"context"
	"fmt"
	"log"
	"net/http"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"go.opentelemetry.io/otel/propagation"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.21.0"
)

func initTracer() (*sdktrace.TracerProvider, error) {
	// Create the OTLP gRPC exporter. With no endpoint option (and no
	// OTEL_EXPORTER_OTLP_ENDPOINT set) it sends to localhost:4317, where a
	// collector (e.g., the OpenTelemetry Collector or Jaeger) usually listens.
	// WithInsecure disables TLS for that local hop; drop it for a TLS endpoint.
	exporter, err := otlptracegrpc.New(context.Background(), otlptracegrpc.WithInsecure())
	if err != nil {
		return nil, fmt.Errorf("failed to create exporter: %w", err)
	}

	tp := sdktrace.NewTracerProvider(
		// Buffer finished spans and export them in batches in the background.
		sdktrace.WithBatcher(exporter),
		// Define what service we're in. This is crucial for the backend!
		sdktrace.WithResource(resource.NewWithAttributes(
			semconv.SchemaURL,
			semconv.ServiceName("my-awesome-service"),
			semconv.ServiceVersion("v0.1.0"),
		)),
	)

	// Set the global TracerProvider and TextMapPropagator
	otel.SetTracerProvider(tp)
	otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(
		propagation.TraceContext{},
		propagation.Baggage{},
	))
	return tp, nil
}

func main() {
	tp, err := initTracer()
	if err != nil {
		log.Fatal(err)
	}
	// Shutdown flushes any spans still sitting in the batcher.
	defer func() {
		if err := tp.Shutdown(context.Background()); err != nil {
			log.Fatal(err)
		}
	}()

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// We'll add the actual tracing here in a second
		fmt.Fprintf(w, "Hello, World!")
	})
	// ListenAndServe blocks. Log its error instead of calling log.Fatal,
	// which would exit without running the deferred Shutdown.
	if err := http.ListenAndServe(":8080", nil); err != nil {
		log.Println(err)
	}
}
Creating Spans and Propagating Context
The code above sets up the factory, but it doesn’t do anything yet. Now for the good part: creating spans. The critical concept here is context. You must pass the context from the incoming request through to any outbound calls you make. If you drop it on the floor, you break the trace. It’s the distributed systems equivalent of forgetting to cc someone on an email chain.
http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
	// 1. Extract the trace context from the incoming request headers.
	// If the caller sent none (this is the first hop), the span below starts a new trace.
	ctx := otel.GetTextMapPropagator().Extract(r.Context(), propagation.HeaderCarrier(r.Header))

	// 2. Start a new span. It becomes a child of whatever span ctx carries.
	tracer := otel.Tracer("my-tracer")
	ctx, span := tracer.Start(ctx, "my-awesome-handler")
	defer span.End() // Crucial: a span that is never ended is never exported.

	// 3. Let's say you call another service. You MUST inject the context into the outgoing request.
	clientReq, err := http.NewRequestWithContext(ctx, http.MethodGet, "http://other-service/api", nil)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	otel.GetTextMapPropagator().Inject(ctx, propagation.HeaderCarrier(clientReq.Header))
	resp, err := http.DefaultClient.Do(clientReq)
	if err != nil {
		span.RecordError(err) // attach the failure to this span
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()

	// Your actual business logic here...
	fmt.Fprint(w, "Hello, World!")
})
The Gotchas: What They Don’t Tell You
You will mess this up. Everyone does. The most common pitfall is context loss. You start a span, then fire off a goroutine without passing the context. If that goroutine starts its spans from context.Background(), they have no parent, so they don’t appear as a branch of your request at all: they land in a brand-new, orphaned trace, and your request’s trace silently loses that work. Always pass the context explicitly.
Another classic: over-instrumentation. Do you really need a span for every single database field fetch? Probably not. You’ll drown in noise. Instrument key operations: HTTP handlers, RPC calls, major queue consumers, and complex logic blocks.
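Sampling is your friend at scale. Sending every single trace from a high-throughput service will murder your backend. The OTel SDK does head-based sampling: the keep-or-drop decision is made when the root span starts, and with a parent-based sampler every downstream service honors it via the sampled flag in the propagated context, so a trace is never half-recorded. Tail-based sampling (decide after the whole trace has finished, e.g., keep every trace that contains an error) can’t happen inside one service, because no single service sees the whole trace; it runs in the OpenTelemetry Collector instead. Start with always-on for dev, and move to a probabilistic sampler (say, 10% of traces) in production. For high-volume, repetitive traffic you keep most of the insight for a fraction of the cost, but remember that a pure ratio sampler also throws away 90% of your rare failures, which is exactly where tail-based sampling earns its keep.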
Finally, attributes are your best weapon for debugging. Adding span.SetAttributes(attribute.String("user.id", userID)) (from the go.opentelemetry.io/otel/attribute package; older examples use a label package that no longer exists) turns a generic database span into one you can actually use to find that one user whose requests are always timing out. It’s the difference between knowing “a query was slow” and knowing “the query for user #4815 was slow.” Be generous with attributes, but keep PII out of them, for heaven’s sake: prefer opaque internal IDs over emails, names, or tokens. Your compliance officer will thank you.