28.5 Jaeger: Open-Source Distributed Tracing Backend

Right, so you’ve got OpenTelemetry instrumenting your code and sending out all these lovely spans. Fantastic. But that telemetry data has to go somewhere unless you’re just shouting into the void, which is a terrible architectural pattern. This is where Jaeger comes in. Think of it as your dedicated, high-performance storage and analysis garage for your trace data. It’s open-source, it’s a CNCF graduate (so you know it’s not just some fly-by-night project), and it’s probably the most common backend you’ll hook up to your OpenTelemetry SDK.

Now, Jaeger has this slightly… modular architecture. You can run it as an all-in-one binary for kicking the tires, which is what we’ll do here, or you can break its components out into separate services for a production deployment. The classic players are the Agent (a per-host daemon that receives spans from the older Jaeger clients over UDP and forwards them), the Collector (which does the actual processing and writing to storage), the Query service (which, you know, queries traces), and the UI (the pretty frontend). One caveat: the Agent is deprecated in recent Jaeger releases, and with OpenTelemetry you would normally send OTLP straight to the Collector instead.

Running Jaeger All-in-One for Local Development

For anything resembling local development or a quick demo, the all-in-one image is your best friend. It bundles every component into one process, keeps traces in memory (so everything vanishes when you stop it – perfect!), and fires up the UI. Let’s get it running with Docker:

docker run -d --name jaeger \
  -p 6831:6831/udp \
  -p 6832:6832/udp \
  -p 4317:4317 \
  -p 4318:4318 \
  -p 16686:16686 \
  jaegertracing/all-in-one:latest

Boom. Done. Ports 6831 and 6832 receive traces via the Jaeger Thrift protocol over UDP (compact and binary encoding, respectively), which is what a lot of older clients used. The one you care about right now is port 16686: that’s the UI. Point your browser at http://localhost:16686 and you’re in business.

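But wait, we’re using OpenTelemetry, not the Jaeger client directly. This is where the OTLP protocol comes in. The all-in-one image also listens for OTLP over gRPC on port 4317 (and OTLP over HTTP on port 4318), but your application can only reach those ports if the docker run command publishes them with -p 4317:4317 and -p 4318:4318. OTLP is the modern, vendor-neutral way to ship traces. So let’s configure our OTLP exporter to point here.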

Configuring the OTLP Exporter to Talk to Jaeger

Here’s how you’d set up a Node.js tracer provider to shoot traces straight to your local Jaeger instance. The key is pointing the OTLP exporter at the right endpoint.

const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node');
const { SimpleSpanProcessor } = require('@opentelemetry/sdk-trace-base');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');

// Set up the OTLP exporter pointing to Jaeger's gRPC port
const exporter = new OTLPTraceExporter({
  url: 'http://localhost:4317', // Jaeger all-in-one OTLP gRPC endpoint
});

// Create the provider
const provider = new NodeTracerProvider();

// Use the SimpleSpanProcessor for immediate, one-by-one sending.
// Perfect for development. For prod, you'd likely use BatchSpanProcessor.
// Note: addSpanProcessor() is the SDK 1.x API. On SDK 2.x, pass
// { spanProcessors: [new SimpleSpanProcessor(exporter)] } to the constructor.
provider.addSpanProcessor(new SimpleSpanProcessor(exporter));

// Activate the provider once it is fully configured
provider.register();

console.log('Tracing initialized!');

Why SimpleSpanProcessor? Because in development, you want to see your traces the instant a request finishes, not wait for some batch to fill up. It’s horrible for production throughput, but brilliant for debugging. Speaking of production, let’s talk about that.
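
To make the throughput difference concrete, here is a toy model of the two strategies. It is only a sketch: the class names, the counting exporter, and the batch size of 4 are made up for illustration, and none of this is the SDK’s actual implementation (the real BatchSpanProcessor also flushes on a timer, which this model skips).

```javascript
// Toy model only: these classes mimic the idea behind SimpleSpanProcessor and
// BatchSpanProcessor. They are NOT the OpenTelemetry SDK's implementations.

// Hypothetical exporter that records the size of every export call it receives.
function createCountingExporter() {
  const calls = [];
  return {
    calls,
    export(spans) {
      calls.push(spans.length);
    },
  };
}

// Exports every span on its own, as soon as it ends.
class ToySimpleProcessor {
  constructor(exporter) {
    this.exporter = exporter;
  }
  onEnd(span) {
    this.exporter.export([span]);
  }
}

// Queues spans and exports them in groups of up to maxBatchSize.
class ToyBatchProcessor {
  constructor(exporter, maxBatchSize) {
    this.exporter = exporter;
    this.maxBatchSize = maxBatchSize;
    this.queue = [];
  }
  onEnd(span) {
    this.queue.push(span);
    if (this.queue.length >= this.maxBatchSize) this.flush();
  }
  flush() {
    if (this.queue.length === 0) return;
    this.exporter.export(this.queue);
    this.queue = [];
  }
}

const simpleExporter = createCountingExporter();
const batchExporter = createCountingExporter();
const simpleProcessor = new ToySimpleProcessor(simpleExporter);
const batchProcessor = new ToyBatchProcessor(batchExporter, 4);

// Feed the same ten spans through both processors.
for (let i = 0; i < 10; i++) {
  const span = { name: `span-${i}` };
  simpleProcessor.onEnd(span);
  batchProcessor.onEnd(span);
}
batchProcessor.flush(); // drain the leftovers, as a shutdown hook would

console.log('simple export calls:', simpleExporter.calls.length);
console.log('batch export calls:', batchExporter.calls.length);
```

Ten spans cost the simple processor ten export calls, while the batching one gets away with three. Multiply that by real network round trips and the production advice starts to make sense.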

The Production Reality: Jaeger Collector and Scaling

The all-in-one mode is a toy. A wonderful, useful toy, but a toy nonetheless. For a real deployment, you don’t want your application nodes talking directly to storage. You deploy the Jaeger Collector as a separate component. Your apps send traces (via OTLP) to the Collector, and the Collector then handles the grueling work of writing them to a real persistent backend like Elasticsearch or Cassandra. There is also the embedded Badger key-value store, which persists to local disk on a single node, if you’re a masochist.

This separation is a critical design choice. The Collector buffers writes, manages load, and lets you scale your storage tier independently from your application tier. Putting a Collector between your apps and storage is one of those things that seems like overkill until the first time your trace storage has a hiccup and you realize it didn’t take down your entire order processing pipeline with it.
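
On the application side, moving from the local all-in-one to a shared Collector mostly comes down to changing the endpoint. A minimal sketch, assuming you want that choice driven by the standard OpenTelemetry environment variables: the helper name and the Collector hostname below are hypothetical, and the OTLP exporters are documented to read these variables themselves when you pass no url, so treat this as a way to make the precedence visible rather than something you must write.

```javascript
// Sketch: pick the OTLP traces endpoint from the environment, falling back to
// the local all-in-one instance. resolveOtlpTracesEndpoint is a hypothetical
// helper; the two variable names come from the OpenTelemetry specification.
function resolveOtlpTracesEndpoint(env = process.env) {
  return (
    env.OTEL_EXPORTER_OTLP_TRACES_ENDPOINT || // signal-specific setting wins
    env.OTEL_EXPORTER_OTLP_ENDPOINT ||        // then the general OTLP setting
    'http://localhost:4317'                   // then local Jaeger all-in-one
  );
}

// Local development: nothing set, so we fall back to localhost.
console.log(resolveOtlpTracesEndpoint({}));

// Production-style environment; the hostname is made up for illustration.
console.log(
  resolveOtlpTracesEndpoint({
    OTEL_EXPORTER_OTLP_ENDPOINT: 'http://jaeger-collector.example.internal:4317',
  })
);
```

The resulting string is what you would hand to the exporter’s url option, so the same build can run against your laptop or the production Collector without a code change.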

Querying Traces: The Real Payoff

The whole point of this exercise is to find stuff. The Jaeger UI is how you do it. You can search by service, operation, tags, and duration. The timeline view is the star of the show. You’ll see a waterfall diagram of your trace, making it painfully obvious which microservice decided to take a three-second nap in the middle of your user’s login request.
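
Searches in the UI end up encoded in the address bar, which makes them easy to share or bookmark. The sketch below builds such a link. The parameter names (service, operation, tags as a JSON object, minDuration, limit) are an assumption based on what the Jaeger UI puts in its own search URLs, not a documented stable API, and the helper name and the checkout service are hypothetical.

```javascript
// Sketch: build a shareable Jaeger UI search link.
// ASSUMPTION: the query parameter names mirror what the Jaeger UI writes into
// its own search URLs; verify them against the version you are running.
function buildJaegerSearchUrl(baseUrl, { service, operation, tags, minDuration, limit }) {
  const url = new URL('/search', baseUrl);
  url.searchParams.set('service', service);
  if (operation) url.searchParams.set('operation', operation);
  if (tags) url.searchParams.set('tags', JSON.stringify(tags)); // tags travel as JSON
  if (minDuration) url.searchParams.set('minDuration', minDuration);
  if (limit) url.searchParams.set('limit', String(limit));
  return url.toString();
}

// Hypothetical search: slow traces from a "checkout" service.
const searchUrl = buildJaegerSearchUrl('http://localhost:16686', {
  service: 'checkout',
  minDuration: '1s',
  limit: 20,
});
console.log(searchUrl);
```

Paste the printed link into a ticket and whoever picks it up lands on the same filtered list you were looking at.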

The magic happens when you start using attributes (tags) effectively. Adding a tag like user.id="12345" or http.route="/api/orders/:id" means you can later find all traces for that user or that specific route. This is infinitely more useful than just sifting through thousands of traces by service name. It turns your tracing system from a passive recording into an active debugging tool.
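
Here is roughly what attaching those attributes can look like. This is a sketch, not a prescribed pattern: annotateSpan is a hypothetical helper, and it accepts anything with a setAttribute(key, value) method, which is the shape of the Span objects you get from @opentelemetry/api. A stand-in span is used so the snippet runs without the SDK installed.

```javascript
// Hypothetical helper: copies request details onto a span as attributes.
// `span` is any object exposing setAttribute(key, value), such as the Span
// objects handed out by @opentelemetry/api.
function annotateSpan(span, { userId, route }) {
  if (userId !== undefined) span.setAttribute('user.id', String(userId));
  if (route !== undefined) span.setAttribute('http.route', route);
  return span;
}

// Stand-in span that just records what it is given (illustration only).
const recordedAttributes = {};
const fakeSpan = {
  setAttribute(key, value) {
    recordedAttributes[key] = value;
    return this;
  },
};

annotateSpan(fakeSpan, { userId: 12345, route: '/api/orders/:id' });
console.log(recordedAttributes);
```

Note that the route is recorded as the template (/api/orders/:id) rather than the concrete path, so every order request lands under the same searchable value.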

The rough edge? The Jaeger UI’s query interface can feel a bit clunky compared to some commercial tools. Its dependency graph is also inferred from trace data, which is cool but can sometimes be a bit… imaginative. It’s showing you what did happen, not what is supposed to happen, which is both a feature and a source of confusion.
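
To see why an inferred graph can only ever reflect observed traffic, consider this toy version of the idea. It is an illustration under simplified assumptions (a made-up span shape with spanId, parentId, and service), not Jaeger’s actual algorithm: whenever a span’s parent belongs to a different service, that counts as one call between the two.

```javascript
// Toy sketch: derive service-to-service edges from parent/child span links.
// The span shape ({ spanId, parentId, service }) is a simplification, and this
// is NOT how Jaeger itself computes its dependency graph.
function inferDependencies(spans) {
  const serviceBySpanId = new Map(spans.map((s) => [s.spanId, s.service]));
  const callCounts = new Map();
  for (const span of spans) {
    const parentService = serviceBySpanId.get(span.parentId);
    // Skip root spans, spans whose parent never arrived, and calls that stay
    // inside a single service.
    if (!parentService || parentService === span.service) continue;
    const edge = `${parentService} -> ${span.service}`;
    callCounts.set(edge, (callCounts.get(edge) || 0) + 1);
  }
  return callCounts;
}

// Hypothetical spans from one login request.
const sampleSpans = [
  { spanId: 'a', parentId: null, service: 'gateway' },
  { spanId: 'b', parentId: 'a', service: 'auth' },
  { spanId: 'c', parentId: 'a', service: 'orders' },
  { spanId: 'd', parentId: 'c', service: 'orders' },
  { spanId: 'e', parentId: 'c', service: 'inventory' },
  { spanId: 'f', parentId: 'zz', service: 'billing' }, // parent span was lost
];

const dependencyEdges = inferDependencies(sampleSpans);
for (const [edge, count] of dependencyEdges) {
  console.log(`${edge} (${count} call${count === 1 ? '' : 's'})`);
}
```

The billing span whose parent was lost produces no edge at all in this model, which is one way a graph built from traces can drift from what the architecture diagram says.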

So, in summary: use the all-in-one for local work, configure your OTLP exporter for gRPC, and for the love of all that is holy, use a Collector in production. Jaeger is the workhorse that makes the entire distributed tracing practice viable. It’s not always the shiniest, but it’s robust, scalable, and it gets the job done without a lot of fuss.