28.3 Instrumenting Applications with OpenTelemetry SDKs

Right, let’s get our hands dirty. You’ve decided you want to know what your distributed system is actually doing, not just what you hope it’s doing. That’s what instrumentation is for. It’s the process of adding observability code—the stuff that generates telemetry—directly into your application. Think of it like adding a flight data recorder to your code. We’re not just logging when it crashes; we’re recording its every operation.

With OpenTelemetry, you do this using language-specific Software Development Kits (SDKs). The beauty here is that the API (the interfaces you code against) is separate from the SDK (the implementation that sends data somewhere). This means you can instrument your code today, decide where to send it tomorrow, and change your mind next week without touching a line of application code. It’s a genuinely good design choice, and I don’t say that lightly.

The Holy Trinity: Tracer, Meter, and Logger

The SDK gives you access to three core instruments, but we’re focusing on tracing here. You’ll need a Tracer to create spans. You get one from a TracerProvider. I know, it feels a bit factory-factory, but stick with me. You set up the provider once at your application’s startup, and then you fetch tracers from it wherever you need them. The provider holds the configuration—like which exporter to use, what sampling rate to apply, etc.

Here’s how you’d set it up in a Node.js service. Notice how we create the provider and then set it as the global provider. This is a common pattern.

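// app-init.js
const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node');
const { SimpleSpanProcessor, ConsoleSpanExporter } = require('@opentelemetry/sdk-trace-base');

// Create and configure the tracer provider.
// SimpleSpanProcessor exports each span as soon as it ends (handy for local
// debugging); ConsoleSpanExporter prints finished spans to stdout.
// Note: SDK 2.x removed provider.addSpanProcessor() in favor of a
// spanProcessors constructor option, so check which version you have installed.
const provider = new NodeTracerProvider();
provider.addSpanProcessor(new SimpleSpanProcessor(new ConsoleSpanExporter()));
// Register this provider as the global one used by @opentelemetry/api.
provider.register();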

// Now, in your business logic file...
const opentelemetry = require('@opentelemetry/api');
const tracer = opentelemetry.trace.getTracer('my-awesome-service');

async function handleUserRequest(userId) {
  // Start a span. This becomes the active context.
  return tracer.startActiveSpan('handleUserRequest', async (span) => {
    try {
      span.setAttribute('user.id', userId);
      // ... your actual logic here ...
      const user = await getUserFromDatabase(userId);
      span.addEvent('Fetched user from DB', { userFound: !!user });
      return user;
    } finally {
      // You MUST end the span. No ifs, ands, or buts.
      span.end();
    }
  });
}

The crucial part here is the finally { span.end(); }. If you don’t end the span, it will never be exported. It’s like recording a show but never hitting stop: you end up with nothing you can actually watch. This is the most common rookie mistake. Note that startActiveSpan does not end the span for you; it only makes the span the active one for the duration of the callback. It’s the try/finally inside the callback that guarantees the span gets ended, even if your function throws an error.
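If you find yourself repeating that try/finally everywhere, one option is to wrap it once. The sketch below is a hypothetical helper (withSpan is not part of OpenTelemetry) that takes any tracer exposing startActiveSpan. It also records the error on the span before rethrowing, using the API’s recordException and setStatus methods. The constant 2 is assumed to match SpanStatusCode.ERROR from @opentelemetry/api; in real code you would import that enum instead.

```javascript
// Numeric value of SpanStatusCode.ERROR in @opentelemetry/api (assumption:
// hard-coded here only to keep the sketch free of dependencies).
const SPAN_STATUS_ERROR = 2;

// Hypothetical helper: runs fn inside an active span and guarantees the span
// is ended, whether fn returns, rejects, or throws.
async function withSpan(tracer, name, fn) {
  return tracer.startActiveSpan(name, async (span) => {
    try {
      return await fn(span);
    } catch (err) {
      // Mark the span as failed so the error shows up in the backend.
      span.recordException(err);
      span.setStatus({ code: SPAN_STATUS_ERROR, message: err.message });
      throw err;
    } finally {
      span.end();
    }
  });
}

// Usage (inside your business logic):
//   return withSpan(tracer, 'handleUserRequest', async (span) => {
//     span.setAttribute('user.id', userId);
//     return getUserFromDatabase(userId);
//   });
```

The design choice here is simply to put the "you MUST end the span" rule in one place, so individual functions can’t forget it.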

Manual vs. Automatic Instrumentation

The code above is manual instrumentation. You’re writing the calls to startActiveSpan yourself. It’s powerful because you can add attributes and events that are specific to your business logic (like that user.id). But writing that for every function is tedious.

That’s where automatic instrumentation comes in. You can install OpenTelemetry libraries that monkey-patch common frameworks (Express, Fastify, PostgreSQL drivers, etc.) to automatically create spans for incoming requests and outgoing queries. It’s magic. The kind of useful magic that saves you thousands of lines of boilerplate.

# For a Node.js app, you might install these:
npm install @opentelemetry/instrumentation-http @opentelemetry/instrumentation-express @opentelemetry/instrumentation-pg

You then register these instrumentations during your setup. Suddenly, every HTTP request and database call is traced without you writing a single startActiveSpan. The designers got this right: use automatic instrumentation to get the broad strokes of the picture for free, and then use manual instrumentation to add the detailed, meaningful highlights.
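Registration might look something like the fragment below. Treat it as a sketch under a few assumptions: it is appended to the setup file after provider.register(), the @opentelemetry/instrumentation package (which provides registerInstrumentations) is installed alongside the three packages above, and the setup file is loaded before any application code so the libraries are patched before your code requires them.

```javascript
// Appended to the setup file, after provider.register().
// Load this file first, e.g. `node --require ./app-init.js server.js`
// (server.js is a hypothetical entry point).
const { registerInstrumentations } = require('@opentelemetry/instrumentation');
const { HttpInstrumentation } = require('@opentelemetry/instrumentation-http');
const { ExpressInstrumentation } = require('@opentelemetry/instrumentation-express');
const { PgInstrumentation } = require('@opentelemetry/instrumentation-pg');

registerInstrumentations({
  // With no tracerProvider option given, the globally registered provider is used.
  instrumentations: [
    new HttpInstrumentation(),    // incoming and outgoing HTTP requests
    new ExpressInstrumentation(), // Express routing and middleware
    new PgInstrumentation(),      // queries issued through the pg driver
  ],
});
```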

Context is Everything (And It’s a Pain)

This is the part everyone quietly struggles with. For traces to work, the context (specifically the trace context) must be propagated across process boundaries. When service A calls service B, it must send along a header saying, “Hey, I’m trace ID X and I’m currently in span Y. Please make any new spans you create a child of Y.”
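To make that header less abstract, here is a small sketch of what the W3C traceparent header carries. You would not normally write this yourself (the SDK’s propagators do it), and the function names are hypothetical. It only handles the version 00 layout: version, a 32-hex-character trace ID, a 16-hex-character parent span ID, and 2 hex characters of flags, where the lowest bit means "sampled".

```javascript
// Layout (W3C Trace Context, version 00):
//   version "-" trace-id (32 hex) "-" parent-id (16 hex) "-" trace-flags (2 hex)
const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Build the header value that service A would send to service B.
function formatTraceparent({ traceId, spanId, sampled }) {
  return `00-${traceId}-${spanId}-${sampled ? '01' : '00'}`;
}

// Parse the header on the receiving side; returns null if it is malformed.
function parseTraceparent(header) {
  const match = TRACEPARENT_RE.exec(header);
  if (!match) return null;
  const [, version, traceId, spanId, flags] = match;
  // Version "ff" and all-zero IDs are invalid.
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}

const incoming = parseTraceparent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01');
// incoming.traceId is "trace ID X", incoming.spanId is "span Y" from the text above.
```

Service B keeps the trace ID, creates its own span ID, and uses the received span ID as the parent. That is the whole trick.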

OpenTelemetry uses the Context API to manage this under the hood, especially with startActiveSpan, which handles the context propagation for you within a single Node.js process. But across HTTP? You need a propagator.

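// Add this to your setup, replacing the bare provider.register() call from earlier
const { B3Propagator } = require('@opentelemetry/propagator-b3');
const { CompositePropagator, W3CTraceContextPropagator } = require('@opentelemetry/core');

// You can use multiple propagators for compatibility: the composite
// propagator injects and extracts both header formats.
provider.register({
  propagator: new CompositePropagator({
    propagators: [new W3CTraceContextPropagator(), new B3Propagator()],
  }),
});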

The W3CTraceContextPropagator uses the standard traceparent header, and W3C Trace Context is already part of what you get by default if you call register() without a propagator. If you’re in a mixed environment, you might need the B3 propagator for compatibility with older Zipkin-instrumented services. It’s a bit of a headache, but it’s a necessary evil. The SDK can’t read your mind to know what headers your services use to chat with each other. You have to tell it.

The Golden Rule of Attributes

Be generous with attributes, but be smart. Every attribute you add is a potential dimension you can filter and group by later in your tracing backend. user.id is a fantastic attribute. user.object containing a 10 KB JSON blob of the entire user profile is a fantastic way to blow up your telemetry costs and storage.
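One way to keep yourself honest is to never hand a whole object to the span, and instead pick the fields you want. The helper below is a hypothetical sketch, not an OpenTelemetry API: pickAttributes, the 256-character limit, and the field names in the usage note are all assumptions you would tune for your own service.

```javascript
// Assumed cap on string attribute length; tune for your backend and budget.
const MAX_ATTRIBUTE_LENGTH = 256;

// Hypothetical helper: copy only allowlisted fields of `source` into a flat
// attribute object, prefixing each key and truncating long strings.
function pickAttributes(prefix, source, allowedKeys) {
  const attributes = {};
  for (const key of allowedKeys) {
    const value = source[key];
    if (typeof value === 'string') {
      attributes[`${prefix}.${key}`] = value.slice(0, MAX_ATTRIBUTE_LENGTH);
    } else if (typeof value === 'number' || typeof value === 'boolean') {
      attributes[`${prefix}.${key}`] = value;
    }
    // Objects, arrays, null, and undefined are skipped on purpose:
    // nested blobs are exactly what we are trying to keep off the span.
  }
  return attributes;
}
```

You would then call something like span.setAttributes(pickAttributes('user', user, ['id', 'plan'])), where plan is a made-up field, and the 10 KB profile never leaves the process.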

Also, think about cardinality, meaning the number of distinct values an attribute can take. Using transaction.id as a span attribute is usually fine: you look it up by exact value rather than group by it. Using user.email might be fine for a small internal app. Using it for a service with ten million users can be a recipe for disaster if your backend indexes every attribute value or derives metrics from spans, because it creates millions of unique values and can badly hurt the performance (and the bill) of your tracing system. The SDK can’t stop you from doing this, so you have to exercise some judgment.
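If you want a rough feel for where you stand, you can count distinct values per attribute key over a sample of spans. The function below is a hypothetical sketch that assumes you already have the sampled spans’ attributes as plain objects (for example, from a test run or an exported file); it is not part of the SDK.

```javascript
// Count how many distinct values each attribute key takes across a sample.
// `spanAttributes` is an array of plain attribute objects, one per span.
function attributeCardinality(spanAttributes) {
  const valuesByKey = new Map();
  for (const attributes of spanAttributes) {
    for (const [key, value] of Object.entries(attributes)) {
      if (!valuesByKey.has(key)) valuesByKey.set(key, new Set());
      valuesByKey.get(key).add(String(value));
    }
  }
  const counts = {};
  for (const [key, values] of valuesByKey) counts[key] = values.size;
  return counts;
}

// Made-up sample: http.method stays small, user.email grows with every user.
const sample = [
  { 'http.method': 'GET', 'user.email': 'a@example.com' },
  { 'http.method': 'GET', 'user.email': 'b@example.com' },
  { 'http.method': 'POST', 'user.email': 'c@example.com' },
];
const counts = attributeCardinality(sample);
```

Keys whose count grows in step with the number of spans are the ones to look at twice before shipping.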

The bottom line: Instrumentation is work, but it’s valuable work. Start with automatic instrumentation to get immediate wins, then surgically add manual spans where the business logic is complex and you need real insight. Get the context propagation right, and be thoughtful about your attributes. Do that, and you’ll move from guessing why your system is slow to knowing exactly which function in which service is to blame. And that, my friend, is power.