Right, so you’ve instrumented your code. Congratulations, you’re now emitting beautiful, pristine telemetry data. Which is a bit like carefully crafting a message in a bottle and throwing it into the ocean. The OpenTelemetry Collector is the fleet of ships and satellites you deploy to actually find those bottles, read them, and radio the contents back to headquarters. It’s the unsung hero, the plumbing, the data bus. You don’t strictly need it, but life without it is a messy, manual, and frankly amateurish affair.

Think of the Collector as a highly configurable, vendor-agnostic data pipeline. Its entire job is to receive telemetry data (traces, metrics, logs), potentially process it (filter, modify, enrich), and then export it to one or more backends (like Jaeger, Prometheus, a logging system, or commercial vendors). This is a stroke of genius because it decouples your application from your observability backend. Want to switch from Vendor A to Vendor B? You change a config file on the Collector and restart it. Your application code? It doesn’t care. It just sends its data to the Collector and gets on with its life.
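To make the decoupling concrete, here is a minimal sketch of what a vendor switch could look like on the Collector side. The exporter names and both endpoints are hypothetical placeholders, and a real vendor will likely need its own auth headers. The point is only that the change lives entirely in the Collector's YAML.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  # Old destination (hypothetical endpoint), left defined but no longer used
  otlphttp/vendor-a:
    endpoint: "https://otlp.vendor-a.example.com"
  # New destination (hypothetical endpoint)
  otlphttp/vendor-b:
    endpoint: "https://otlp.vendor-b.example.com"

service:
  pipelines:
    traces:
      receivers: [otlp]
      # The switch itself: this list used to read [otlphttp/vendor-a]
      exporters: [otlphttp/vendor-b]
```

Listing both exporters in the pipeline at once is also a handy way to run two backends side by side during a migration.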

The Core Architecture: Receivers, Processors, Exporters

The Collector’s configuration is built on three fundamental concepts, and if you get these, you get everything.

  • Receivers: How data gets into the Collector. Data can be pushed in over OTLP (OpenTelemetry Protocol) via gRPC or HTTP, or in other formats such as Jaeger, and the Collector can also actively scrape sources like Prometheus endpoints or host metrics. Your application will be configured to send its OTLP data to the Collector’s receiver endpoint.
  • Processors: What happens to the data inside the Collector. This is where the magic (and the gotchas) live. You can batch data for efficient writes, filter out noisy spans, modify attributes (e.g., add an environment=production tag), or sample traces. Crucial warning: every processor, batch included, is defined in the top-level processors section, but defining it does nothing until you also list it under processors in a pipeline in the service section. Processors run in the order you list them, so batch usually goes near the end. Defining a processor and forgetting to wire it into a pipeline is a classic footgun in the YAML config.
  • Exporters: How data gets out of the Collector to its final destination. This is where you define your connections to Jaeger, Prometheus, logging systems, or any other backend that makes your data useful.
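The three concepts can be sketched as the smallest useful config: one receiver, one processor, one exporter, wired together in a single pipeline. This is a hedged starting point rather than a production setup. It uses the debug exporter, which just prints what the Collector receives to its own console, so you can check the wiring before pointing it at a real backend.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  # Defined here with default settings...
  batch: {}

exporters:
  # Prints received telemetry to the Collector's console output
  debug:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      # ...and only active because it is listed here
      processors: [batch]
      exporters: [debug]
```

Note that each component appears twice: once where it is defined, and once in the pipeline that actually uses it.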

A Real, Runnable Collector Config

Let’s look at a practical otel-collector-config.yaml file. This one receives OTLP data, adds a useful attribute, filters out a health check endpoint, batches what is left, and sends traces to a Jaeger instance for debugging while shipping traces, metrics, and logs to an OTLP-compatible backend.

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  # All processors are defined here, batch included
  attributes/example:
    actions:
      - key: "deployed_with"
        value: "ansible"
        action: upsert # Add it if it doesn't exist, update if it does
  filter/spans:
    spans:
      exclude:
        match_type: strict
        services: ["my-web-app"] # These fields take lists, not bare strings
        span_names: ["/health"] # Let's be real, no one needs traces for health checks
  batch:
    send_batch_size: 1000
    timeout: 10s

exporters:
  # Jaeger accepts OTLP natively; the dedicated jaeger exporter
  # has been removed from recent Contrib releases
  otlp/jaeger:
    endpoint: "jaeger:4317"
    tls:
      insecure: true
  otlphttp/for-backend:
    endpoint: "https://api.myobservabilityplatform.com"
    headers:
      authorization: "Bearer ${env:MY_BACKEND_API_KEY}"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [attributes/example, filter/spans, batch] # Run in this order; batch goes last
      exporters: [otlp/jaeger, otlphttp/for-backend]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/for-backend]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/for-backend]

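To run this with the Docker image, you’d mount this config and publish both OTLP ports: docker run -v "$(pwd)/otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml" -e MY_BACKEND_API_KEY -p 4317:4317 -p 4318:4318 otel/opentelemetry-collector-contrib:latest (the -e flag passes your API key through from the host environment, so the config can resolve it).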
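If you want the jaeger hostname in the config to actually resolve, one option is to run both containers on the same Docker Compose network. The following is a sketch under assumptions: the file name, the service names, and the choice of the jaegertracing/all-in-one image are mine, not something the config above requires.

```yaml
# docker-compose.yaml (hypothetical file; service names are assumptions)
services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686" # Jaeger UI

  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    volumes:
      - ./otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml
    environment:
      # Passed through from the host so ${env:MY_BACKEND_API_KEY} resolves
      - MY_BACKEND_API_KEY=${MY_BACKEND_API_KEY}
    ports:
      - "4317:4317" # OTLP gRPC
      - "4318:4318" # OTLP HTTP
    depends_on:
      - jaeger
```

Compose puts both services on a shared network where each is reachable by its service name, which is why the Collector can talk to the Jaeger container as jaeger.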

Deployment Patterns and Pitfalls

You’ll typically deploy the Collector in one of two ways:

  1. As an Agent (Sidecar or DaemonSet): This runs alongside your application, either as a sidecar container in the same pod (Kubernetes) or as a daemon on every host. The application sends data to a local endpoint such as localhost:4317. This is fantastic for reliability: the agent handles batching and retries, and if you enable a persistent queue (backed by the file_storage extension), it can spool data to disk when the network goes down. By default that queue lives in memory only. The downside is you now have to manage more moving parts.
  2. As a Gateway (Standalone Service): This runs as a central cluster or service. All your applications, across all hosts, send their data to this central endpoint. This is simpler to manage but becomes a single point of failure and a potential bottleneck unless you run several replicas behind a load balancer. For any serious setup, use the Agent pattern. The gateway should be for aggregating from agents, not from raw applications.
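The agent-feeding-a-gateway arrangement might look roughly like this on the agent side. Treat it as a sketch: the gateway hostname, the storage directory, and the plaintext TLS setting are all assumptions for illustration. It relies on the file_storage extension, which ships in the Contrib distribution, to back the exporter's sending queue with disk.

```yaml
extensions:
  file_storage:
    # Assumed path; the directory must exist and be writable by the Collector
    directory: /var/lib/otelcol/file_storage

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch: {}

exporters:
  otlp/gateway:
    endpoint: "otel-gateway.observability.svc:4317" # Hypothetical gateway address
    tls:
      insecure: true # Assumption: plaintext traffic inside the cluster
    retry_on_failure:
      enabled: true
    sending_queue:
      enabled: true
      storage: file_storage # Queue survives restarts and outages on disk

service:
  extensions: [file_storage]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/gateway]
```

The gateway itself would then be an ordinary Collector with an OTLP receiver and your real backend exporters.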

The biggest pitfall? Resource usage. The Contrib Collector is a glorious Swiss Army knife with hundreds of components. It’s also a memory hog if you’re not careful. Start with a basic config and monitor its memory consumption. For high-throughput environments, you might need to tune the batch processor size or use the memory_limiter processor to avoid getting OOMKilled. It’s ironic that your observability tool can crash from being… unobservable. But that’s our world.
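As a rough illustration of that tuning, here is a fragment that adds memory_limiter in front of batch. The numbers are illustrative only and should be sized against your container's actual memory limit. The exporter name is borrowed from the example config earlier and is assumed to be defined elsewhere in the same file, as is the otlp receiver.

```yaml
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 400       # Illustrative: keep this below the container's memory limit
    spike_limit_mib: 100 # Illustrative: headroom for bursts between checks
  batch:
    send_batch_size: 1000
    timeout: 10s

service:
  pipelines:
    traces:
      receivers: [otlp]
      # memory_limiter goes first so it can push back before other processors do work
      processors: [memory_limiter, batch]
      exporters: [otlphttp/for-backend]
```

When the limit is hit, the Collector starts refusing data rather than growing until the kernel kills it, which is the whole point.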