26.4 ServiceMonitor and PodMonitor: Prometheus Operator CRDs

Right, so you’ve got Prometheus installed via its Operator. Good for you. That was the easy part. Now comes the actual magic trick: telling the thing what to scrape. You could go back to the dark ages of manually editing a prometheus.yml file, but you installed the Operator for a reason. It’s time to use its superpowers: ServiceMonitor and PodMonitor. Think of these as your translators, converting your application’s cry for attention (“Here are my metrics!”) into a language the Prometheus server actually understands.

The core problem they solve is decoupling. Without them, every time you add a new service, someone (or some CI job) has to edit the Prometheus scrape configuration, whether that is a prometheus.yml file or the ConfigMap that holds it, and then trigger a reload. It’s a fragile, manual, and utterly unscalable process. The Operator introduces these Custom Resource Definitions (CRDs) so that your application’s deployment can declare, “Hey, I exist, and here’s how to monitor me.” The Operator is constantly watching for these declarations. It finds them, generates a new Prometheus configuration on the fly, and has Prometheus reload it. It’s beautiful automation.

ServiceMonitor: The Standard Approach

A ServiceMonitor is the most common way to go. It does exactly what it says on the tin: it tells Prometheus to scrape metrics by looking at a Kubernetes Service. This is perfect because your Services already select their pods by label; that’s how a Service finds them in the first place. We’re just piggybacking on that same machinery.

Here’s a typical example. Let’s say you have a web API service called my-sweet-api. Its pods serve metrics on port 8080, which the Service exposes under the port name web. First, you’d have a Service that looks something like this:

apiVersion: v1
kind: Service
metadata:
  name: my-sweet-api
  namespace: production
  labels:
    app: my-sweet-api # This is the key label for our selector later
spec:
  ports:
  - name: web
    port: 8080
    targetPort: 8080
  selector:
    app: my-sweet-api # This selects the pods

Now, to instruct Prometheus to scrape every pod behind this service, you create a ServiceMonitor in the same namespace.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-sweet-api-monitor
  namespace: production # By default, only Services in this same namespace are matched
spec:
  selector:
    matchLabels:
      app: my-sweet-api # This finds the Service above
  endpoints:
  - port: web # Must match the name in the Service definition
    path: /metrics # The default, but you can specify any path here
    interval: 30s # How often to scrape. Be careful setting this too low.

The Operator sees this and generates a scrape job for it. Prometheus then discovers the Service’s endpoints (the Pods behind it) and adds each one to its target list. The genius is in the selector. It doesn’t point at Pods; it points at the Service, which then points at the Pods.
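The example above keeps the ServiceMonitor next to its Service. If the monitor has to live somewhere else, spec.namespaceSelector is the field that widens the search. Here is a minimal sketch, assuming a namespace called monitoring that holds your monitoring resources (that name is hypothetical; everything else reuses the example above):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-sweet-api-monitor
  namespace: monitoring # Hypothetical: a namespace other than the Service's
spec:
  namespaceSelector:
    matchNames:
    - production # Look for matching Services here instead of in "monitoring"
  selector:
    matchLabels:
      app: my-sweet-api # Same Service label as before
  endpoints:
  - port: web # Still the port name from the Service definition

Using any: true under namespaceSelector instead of matchNames searches every namespace, which is convenient but easy to overdo.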

PodMonitor: When Services Are Overkill

Sometimes, a Service is overkill. Maybe you have a one-off job, a pod that doesn’t need a stable network identity, or a sidecar you want to scrape directly. This is where PodMonitor shines. It cuts out the middleman and selects Pods directly.

Let’s say you have a batch job pod that runs, exposes metrics for its duration, and then quits. You don’t have a Service for it, and you shouldn’t. A PodMonitor is your answer.

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: my-batch-job-monitor
  namespace: production
spec:
  selector:
    matchLabels:
      job-type: metrics-generator # This selects Pods directly!
  podMetricsEndpoints:
  - port: metrics-port # This must match the name defined in the Pod's container ports
    path: /metrics
    honorLabels: true # Optional: on a label conflict, keep the target's own labels (covered in the next section)

The key differences? It’s podMetricsEndpoints instead of endpoints, and the selector is applied directly to Pods, not Services.
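For the PodMonitor above to find anything, the Pod itself has to carry the label and the named port. Here is a rough sketch of a matching batch Job. The Job name, container name, image, and port number are all made up for illustration; only the label and the port name come from the PodMonitor.

apiVersion: batch/v1
kind: Job
metadata:
  name: metrics-generator # Hypothetical name
  namespace: production # Same namespace as the PodMonitor
spec:
  template:
    metadata:
      labels:
        job-type: metrics-generator # Matched by the PodMonitor's selector
    spec:
      restartPolicy: Never
      containers:
      - name: worker # Hypothetical container name
        image: registry.example.com/metrics-generator:1.0 # Hypothetical image
        ports:
        - name: metrics-port # The name referenced by podMetricsEndpoints[].port
          containerPort: 9100 # Assumed port number

Note that the label goes on the Pod template, not on the Job’s own metadata, because the PodMonitor selects Pods.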

The Crucial Nuance: honorLabels

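This is one of the most common “gotchas”. When Prometheus scrapes a target, the metrics it gets can already carry labels (like job="nightly-backup"). Prometheus also attaches its own target labels to everything it scrapes: job and instance, plus labels such as namespace and pod that the Operator adds through relabeling. For a ServiceMonitor, the job label defaults to the name of the Service (job="my-sweet-api"); for a PodMonitor, it is derived from the PodMonitor’s namespace and name. What happens if there’s a conflict?

By default (honorLabels: false), Prometheus keeps its own label and renames the scraped one with an exported_ prefix, so the application’s job label survives as exported_job. Nothing is lost, and a target cannot pass itself off as a different job or instance. For an ordinary application, this is what you want.

Setting honorLabels: true tells Prometheus, “If there’s a label conflict, trust the target, not me.” Reserve it for targets that report on behalf of something else, such as a Pushgateway or a federated Prometheus, where the exposed job and instance labels are the real identity of the data. Turning it on everywhere lets any target overwrite the very labels you rely on to tell where a series came from, which makes debugging harder, not easier.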
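To make the conflict concrete, here is a sketch of a ServiceMonitor for a Pushgateway, the textbook case for honorLabels: true. It assumes a Service named pushgateway, labeled app: pushgateway, with a port named http; all three are hypothetical. The comments describe what happens to a pushed metric that carries job="nightly-backup".

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: pushgateway-monitor # Hypothetical name
  namespace: production
spec:
  selector:
    matchLabels:
      app: pushgateway # Assumed label on the Pushgateway Service
  endpoints:
  - port: http # Assumed port name on that Service
    # With honorLabels: true, the series keeps job="nightly-backup",
    # the label the batch job pushed.
    # With honorLabels: false (the default), Prometheus sets
    # job="pushgateway" (the Service name) and stores the pushed
    # value as exported_job="nightly-backup".
    honorLabels: true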

The Selector Trap: serviceMonitorSelector

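Here’s the kicker. Just because you create a ServiceMonitor doesn’t mean your Prometheus instance is watching for it. Each Prometheus server the Operator manages is described by a Prometheus custom resource (see kubectl get prometheus), usually created by you or by your Helm chart. That resource has a field called serviceMonitorSelector, a label selector that determines which ServiceMonitors it will acknowledge, and a companion field, serviceMonitorNamespaceSelector, that determines which namespaces it looks in.

If your Prometheus resource has serviceMonitorSelector: {} (selects all), you’re golden. But if it’s set to match a label like release: prometheus, then your ServiceMonitor must carry that label in its own metadata, or it will be silently ignored. The namespace selector bites the same way: left unset, only the Prometheus resource’s own namespace is searched, so a ServiceMonitor in production stays invisible to a Prometheus running elsewhere until serviceMonitorNamespaceSelector allows it. You’ll be staring at your perfect YAML, wondering why nothing is happening. Always, always check what your Prometheus server is configured to watch. The same logic applies to podMonitorSelector and podMonitorNamespaceSelector.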
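Here is what the two sides of that handshake might look like. This is a sketch, not a complete Prometheus spec: the resource name main and the namespace monitoring are hypothetical, and fields unrelated to selection are left out. The serviceMonitorNamespaceSelector and podMonitorNamespaceSelector fields control which namespaces are searched; an empty selector means all of them.

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: main # Hypothetical name
  namespace: monitoring # Hypothetical namespace
spec:
  serviceMonitorSelector:
    matchLabels:
      release: prometheus # Only ServiceMonitors with this label are picked up
  serviceMonitorNamespaceSelector: {} # Empty selector: search all namespaces
  podMonitorSelector:
    matchLabels:
      release: prometheus # Same rule for PodMonitors
  podMonitorNamespaceSelector: {}
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-sweet-api-monitor
  namespace: production
  labels:
    release: prometheus # Without this label, the Prometheus above ignores the monitor
spec:
  selector:
    matchLabels:
      app: my-sweet-api
  endpoints:
  - port: web

To see what your own cluster uses, kubectl get prometheus -A -o yaml prints the selectors of every Prometheus resource.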