26.1 Prometheus Architecture: Scrape, Store, Query, Alert

Right, let’s get this party started. Prometheus isn’t some magical black box that just “knows” about your services. It’s more like a meticulous, slightly obsessive librarian who only reads the books you explicitly point it at, and only on a fixed schedule. Its entire worldview is built on a simple, brutal cycle: scrape, store, query, alert. Miss one beat of this rhythm, and the whole symphony falls apart.

The Scrape: How Prometheus Gets Its Data

Forget push. Prometheus is a pull-based system. It decides when to go out, hat in hand, and ask your services for their metrics. The services it talks to are called “targets,” and they must expose their metrics on an HTTP endpoint, typically /metrics, in the exact text-based format Prometheus expects.
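To make “the exact text-based format” concrete, here is a minimal Python sketch that renders one counter in the Prometheus text exposition format. The metric and label names are hypothetical examples, and in a real service you would use an official client library (such as prometheus_client) rather than hand-rolling this.

```python
from typing import Dict, Tuple

# A label set as an ordered tuple of (name, value) pairs, so it can be a dict key.
LabelSet = Tuple[Tuple[str, str], ...]


def escape_label_value(value: str) -> str:
    # Label values must escape backslash, double quote and line feed (backslash first).
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name: str, help_text: str, samples: Dict[LabelSet, float]) -> str:
    """Render one counter in the Prometheus text exposition format.

    Assumes help_text contains no backslashes or newlines (those would need escaping too).
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples.items():
        if labels:
            label_str = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in labels)
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")  # no labels: no braces
    return "\n".join(lines) + "\n"


print(render_counter(
    "http_requests_total",
    "Total HTTP requests handled.",
    {(("method", "POST"), ("handler", "/api/v1/login"), ("status", "200")): 1027},
))
# # HELP http_requests_total Total HTTP requests handled.
# # TYPE http_requests_total counter
# http_requests_total{method="POST",handler="/api/v1/login",status="200"} 1027
```

Each line after the HELP/TYPE comments is one sample: metric name, optional labels in braces, then the value. Break that shape (an unescaped quote, a non-numeric value) and the whole scrape fails.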

This is both its greatest strength and its most common point of failure. The strength? You can’t drown Prometheus with a metrics firehose; it drinks at its own pace. The failure? If Prometheus can’t reach your target, or your target serves up malformed metrics, you get nothing. Nada. A gaping void in your dashboard where your beautiful data should be.

You configure this in the scrape_configs section of your prometheus.yml file. Let’s say you have a Node Exporter running on a machine to get host-level metrics:

scrape_configs:
  - job_name: 'node-exporter'
    scrape_interval: 15s  # How often to bother the target
    static_configs:
      - targets: ['10.1.2.3:9100', '10.1.2.4:9100'] # The actual machines to pester

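Why job_name? It’s a logical grouping. All targets from this config get the label job="node-exporter", which is your first and most important handle for querying later, and each target also gets an instance label holding its address (for example instance="10.1.2.3:9100"). The scrape_interval is the schedule Prometheus aims for, not a hard guarantee: scrapes of different targets are staggered across the interval and their timing can jitter slightly, so don’t expect atomic-clock precision.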
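The labelling step can be pictured with a small Python sketch: every sample scraped from a target picks up job from the scrape config and instance from the target’s address. The function name is hypothetical; real Prometheus does this through target relabelling, and the sketch only mimics its default honor_labels: false behaviour, where a clashing label exposed by the target is kept under an exported_ prefix.

```python
from typing import Dict, List, Tuple

# (metric_name, labels, value) as parsed from one target's /metrics page.
Sample = Tuple[str, Dict[str, str], float]


def attach_target_labels(samples: List[Sample], job: str, instance: str) -> List[Sample]:
    """Give every scraped sample the job and instance labels of its target.

    A clashing label already exposed by the target is renamed to
    exported_<label>, mirroring the honor_labels: false default.
    """
    labelled = []
    for name, labels, value in samples:
        merged = dict(labels)
        for key, target_value in (("job", job), ("instance", instance)):
            if key in merged:
                merged["exported_" + key] = merged[key]
            merged[key] = target_value
        labelled.append((name, merged, value))
    return labelled


# The same scraped sample from two node-exporter targets becomes two distinct series.
scraped = [("node_load1", {}, 0.42)]
for target in ["10.1.2.3:9100", "10.1.2.4:9100"]:
    print(attach_target_labels(scraped, "node-exporter", target))
```

The two printed results differ only in instance, which is exactly what lets you later ask for one machine or aggregate across the whole job.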

Pitfall #1: The silent scrape failure. Prometheus will happily chug along trying to scrape a target that’s been dead for weeks, filling your database with up{instance="10.1.2.3:9100"} 0 instead of useful data. Always monitor the up metric. If that’s 0, your scrape is failing.
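Here is a rough Python model of where up comes from, under simplifying assumptions: a scrape either fetches and parses the whole page or fails outright, and either way a synthetic up sample is recorded for the target. The fetch callable and the deliberately strict parser (no labels, no timestamps) are hypothetical stand-ins for Prometheus’ HTTP client and its real exposition-format parser.

```python
from typing import Callable, Dict, List, Tuple


def parse_exposition(text: str) -> List[Tuple[str, float]]:
    """Tiny parser for unlabelled lines like 'node_load1 0.42'.

    Any line it cannot understand raises ValueError, mirroring how one
    malformed line fails the entire scrape.
    """
    samples = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, value = line.split()  # ValueError unless exactly two fields
        samples.append((name, float(value)))  # ValueError on a non-numeric value
    return samples


def scrape(fetch: Callable[[], str]) -> Dict[str, float]:
    """Return the scraped samples plus the synthetic 'up' metric."""
    try:
        samples = parse_exposition(fetch())
    except (OSError, ValueError):
        return {"up": 0.0}  # unreachable or malformed: no data, just up = 0
    result = dict(samples)
    result["up"] = 1.0
    return result


def dead_target() -> str:
    raise ConnectionRefusedError("connection refused")


print(scrape(lambda: "node_load1 0.42\n"))          # {'node_load1': 0.42, 'up': 1.0}
print(scrape(dead_target))                          # {'up': 0.0}
print(scrape(lambda: "node_load1 not-a-number\n"))  # {'up': 0.0}
```

Note that the dead target and the malformed page look identical from the outside: both produce nothing but up = 0, which is why that one metric deserves its own alert.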

The Store: Where the Time Series Live

Once Prometheus has scraped the data, it needs to put it somewhere. It doesn’t just shove it in a MySQL table and call it a day. It has its own highly efficient, custom-built time-series database (TSDB) on disk.

Think of it like this: every time it scrapes, it’s adding a new “page” (a sample: one timestamp and one value) to a specific “book” (a time series). A time series is uniquely identified by its metric name and all its labels. http_requests_total{method="POST", handler="/api/v1/login", status="200"} is a completely different series from http_requests_total{method="GET", handler="/static/css", status="200"}. This labels-as-dimensions model is the absolute superpower of Prometheus. It’s what lets you slice and dice your data in endlessly creative ways later.
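The “metric name plus all labels” identity rule fits in a few lines of Python. This is a toy in-memory model with a hypothetical class name, not how the TSDB actually lays data out on disk (it uses compressed chunks and an inverted index), but it shows why changing any single label value creates a brand-new series while reordering labels does not.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

# A series is identified by its metric name plus the full set of label pairs.
SeriesKey = Tuple[str, FrozenSet[Tuple[str, str]]]


class ToyTSDB:
    def __init__(self) -> None:
        self.series: Dict[SeriesKey, List[Tuple[float, float]]] = defaultdict(list)

    @staticmethod
    def key(name: str, labels: Dict[str, str]) -> SeriesKey:
        # A frozenset makes the key independent of label order.
        return (name, frozenset(labels.items()))

    def append(self, name: str, labels: Dict[str, str], ts: float, value: float) -> None:
        # Each scrape appends one (timestamp, value) sample to the matching series.
        self.series[self.key(name, labels)].append((ts, value))


db = ToyTSDB()
db.append("http_requests_total", {"method": "POST", "handler": "/api/v1/login", "status": "200"}, 0, 10)
db.append("http_requests_total", {"status": "200", "handler": "/api/v1/login", "method": "POST"}, 15, 12)
db.append("http_requests_total", {"method": "GET", "handler": "/static/css", "status": "200"}, 15, 99)
print(len(db.series))  # 2
```

The flip side of this model is that every distinct label value you emit (a user ID, a request ID) becomes another series to store, so keep label values bounded.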

It stores this data locally. Yes, locally. No fancy distributed storage by default. This is the designers’ “questionable choice” that is actually brilliant for its use case: it makes Prometheus stupidly reliable and simple to run. The trade-off is durability and long-term storage. If your Prometheus server’s disk melts, your historical data is gone. For long-term retention, you’d push it to a remote storage backend like Cortex or Thanos, but that’s a story for another chapter.

The Query: Asking the Right Questions

Storing data is useless unless you can get it back out. This is where PromQL, the Prometheus Query Language, comes in. This is where you go from “I have data” to “I have insight.”

PromQL is a functional language. You start with a time series selector and then wrap it in functions and operators. Want to know the total rate of HTTP 5xx errors per handler over the last 5 minutes? You don’t need a PhD; you need PromQL:

sum by(handler) (
  rate(
    http_requests_total{status=~"5.."}[5m]
  )
)

Let’s break down the why:

http_requests_total{status=~"5.."} selects the relevant series.
[5m] is a range vector selector. The rate() function needs a time window to work its magic. It looks at the raw counter values over the last 5 minutes and calculates the per-second average rate of increase. This is crucial because counters only ever go up (apart from resetting to zero when a process restarts, which rate() detects and compensates for), so what you almost always care about is their rate of change.
sum by(handler) then aggregates those rates, adding them up and grouping the results by the handler label.
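To see what rate() and sum by(handler) do with the numbers, here is a simplified Python sketch with hypothetical function names. It measures the increase between the first and last samples in the window, treats any drop as a counter reset, then sums the per-series rates by one label. Real PromQL additionally extrapolates toward the window edges, so its results differ slightly; treat this as an illustration of the idea, not a reimplementation.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Samples = List[Tuple[float, float]]     # (timestamp_seconds, counter_value)
LabelSet = Tuple[Tuple[str, str], ...]  # e.g. (("handler", "/x"), ("status", "500"))


def simple_rate(samples: Samples) -> Optional[float]:
    """Per-second average increase of a counter across the samples in a window.

    Simplified: first-to-last sample only, no extrapolation to the window edges.
    """
    if len(samples) < 2:
        return None  # like PromQL, no result without at least two samples
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter reset (e.g. a restart), so count from zero.
        increase += cur - prev if cur >= prev else cur
    return increase / (samples[-1][0] - samples[0][0])


def sum_rate_by(series: Dict[LabelSet, Samples], label: str) -> Dict[str, float]:
    """Rough equivalent of: sum by(label) (rate(metric[window]))."""
    totals: Dict[str, float] = defaultdict(float)
    for labels, samples in series.items():
        r = simple_rate(samples)
        if r is not None:
            # A missing label groups under the empty string, as in PromQL.
            totals[dict(labels).get(label, "")] += r
    return dict(totals)


window = {
    (("handler", "/api/v1/login"), ("status", "500")): [(0, 100), (60, 130), (120, 160), (180, 190), (240, 220), (300, 250)],
    (("handler", "/api/v1/login"), ("status", "503")): [(0, 10), (300, 40)],
    (("handler", "/static/css"), ("status", "500")): [(0, 50), (150, 80), (300, 5)],
}
print(sum_rate_by(window, "handler"))
```

Here /api/v1/login comes out at 0.6 errors per second (0.5 from the 500s plus 0.1 from the 503s), and /static/css at about 0.117, because the drop from 80 to 5 is counted as a reset contributing 5, not as a decrease of 75.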

Pitfall #2: Forgetting rate(). Querying a raw counter directly will give you a meaningless, ever-increasing line. You almost always want rate() or irate().

The Alert: Waking Someone Up at 3 AM

The final part of the loop is taking that query power and using it to trigger alerts. You define alerting rules in a separate rules file, which prometheus.yml points to through its rule_files setting. Each rule is a PromQL expression, usually ending in a comparison that filters the result. Every series the expression returns is an active alert, whatever its value; once it has stayed in the result for long enough, the alert becomes “firing” and gets sent to the Alertmanager.
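In practice, whether an alert fires depends on which series the expression returns, not on their values. A comparison such as < 10 or == 0 acts as a filter: series that fail it drop out of the result, and those that pass keep their original value. The small Python model below (hypothetical function names, instant vectors simplified to an {instance: value} dict) shows that an up == 0 rule alerts on a series whose value is literally 0.

```python
import operator
from typing import Callable, Dict, List

# An instant vector, simplified to {instance_label: value}.
Vector = Dict[str, float]


def compare_filter(vector: Vector, op: Callable[[float, float], bool], threshold: float) -> Vector:
    """Model PromQL's filtering comparison (e.g. `up == 0`, `x < 10`).

    Elements that fail the comparison are dropped; survivors keep their value.
    """
    return {inst: v for inst, v in vector.items() if op(v, threshold)}


def active_alerts(result: Vector) -> List[str]:
    # Every element left in the result is an active alert, whatever its value.
    return sorted(result)


up = {"10.1.2.3:9100": 0.0, "10.1.2.4:9100": 1.0}
print(active_alerts(compare_filter(up, operator.eq, 0)))  # ['10.1.2.3:9100']
```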

Here’s a classic: alert on high instance memory usage.

groups:
- name: node-alerts
  rules:
  - alert: HighMemoryUsage
    expr: (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 < 10
    for: 5m # This must be true for 5 minutes straight
    labels:
      severity: page
    annotations:
      summary: "Node is running out of memory ({{ $value | humanize }}% available)"
      description: "Instance {{ $labels.instance }} has less than 10% memory available for 5 minutes."

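The for clause is your best friend. It’s the “don’t panic yet” buffer: until the condition has held for the full 5 minutes, the alert sits in the “pending” state and nobody gets notified, which stops alerts that flicker for a few seconds from spamming you. The Alertmanager then takes the firing alerts, de-duplicates and groups them, applies any silences or inhibitions you’ve configured, and routes them to the correct channel (Slack, PagerDuty, email).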
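The for behaviour can be sketched as a tiny state machine: an alert whose expression starts returning a series goes to pending, is promoted to firing only once it has been continuously active for the for duration, and resets the moment the series disappears. This Python model uses a hypothetical class name, takes one rule evaluation per call, and ignores details such as how resolved alerts are reported to the Alertmanager.

```python
from typing import Optional


class AlertState:
    """Track one alert's inactive/pending/firing state across rule evaluations."""

    def __init__(self, for_seconds: float) -> None:
        self.for_seconds = for_seconds
        self.active_since: Optional[float] = None  # when the condition first held

    def evaluate(self, now: float, condition_holds: bool) -> str:
        if not condition_holds:
            self.active_since = None  # any gap resets the clock
            return "inactive"
        if self.active_since is None:
            self.active_since = now
        if now - self.active_since >= self.for_seconds:
            return "firing"
        return "pending"


# A 30-second blip (e.g. a target restarting), evaluated every 15s with for: 1m.
blip = AlertState(for_seconds=60)
print([blip.evaluate(t, held) for t, held in [(0, True), (15, True), (30, False), (45, False)]])
# ['pending', 'pending', 'inactive', 'inactive']

# A real outage: the condition keeps holding, so the alert fires at t=60.
outage = AlertState(for_seconds=60)
print([outage.evaluate(t, True) for t in range(0, 76, 15)])
# ['pending', 'pending', 'pending', 'pending', 'firing', 'firing']
```

With for_seconds=0 the alert fires on the very first evaluation, which is exactly the behaviour you get from a rule with no for clause at all.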

Pitfall #3: Alerting on up == 0 without a for: 1m or similar. Otherwise, every single rolling deployment or brief network glitch will page you. Be kind to your on-call engineer. They deserve sleep.