Monitoring

9.5 Running DaemonSets Only on a Subset of Nodes

Right, so you’ve got a DaemonSet. It’s happily deploying its pod on every single node in your cluster. That’s its job. But what if you don’t want it on every node? What if your brilliant log-collector pod needs a specific filesystem mount that only exists on your workhorse compute nodes, or your gpu-model-inferencer has absolutely no business running on the cheap little spot instances handling your web traffic? This is where we stop the DaemonSet’s tyrannical reign of “one for all” and introduce some democracy. We use the bouncers of the Kubernetes club: nodeSelector, taints and tolerations, and, if we’re feeling fancy, nodeAffinity. Let’s break it down.
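Here’s a minimal sketch of the simplest bouncer, nodeSelector, written in Python rather than YAML so the matching rule can actually be exercised. The DaemonSet manifest is expressed as a plain dict; the label key `node-role.example.com/compute`, the image name, and the helper functions are hypothetical, chosen for illustration. The rule being modeled is the documented one: a node qualifies only if it carries every key/value pair listed in the pod template’s nodeSelector.

```python
# Hypothetical DaemonSet manifest as a dict; the label key and image are placeholders.
daemonset = {
    "apiVersion": "apps/v1",
    "kind": "DaemonSet",
    "metadata": {"name": "log-collector"},
    "spec": {
        "selector": {"matchLabels": {"app": "log-collector"}},
        "template": {
            "metadata": {"labels": {"app": "log-collector"}},
            "spec": {
                # Only nodes carrying ALL of these labels get a pod.
                "nodeSelector": {"node-role.example.com/compute": "true"},
                "containers": [
                    {"name": "collector", "image": "example.com/log-collector:1.0"}
                ],
            },
        },
    },
}


def matches_node_selector(selector: dict[str, str], node_labels: dict[str, str]) -> bool:
    """True if the node has every key/value pair in the selector (an empty selector matches all)."""
    return all(node_labels.get(key) == value for key, value in selector.items())


def eligible_nodes(ds: dict, nodes: dict[str, dict[str, str]]) -> list[str]:
    """Names of nodes this DaemonSet would place a pod on, considering nodeSelector only."""
    selector = ds["spec"]["template"]["spec"].get("nodeSelector", {})
    return sorted(name for name, labels in nodes.items() if matches_node_selector(selector, labels))


# Made-up node inventory for illustration.
nodes = {
    "compute-1": {"node-role.example.com/compute": "true", "kubernetes.io/os": "linux"},
    "spot-1": {"lifecycle": "spot", "kubernetes.io/os": "linux"},
}
print(eligible_nodes(daemonset, nodes))  # ['compute-1']
```

Note that an empty nodeSelector matches every node, which is exactly why a DaemonSet lands everywhere by default.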

9.4 DaemonSet Update Strategy

Right, so you’ve got your DaemonSet deployed. It’s happily running its little pod on every node, doing whatever thankless infrastructure task you assigned it. But now you need to change its spec. Maybe you’re updating the container image to patch a vulnerability, or perhaps you’re adding a new volume mount. This is where the updateStrategy rears its head, and you need to understand it because, trust me, the default behavior (in apps/v1, a RollingUpdate with maxUnavailable: 1, which starts replacing pods node by node as soon as you change the pod template) will bite you when you least expect it.
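To make the difference between the two strategies concrete, here’s an idealized Python model. It assumes every replacement pod becomes ready instantly and ignores maxSurge and minReadySeconds, so treat it as a mental model rather than a description of the controller’s internals. `rolling_update_waves` and `pods_to_replace` are hypothetical helpers; `maxUnavailable` is the real field they mimic.

```python
# Idealized model: replacement pods are ready instantly; maxSurge and minReadySeconds are ignored.
update_strategy = {"type": "RollingUpdate", "rollingUpdate": {"maxUnavailable": 1}}


def rolling_update_waves(nodes: list[str], max_unavailable: int = 1) -> list[list[str]]:
    """Group nodes into waves; each wave takes down at most max_unavailable old pods."""
    if max_unavailable < 1:
        raise ValueError("max_unavailable must be at least 1 in this model")
    return [nodes[i:i + max_unavailable] for i in range(0, len(nodes), max_unavailable)]


def pods_to_replace(strategy: str, nodes: list[str], manually_deleted: set[str]) -> list[str]:
    """RollingUpdate replaces every out-of-date pod; OnDelete only replaces pods you deleted."""
    if strategy == "RollingUpdate":
        return list(nodes)
    if strategy == "OnDelete":
        return [n for n in nodes if n in manually_deleted]
    raise ValueError(f"unknown strategy: {strategy}")


nodes = ["node-a", "node-b", "node-c"]
max_unavail = update_strategy["rollingUpdate"]["maxUnavailable"]
print(rolling_update_waves(nodes, max_unavail))  # [['node-a'], ['node-b'], ['node-c']]
print(pods_to_replace("OnDelete", nodes, {"node-b"}))  # ['node-b']
```

With OnDelete, nothing changes on a node until you delete its pod yourself, which trades automation for control over exactly when each node gets the new version.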

9.3 Tolerations to Schedule on Tainted Nodes

Right, so you’ve got your DaemonSet humming along, deploying its pod to every node in your cluster. It’s a beautiful thing. But then you run into the real world, and the real world has problems. Some of your nodes are, shall we say, special. Maybe they’re GPU-equipped beasts that cost more than your car, reserved for machine learning workloads. Maybe they’re edge nodes with spotty connections, or they’re just old and cranky and you don’t trust anything but a specific monitoring agent to run on them.
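Before taints start repelling your pods, it helps to know exactly when a toleration counts as a match. The Python sketch below mirrors the matching rules in the Kubernetes documentation: effects must match unless the toleration leaves effect empty, keys must match unless the toleration leaves key empty, `Exists` ignores the value, and `Equal` (the default operator) compares it. The helper names and the `dedicated=gpu` taint are illustrative assumptions. It treats NoSchedule and NoExecute as hard filters and PreferNoSchedule as a soft preference that never blocks scheduling.

```python
HARD_EFFECTS = {"NoSchedule", "NoExecute"}  # PreferNoSchedule is only a soft preference


def toleration_matches(tol: dict, taint: dict) -> bool:
    """Mirror of the documented toleration/taint matching rules."""
    effect = tol.get("effect", "")
    if effect and effect != taint["effect"]:
        return False  # an empty effect matches every effect
    key = tol.get("key", "")
    if key and key != taint["key"]:
        return False  # an empty key (used with Exists) matches every key
    operator = tol.get("operator") or "Equal"  # Equal is the default operator
    if operator == "Exists":
        return True  # the value is ignored
    if operator == "Equal":
        return tol.get("value", "") == taint.get("value", "")
    return False


def can_schedule(tolerations: list[dict], taints: list[dict]) -> bool:
    """A pod passes the taint filter if every hard taint on the node is tolerated."""
    return all(
        any(toleration_matches(tol, taint) for tol in tolerations)
        for taint in taints
        if taint["effect"] in HARD_EFFECTS
    )


gpu_taint = {"key": "dedicated", "value": "gpu", "effect": "NoSchedule"}
gpu_toleration = {"key": "dedicated", "operator": "Equal", "value": "gpu", "effect": "NoSchedule"}
print(can_schedule([], [gpu_taint]))                # False
print(can_schedule([gpu_toleration], [gpu_taint]))  # True
```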

9.2 DaemonSet Scheduling and Node Selectors

Right, so you’ve got your DaemonSet humming along, deploying its little pod on every node. That’s great, until you realize you don’t actually want it on every node. Maybe you’ve got a special node reserved for massive batch jobs and your logging agent would just get in the way. Or perhaps you only want your fancy GPU monitoring agent on the nodes that actually have, you know, GPUs. This is where we stop the blunt-force “deploy everywhere” approach and start getting surgical. The two primary tools for this are nodeSelector and nodeAffinity. One is a simple, no-nonsense hammer; the other is a finely tuned scalpel. You need to know how to wield both.
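Where nodeSelector only does exact matches, nodeAffinity’s `requiredDuringSchedulingIgnoredDuringExecution` block takes a list of nodeSelectorTerms, each holding matchExpressions. Here’s a Python sketch of how those are evaluated, assuming only matchExpressions are used (matchFields is left out): terms are ORed, expressions within a term are ANDed, and `NotIn`/`DoesNotExist` are satisfied by a node that lacks the label entirely. The label keys (`accelerator`, `workload`, `gpu-count`) and helper names are hypothetical. If a pod specifies both nodeSelector and nodeAffinity, a node has to satisfy both.

```python
def expression_matches(expr: dict, labels: dict[str, str]) -> bool:
    """Evaluate one matchExpressions entry against a node's labels."""
    key, op, values = expr["key"], expr["operator"], expr.get("values", [])
    present = key in labels
    if op == "In":
        return present and labels[key] in values
    if op == "NotIn":
        return not present or labels[key] not in values  # a missing label satisfies NotIn
    if op == "Exists":
        return present
    if op == "DoesNotExist":
        return not present
    if op in ("Gt", "Lt"):
        # Gt/Lt take a single value and compare as integers.
        if not present:
            return False
        try:
            actual, bound = int(labels[key]), int(values[0])
        except (ValueError, IndexError):
            return False
        return actual > bound if op == "Gt" else actual < bound
    raise ValueError(f"unsupported operator: {op}")


def required_affinity_matches(terms: list[dict], labels: dict[str, str]) -> bool:
    """nodeSelectorTerms are ORed; matchExpressions inside one term are ANDed."""
    return any(
        bool(term.get("matchExpressions"))  # this sketch treats an empty term as matching nothing
        and all(expression_matches(e, labels) for e in term["matchExpressions"])
        for term in terms
    )


# Would sit under spec.template.spec.affinity.nodeAffinity
#   .requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms
gpu_terms = [{"matchExpressions": [
    {"key": "accelerator", "operator": "In", "values": ["nvidia-a100", "nvidia-h100"]},
    {"key": "workload", "operator": "NotIn", "values": ["batch"]},
]}]
print(required_affinity_matches(gpu_terms, {"accelerator": "nvidia-a100"}))  # True
print(required_affinity_matches(gpu_terms, {"accelerator": "nvidia-a100", "workload": "batch"}))  # False
```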

9.1 DaemonSet Use Cases: Log Collectors, Monitoring Agents, Network Plugins

Alright, let’s talk about why you’d actually use a DaemonSet. You don’t just deploy them for fun; they solve a very specific, infrastructure-level problem: when you need a piece of software running on every single node in your cluster, come hell or high water. It’s the Kubernetes way of saying, “I don’t care what’s scheduled here, this pod is non-negotiable.” Think of them as the mandatory background services of your operating system, but for your cluster.

9. DaemonSets: One Pod Per Node

40.7 Prometheus and Grafana: Metrics with postgres_exporter

Alright, let’s get your database metrics out of the terminal and onto a dashboard where they belong. Staring at pg_stat_statements output is a rite of passage, but it’s a terrible way to run a system. We’re going to hook your Postgres instance up to Prometheus and Grafana using the brilliant postgres_exporter. Think of it as a tiny, hyper-competent translator that lives next to your database, constantly asking it “how’s it going?” and translating the answers into a format Prometheus understands. Prometheus scrapes and stores those numbers, and Grafana queries Prometheus to turn them into pretty pictures.
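What does that Prometheus-friendly format actually look like? postgres_exporter serves plain-text metrics over HTTP (by default on port 9187 at /metrics), and Prometheus scrapes that endpoint on a schedule. The Python sketch below is a deliberately simplified parser for that text format: it skips `# HELP`/`# TYPE` lines, drops optional timestamps, and doesn’t unescape label values. `parse_samples` and `fetch_metrics` are hypothetical helpers, and the URL assumes the exporter runs locally on its default port.

```python
import re
import urllib.request

# metric_name{label="value",...} value [timestamp]
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)")
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Simplified parser for the Prometheus text format (no unescaping, timestamps dropped)."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and # HELP / # TYPE metadata
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))  # float() also accepts NaN and +Inf
    return samples


def fetch_metrics(url: str = "http://localhost:9187/metrics") -> str:
    """Fetch the exporter's metrics page; assumes a local exporter on its default port."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")
```

In practice Prometheus does all of this parsing for you, but running `parse_samples(fetch_metrics())` once is a quick way to confirm the exporter can reach your database: look for `pg_up` with a value of 1.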

40.6 pg_activity: A top-Like Monitor for PostgreSQL

Alright, let’s talk about pg_activity. You’ve probably used top or htop on a Linux system to see what’s chewing up all your CPU and memory. pg_activity is that, but for your PostgreSQL database. It’s a real-time, terminal-based dashboard that lets you see exactly what your server is doing, who’s connected, and which queries are currently trying to set your disk on fire. It’s not some bloated GUI; it’s a fast, information-dense tool you can run over SSH to get an immediate pulse check. Forget digging through a dozen pg_stat views manually when things are on fire: this is your first line of defense.

40.5 pgBadger: Parsing and Visualizing the Log

Alright, let’s get our hands dirty with pgBadger. You’ve configured your log_line_prefix, set log_min_duration_statement, and now your log files are growing, filled with the glorious, unvarnished truth of what your database is actually doing. Staring at these raw text files is a special kind of masochism. It’s like trying to understand a city’s traffic patterns by lying in the middle of a highway with a notepad. Enter pgBadger: it’s the traffic control helicopter that lifts you up, gives you a map, and points out all the fender benders and gridlock.
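One core trick behind a report like pgBadger’s is normalization: `WHERE id = 42` and `WHERE id = 7` are the same query shape, so their durations belong together. Here’s a toy Python version of that idea, assuming log lines written by log_min_duration_statement in PostgreSQL’s `duration: <ms> ms  statement: <sql>` form. It is not pgBadger’s implementation (pgBadger is a Perl program with far more robust parsing); `normalize` and `slowest_queries` are hypothetical helpers, the regexes only handle simple string and numeric literals, and the sample log lines are made up.

```python
import re
from collections import defaultdict

# Matches PostgreSQL's "duration: <ms> ms  statement: <sql>" log message.
DURATION_RE = re.compile(r"duration: ([\d.]+) ms\s+statement: (.*)")


def normalize(sql: str) -> str:
    """Replace literals with ? so queries differing only in parameters group together."""
    sql = re.sub(r"'(?:[^']|'')*'", "?", sql)      # string literals, including '' escapes
    sql = re.sub(r"\b\d+(?:\.\d+)?\b", "?", sql)  # numeric literals
    return re.sub(r"\s+", " ", sql).strip().lower()


def slowest_queries(lines) -> list[tuple[str, int, float]]:
    """Aggregate (normalized query, count, total ms), sorted by total time descending."""
    stats = defaultdict(lambda: [0, 0.0])
    for line in lines:
        match = DURATION_RE.search(line)
        if match is None:
            continue  # not a duration line (connections, errors, and so on)
        entry = stats[normalize(match.group(2))]
        entry[0] += 1
        entry[1] += float(match.group(1))
    return sorted(((q, c, t) for q, (c, t) in stats.items()), key=lambda r: r[2], reverse=True)


log = [  # made-up sample lines
    "2024-01-01 10:00:00 UTC [123]: LOG:  duration: 1500.250 ms  statement: SELECT * FROM orders WHERE id = 42",
    "2024-01-01 10:00:05 UTC [124]: LOG:  duration: 500.750 ms  statement: SELECT * FROM orders WHERE id = 7",
]
print(slowest_queries(log))  # [('select * from orders where id = ?', 2, 2001.0)]
```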

40.4 log_min_duration_statement: Capturing Slow Queries

Right, let’s talk about finding the slowpokes in your database. You can’t fix what you can’t see, and log_min_duration_statement is your primary spotlight for illuminating the queries that are gumming up the works. It’s a brilliantly simple, brutally effective PostgreSQL setting. Here’s the deal: you tell PostgreSQL, “Hey, if any statement runs for at least this many milliseconds, I want you to write its duration and its full text to the log file.” You’re not sampling; you’re getting a complete record of every single statement that crosses your pain threshold. This is your first and most important line of defense against sluggish performance.
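Here’s a small Python model of how the threshold behaves, assuming the documented semantics: a bare number means milliseconds, `0` logs every completed statement, `-1` turns duration-based logging off, and a statement is logged when it runs for at least the threshold. `ENABLE_STATEMENTS` shows one way to set it cluster-wide as a superuser; the 250ms value is an arbitrary example, the helpers are hypothetical, and the unit parser only knows ms, s, min, and h.

```python
import re

# One way to enable it cluster-wide; run each statement on its own (e.g. in psql) as a superuser.
ENABLE_STATEMENTS = (
    "ALTER SYSTEM SET log_min_duration_statement = '250ms'",
    "SELECT pg_reload_conf()",
)

UNIT_TO_MS = {"ms": 1.0, "s": 1_000.0, "min": 60_000.0, "h": 3_600_000.0}


def parse_threshold_ms(setting: str) -> float:
    """Convert a setting like '250ms', '1s' or '500' (bare = milliseconds) to milliseconds."""
    match = re.fullmatch(r"\s*(-?\d+(?:\.\d+)?)\s*(ms|s|min|h)?\s*", setting)
    if match is None:
        raise ValueError(f"unrecognized setting: {setting!r}")
    return float(match.group(1)) * UNIT_TO_MS[match.group(2) or "ms"]


def would_log(duration_ms: float, setting: str) -> bool:
    """Model of whether a completed statement's duration gets logged."""
    threshold = parse_threshold_ms(setting)
    if threshold < 0:
        return False  # -1 disables duration-based logging
    return duration_ms >= threshold  # 0 logs every completed statement


print(would_log(300, "250ms"))  # True
print(would_log(100, "250ms"))  # False
```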

40.3 pg_locks: Viewing Lock Contention

Alright, let’s get our hands dirty with pg_locks. This is the system view that pulls back the curtain on the database’s backstage bouncers: the locks that are either granting or denying access to your data. Think of it as the VIP list for your rows and tables, and we’re here to see who’s on it, who’s waiting, and who’s causing a scene. The first thing you need to internalize is that locking is not inherently evil. It’s the mechanism that maintains data integrity, ensuring two clowns (I mean, transactions) don’t try to change the same row at the same time. The problem isn’t locking; it’s contention, which is what you get when transactions start queueing up behind each other and grind your application’s throughput to a halt. pg_locks is your primary tool for seeing that contention in real time.
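You can find contention by joining pg_locks against itself, but PostgreSQL (9.6 and later) also ships `pg_blocking_pids(pid)`, which returns the pids of the sessions blocking a given backend from acquiring a lock. The sketch below swaps that function in for the raw join and builds a simple wait-for graph in Python, assuming you’ve already fetched the query’s rows into a dict mapping each waiting pid to its list of blockers. `root_blockers` finds the sessions at the head of each queue; `victims` collects everyone stuck behind one of them. Both helpers and the sample pids are hypothetical.

```python
from collections import defaultdict

# Rows of (pid, blocked_by) for every backend currently waiting on a lock.
BLOCKING_SQL = """
SELECT pid, pg_blocking_pids(pid) AS blocked_by
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
"""


def root_blockers(blocked_by: dict[int, list[int]]) -> list[int]:
    """Pids that block someone but are not themselves waiting: the heads of the queues."""
    blockers = {b for pids in blocked_by.values() for b in pids}
    return sorted(b for b in blockers if b not in blocked_by)


def victims(root: int, blocked_by: dict[int, list[int]]) -> set[int]:
    """Every backend waiting, directly or transitively, on `root`."""
    waiters = defaultdict(set)  # blocker pid -> pids waiting on it
    for pid, blockers in blocked_by.items():
        for b in blockers:
            waiters[b].add(pid)
    seen, stack = set(), [root]
    while stack:
        for pid in waiters[stack.pop()]:
            if pid not in seen:  # avoid revisiting a pid reachable by several paths
                seen.add(pid)
                stack.append(pid)
    return seen


# Hypothetical snapshot: 100 blocks 201 and 203, and 202 queues behind 201.
waits = {201: [100], 202: [201], 203: [100]}
for root in root_blockers(waits):
    print(root, sorted(victims(root, waits)))  # 100 [201, 202, 203]
```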

40.2 pg_stat_user_tables and pg_stat_user_indexes: Usage Statistics

Alright, let’s get our hands dirty with the two most straightforward, no-nonsense views in the PostgreSQL statistics arsenal: pg_stat_user_tables and pg_stat_user_indexes. These are your first port of call when you’re trying to figure out what your database is actually doing, as opposed to what you think it’s doing. They’re gloriously simple cumulative counters that track every access to your tables and indexes since the statistics were last reset. No magic, no voodoo, just cold, hard numbers. Think of them as the surveillance cameras on your data. They don’t judge, they just record. And like any good surveillance footage, the story is in how you interpret the sequence of events.
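Because these counters only ever accumulate, a single reading tells you little; the story is in the difference between two snapshots. Here’s a Python sketch of that comparison, assuming each snapshot is a dict mapping relname to `(seq_scan, idx_scan)` as returned by `SNAPSHOT_SQL` (idx_scan is NULL for tables with no indexes, hence the `or 0`). The `min_seq` cutoff of 100, the helper names, and the sample snapshots are illustrative assumptions. A negative delta would mean the statistics were reset between snapshots.

```python
SNAPSHOT_SQL = "SELECT relname, seq_scan, idx_scan FROM pg_stat_user_tables;"
# Indexes never scanned since the last stats reset: candidates for a closer look.
UNUSED_INDEXES_SQL = "SELECT relname, indexrelname FROM pg_stat_user_indexes WHERE idx_scan = 0;"


def scan_deltas(before: dict, after: dict) -> dict[str, tuple[int, int]]:
    """Per-table change in (seq_scan, idx_scan) between two snapshots."""
    deltas = {}
    for rel, (seq_after, idx_after) in after.items():
        seq_before, idx_before = before.get(rel, (0, 0))
        # idx_scan is NULL (None) for tables without indexes; treat it as 0.
        deltas[rel] = (seq_after - seq_before, (idx_after or 0) - (idx_before or 0))
    return deltas


def seq_scan_heavy(deltas: dict[str, tuple[int, int]], min_seq: int = 100) -> list[str]:
    """Tables whose sequential scans outnumber index scans over the interval."""
    return sorted(rel for rel, (seq, idx) in deltas.items() if seq >= min_seq and seq > idx)


# Made-up snapshots taken some minutes apart.
before = {"orders": (10, 500), "audit_log": (5, None)}
after = {"orders": (12, 900), "audit_log": (305, None)}
print(seq_scan_heavy(scan_deltas(before, after)))  # ['audit_log']
```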

40.1 pg_stat_activity: What Is Running Right Now

Alright, let’s pull back the curtain. You want to know what’s happening right now inside your Postgres database. Not what happened five minutes ago, not some aggregated average—the raw, unvarnished truth of the current moment. For that, you go to pg_stat_activity. Think of it as the database’s live activity monitor, the one place you can see every single connection and what it’s up to, whether it’s heroically saving your data or just sitting there idly, burning a hole in your connection limit.
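A typical first pass over pg_stat_activity is to count connections by state and pull out the active queries that have been running too long. The Python sketch below does that post-processing on rows you’ve already fetched, assuming each row is a dict with the selected columns and timezone-aware timestamps. Filtering on `backend_type = 'client backend'` (available since PostgreSQL 10) drops background workers; the 30-second threshold, the `summarize` helper, and the sample rows are illustrative assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

ACTIVITY_SQL = """
SELECT pid, usename, state, query_start, query
FROM pg_stat_activity
WHERE backend_type = 'client backend';
"""


def summarize(rows: list[dict], now: datetime, threshold: timedelta = timedelta(seconds=30)):
    """Count connections per state and list active queries running longer than threshold."""
    counts = Counter(row["state"] for row in rows)
    long_running = sorted(
        (
            (now - row["query_start"], row["pid"], row["query"])
            for row in rows
            if row["state"] == "active"
            and row["query_start"] is not None
            and now - row["query_start"] > threshold
        ),
        reverse=True,  # longest-running first
    )
    return counts, long_running


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [  # made-up rows for illustration
    {"pid": 41, "state": "active", "query_start": now - timedelta(minutes=5), "query": "SELECT count(*) FROM orders"},
    {"pid": 42, "state": "idle", "query_start": now - timedelta(hours=1), "query": "COMMIT"},
    {"pid": 43, "state": "active", "query_start": now - timedelta(seconds=2), "query": "SELECT 1"},
]
counts, slow = summarize(rows, now)
print(counts)                       # Counter({'active': 2, 'idle': 1})
print([pid for _, pid, _ in slow])  # [41]
```

In real use you’d pass `datetime.now(timezone.utc)` as `now`. Note that for an idle connection, `query` holds the last statement it ran, not something currently executing.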