38.1 vmstat: Virtual Memory, Swap, CPU, and Block I/O Statistics
Alright, let’s talk about vmstat. It’s one of those old-school Unix tools that has stubbornly refused to die, and for good reason: it gives you a shockingly comprehensive, low-overhead snapshot of what your system’s core components are doing at any given moment. The name stands for “virtual memory stat,” but that’s a bit of a misdirection. It’s like calling a Swiss Army knife a “blade holder.” Sure, virtual memory stats are in there, but you’re also getting CPU, swap, and block I/O—all in one dense, text-based punch.
The first thing you need to know is that vmstat has two modes: a single report and a periodic reporter. Run with no arguments, it prints one report, and on Linux that report holds averages since the last boot, so it is not really a “right now” reading. The real magic happens when you run it in repeat mode: every report after the first covers only the preceding interval, so you see the system’s behavior over time, which is far more valuable than a single static number.
Here’s the basic incantation. vmstat takes two optional numeric arguments: the delay (in seconds) and the count (number of reports to print). To run it every 2 seconds until you hit Ctrl+C, give the delay and omit the count (vmstat 2).
# Get a single report
vmstat
# Get a report every 2 seconds, 5 times
vmstat 2 5
Now, let’s decipher the output, because at first glance it looks like a random number generator had a seizure. The key is in the columns, and they’re grouped into areas: procs, memory, swap, io, system, and cpu.
The Procs Column: What’s in the Queue?
r and b. This is your first stop for understanding system load. r is not just “processes running”: it counts every runnable process, meaning those currently on a CPU plus those waiting for a core to become available. If this number is consistently higher than your number of CPU cores, processes are queuing for CPU time and you likely have a CPU bottleneck. It’s a line. b is simpler: it’s the number of processes in uninterruptible sleep, usually waiting for I/O (like a slow disk or an unresponsive network filesystem). A non-zero b here and there is normal; a consistently high number means your I/O subsystem is crying for help.
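The r-versus-cores comparison is easy to automate. Here is a minimal sketch, assuming a Linux procps-style vmstat whose column header line starts with r; the function name flag_run_queue is made up for illustration, and the column positions are looked up from the header instead of being hard-coded.

```shell
# flag_run_queue CORES
# Reads vmstat output on stdin and prints one line for every sample whose
# r value exceeds CORES. (Hypothetical helper, not part of vmstat.)
flag_run_queue() {
    awk -v cores="$1" '
        $1 == "procs" { next }                        # banner line, no data
        $1 == "r" {                                   # column-name header
            for (i = 1; i <= NF; i++) col[$i] = i     # remember each position
            next
        }
        ("r" in col) && ($(col["r"]) + 0 > cores + 0) {
            printf "r=%d exceeds %d cores (b=%d)\n", $(col["r"]), cores, $(col["b"])
        }
    '
}
```

A typical invocation would be vmstat 2 | flag_run_queue "$(nproc)", assuming nproc from GNU coreutils is available to report the core count. A single flagged line means little; a steady stream of them is the pattern to worry about.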
Memory and Swap: The Usual Suspects
free is the amount of idle memory, in kilobytes by default: RAM that nothing is using at all, not even as cache. A low free number on a modern Linux system is often perfectly fine; the kernel aggressively uses otherwise idle RAM for disk caching (that’s the buff and cache columns right next to it, the same memory free -m reports as buff/cache) and can hand most of it back when applications need it. The real canary in the coal mine is si (swap in) and so (swap out), the amount of memory swapped in from disk and out to disk per second. An occasional blip is tolerable, but sustained non-zero values mean your system is actively swapping pages between RAM and disk. This is a performance murder scene. It’s incredibly slow. Find what’s causing the memory pressure and stop it.
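If you would rather not stare at two columns waiting for them to twitch, a small filter can do the watching. This is a sketch under the same assumption of a procps-style header containing si, so, and free; flag_swapping is a hypothetical name.

```shell
# flag_swapping
# Reads vmstat output on stdin and prints a line for every sample that
# shows any swap-in or swap-out traffic. (Hypothetical helper.)
flag_swapping() {
    awk '
        $1 == "procs" { next }                        # banner line, no data
        $1 == "r" {                                   # column-name header
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        ("si" in col) && ($(col["si"]) + $(col["so"]) > 0) {
            printf "swap activity: si=%d so=%d free=%d\n", $(col["si"]), $(col["so"]), $(col["free"])
        }
    '
}
```

Used as vmstat 2 | flag_swapping, it stays silent on a healthy box and starts printing as soon as pages move to or from swap. The free value is included only for context, since free alone is not a reliable pressure signal.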
The I/O Section: Your Disks Are Talking
bi and bo are block I/O. This measures blocks read in from and written out to block devices (like your hard drives or SSDs) per second. This is a great way to see if your application is pounding the disk. High bo often means a lot of writes are happening (logs, database transactions, etc.), while high bi could indicate something is reading heavily from disk. Remember, these are in blocks (usually 1024 bytes), not bytes. The designers were clearly feeling frugal when they named these. Why not “disk_in” and “disk_out”? Who knows. It’s a mystery.
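Blocks per second are not the most intuitive unit, so here is a rough conversion to MiB/s. It is a sketch that assumes 1024-byte blocks, as described above, and a procps-style header containing bi and bo; io_rate_mib is a hypothetical name.

```shell
# io_rate_mib
# Reads vmstat output on stdin and prints bi/bo for each sample as MiB/s,
# assuming one block is 1024 bytes. (Hypothetical helper.)
io_rate_mib() {
    LC_ALL=C awk '
        $1 == "procs" { next }                        # banner line, no data
        $1 == "r" {                                   # column-name header
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        ("bi" in col) {
            # 1024 blocks of 1024 bytes each make one MiB
            printf "read %.1f MiB/s, write %.1f MiB/s\n", $(col["bi"]) / 1024, $(col["bo"]) / 1024
        }
    '
}
```

Run it as vmstat 2 | io_rate_mib. LC_ALL=C is set only so the decimal point looks the same regardless of locale.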
CPU: Where the Cycles Go
This is the easiest part to read but often the hardest to interpret. It’s a percentage breakdown of CPU time.
us: time spent running non-kernel code (your applications).
sy: time spent running kernel code (system calls).
id: time spent twiddling its thumbs, doing nothing.
wa: I/O wait time. This is the percentage of time the CPU was idle because it was waiting for an I/O operation to complete. A consistently high wa paired with a high b in the procs section is a giant, flashing sign pointing to your I/O bottleneck.
st: “steal time.” This only matters if you’re in a virtual machine. It’s the percentage of time your VM was ready to run, but the hypervisor gave the CPU to another VM instead. If this is high, your hosting provider is oversubscribing the physical server.
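The “high wa plus high b” pairing can be checked mechanically. The sketch below assumes a procps-style header containing wa and b; the function name flag_io_wait and the default threshold of 20 percent are arbitrary choices for illustration, not recommended values.

```shell
# flag_io_wait [LIMIT]
# Reads vmstat output on stdin and prints a line for every sample where
# wa is at least LIMIT percent and at least one process is blocked (b > 0).
# LIMIT defaults to 20, an arbitrary illustrative threshold.
flag_io_wait() {
    awk -v limit="${1:-20}" '
        $1 == "procs" { next }                        # banner line, no data
        $1 == "r" {                                   # column-name header
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        ("wa" in col) && ($(col["wa"]) + 0 >= limit + 0) && ($(col["b"]) + 0 > 0) {
            printf "possible I/O bottleneck: wa=%d%% b=%d\n", $(col["wa"]), $(col["b"])
        }
    '
}
```

For example, vmstat 2 | flag_io_wait 30 reports only the intervals where I/O wait reached 30 percent while something was stuck in uninterruptible sleep.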
The biggest pitfall with vmstat is treating a single output line as gospel. System performance is a river, not a lake. You need to watch the flow. Run it with a delay (vmstat 2) and watch for patterns. Is wa spiking every 5 seconds? Is r climbing steadily? That’s the insight you’re looking for. It’s not a deep profiler, but it’s the best first tool you can reach for when the whole system feels “slow.” It tells you exactly which neighborhood the problem is in, so you know which specialized tool to use next.
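One practical wrinkle when watching the flow: on Linux, the first report vmstat prints holds averages since the last boot, so it can look nothing like the intervals that follow. A minimal sketch for dropping it, assuming the procps-style two-line header (a banner starting with procs and a column line starting with r); skip_boot_sample is a hypothetical name.

```shell
# skip_boot_sample
# Reads vmstat output on stdin, passes header lines through unchanged,
# and drops the first data row (the since-boot averages).
skip_boot_sample() {
    awk '
        $1 == "procs" || $1 == "r" { print; next }    # keep both header lines
        !seen { seen = 1; next }                      # first data row: skip it
        { print }                                     # every later sample
    '
}
```

With vmstat 2 | skip_boot_sample, every line you see describes a real 2-second interval, which makes spikes and steady climbs easier to spot.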