22.3 EXPLAIN (ANALYZE, BUFFERS): Cache Hit vs Disk Read
Alright, let’s get our hands dirty. You’ve run EXPLAIN and you’ve seen the plan, but it’s a bit like a friend describing a movie plot without the special effects—you get the gist, but none of the spectacle. EXPLAIN ANALYZE is that movie with all the explosions. And when you add BUFFERS, you’re getting a behind-the-scenes documentary on how they pulled off those explosions, specifically how much stuff they had to haul in from the studio lot (your disk) versus what was already on the soundstage (your memory).
The command is your new best friend:
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM your_table WHERE some_condition;
Run this, and you’ll get your normal query plan, but now adorned with two crucial pieces of information: actual execution time and, more importantly, a Buffers: section for each node. This tells you about your interaction with the shared buffers cache—PostgreSQL’s ingeniously named area of memory for storing frequently accessed data pages.
What’s in the Buffers Output?
Look for a line that reads something like:
Buffers: shared hit=234 read=45 dirtied=7 written=3
Let’s decode this, because it’s the Rosetta Stone for your query’s I/O:
shared hit=234: This is the good stuff. These 234 data blocks (typically 8KB each) were already sitting pretty in the shared buffers cache in RAM. Fetching them was lightning-fast. A high number of hits means your working set fits nicely in memory and PostgreSQL isn’t wasting time waiting on your disk.shared read=45: This is the… less good stuff. These 45 blocks weren’t in memory, so PostgreSQL had to go knocking on the disk’s door to fetch them. Disk reads are orders of magnitude slower than memory accesses. If you see a highreadcount, especially relative tohit, it’s a giant, blinking sign that either your dataset is much larger than your available cache, or you’re missing an index that would prevent a full table scan.shared dirtied=7: This is the number of blocks that your query modified in the cache. It’s a “you changed this, it’s now different from what’s on disk” flag. This is normal forINSERT,UPDATE, andDELETEoperations.shared written=3: This indicates how many of those “dirtied” blocks were then written back to disk by the query itself. Wait, what? Don’t the background writer processes usually handle that? Yes, almost always. You’ll typically only see awrittenvalue on operations that useCREATE TABLE ASor certain utility commands that force a sync. For mostSELECTqueries, this will be 0.
Why This is a Bigger Deal Than You Think
Knowing the hit/read ratio isn’t just about spotting slow queries; it’s about understanding your system’s behavior. A query that shows read=1000 on its first run might show hit=1000 on its second run if the data is now cached. This is why your query might be mysteriously faster the second time you run it. The BUFFERS option exposes this caching effect perfectly.
It also reveals the true cost of an index. Let’s say you add an index and your query time plummets. EXPLAIN ANALYZE shows a faster plan. Great. But EXPLAIN (ANALYZE, BUFFERS) shows you why: the planner is now doing an Index Scan that only needs to read 5 blocks from disk instead of a Seq Scan that had to read 10,000. The victory isn’t just the algorithm change; it’s the massive reduction in physical I/O.
A Real-World Example: The Indexless Wonder
Let’s imagine a users table with a few million rows. You want to find everyone with a specific email domain.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM users WHERE email LIKE '%@example.com';
Seq Scan on users (cost=0.00..18334.00 rows=10000 width=123) (actual time=0.045..125.780 rows=10235 loops=1)
Filter: (email ~~ '%@example.com'::text)
Rows Removed by Filter: 998765
Buffers: shared hit=205 read=13340
Planning Time: 0.115 ms
Execution Time: 126.441 ms
Ouch. Look at that Buffers line: read=13340. The database had to pull over 13 thousand blocks (roughly 100 MB) from disk to do this sequential scan. The hit=205 is just a pittance of catalog lookups. This is a classic sign that you’re missing an index. The Filter line rubbing your nose in the fact it had to sift through a million rows to find ten thousand is just salt in the wound.
Now, let’s see what happens if we add a functional index (since leading % kills normal index use):
CREATE INDEX idx_users_email_domain ON users (right(email, strpos(reverse(email), '@') - 1));
-- Yeah, it's a bit ugly. Sometimes the truth is.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM users WHERE email LIKE '%@example.com';
Index Scan using idx_users_email_domain on users (cost=0.29..402.35 rows=10000 width=123) (actual time=0.055..8.223 rows=10235 loops=1)
Index Cond: ((right(email, strpos(reverse(email), '@'::text) - 1)) = 'moc.elpmaxe'::text)
Buffers: shared hit=105 read=5
Planning Time: 0.301 ms
Execution Time: 8.981 ms
Behold. The Execution Time dropped from 126ms to 9ms. But the real story is in the Buffers: read=5. The query went from needing 13,340 disk reads to just 5. The rest (hit=105) came from cache. This is a monumental reduction in I/O pressure. This is the difference between a query that grinds your disk to a halt and one that barely makes it break a sweat.
The Gotchas and Best Practices
- Run It Twice (Seriously): The first run warms the cache. The second run shows you the steady-state performance. If you’re testing a query’s performance, always note the first-run
readvalue, but optimize for the second-runhitvalue. The first run tells you the cold-cache cost; the second tells you the warm-cache cost. - It’s a Lie (Sort Of): The
Buffersinformation only accounts for the shared buffers cache. It doesn’t account for the operating system’s own filesystem cache. If a block is in the OS cache but not the shared buffers, PostgreSQL will still count it as aread, even though the OS served it from RAM. This means your actual performance might be slightly better than theBuffersoutput suggests, but the metric is still perfectly valid for comparing two query plans on the same system. - The Dirty Secret of DML: On
INSERT,UPDATE, andDELETEoperations,BUFFERSwill show you the carnage. You’ll see highdirtiedcounts and potentiallywrittenif the operation was large enough to trigger a sync. This is invaluable for understanding the write-amplification of your queries.
In short, (ANALYZE, BUFFERS) is the tool that moves you from guessing why a query is slow to knowing exactly where it’s spending its time. It transforms query planning from an abstract art into a concrete science. Use it relentlessly.