39.1 MVCC and Dead Tuples: Why VACUUM Is Necessary
Alright, let’s pull back the curtain on the single most misunderstood feature of Postgres: VACUUM. You’ve probably heard you need to run it, but you might not know why. It all comes down to how Postgres handles concurrency, and it’s a brilliant, if slightly messy, solution.
Postgres keeps your data safe by giving every transaction a consistent view of the database. It does this with Multi-Version Concurrency Control (MVCC). Here’s the gist: instead of overwriting a row when you update it, Postgres creates a new version of that row and leaves the old one in place. The old row is invisible to new transactions, but any transaction that was already running can still see it, maintaining its consistent view of the world. This means no readers are ever blocked by writers, and vice-versa. It’s a fantastic feature that makes Postgres robust and performant under load.
But, and this is a big but, it has a side effect. Those old, invisible rows? They’re called dead tuples. They’re just sitting there, taking up space, like digital ghosts haunting your tables. An UPDATE essentially becomes a DELETE of the old tuple and an INSERT of the new one. A DELETE doesn’t actually remove the row; it just marks it as dead.
The Anatomy of a Dead Tuple
Think of a table as a bookshelf. When you UPDATE a row, you don’t erase the old book and write in the new one. You just add a new, revised edition of the book to the shelf. The old edition stays there, but it has a big sticker on it that says “OBSOLETE: FOR HISTORICAL REFERENCE ONLY.” Your table’s disk file gets longer with every update and delete. This is what we call bloat.
If you never clean up, your database files grow and grow, performance tanks because queries have to scan through gigabytes of dead data to find the live ones, and you might just run out of disk space. This is where VACUUM comes in. It’s the janitor that comes through after the party and cleans up all the empty cups and dead tuples.
How VACUUM Actually Works (It’s Not a Delete)
This is the critical part everyone gets wrong. VACUUM does not typically return space to the operating system (unless you use VACUUM FULL, but we’ll get to that horror show later). Its primary job is to:
- Mark space for reuse: It finds the dead tuples and marks their space as available for future inserts and updates within the same table. It’s like putting the old, obsolete books in a “free-to-use” bin. The next time you add a new book (row), Postgres will first check this bin for a spare spot before adding to the end of the shelf.
- Update the visibility map: It updates internal data structures to help future
VACUUMruns and index-only scans work more efficiently. - Freeze transaction IDs: It prevents transaction ID wraparound, which is a catastrophic failure mode we’ll discuss in a moment. This is non-negotiable.
You can see this bloat in action. Let’s create a quick demo:
-- Create a simple table
CREATE TABLE vacuum_demo (id serial primary key, data text);
-- Insert a row
INSERT INTO vacuum_demo (data) VALUES ('Initial Data');
-- Check the current physical size (on disk)
SELECT pg_size_pretty(pg_relation_size('vacuum_demo')); -- Probably around 8-16 kB
-- Now, let's update it repeatedly to create bloat
DO $$
BEGIN
FOR i IN 1..1000 LOOP
UPDATE vacuum_demo SET data = 'Updated Data ' || i;
END LOOP;
END;
$$;
-- Check the size again... it's much larger now!
SELECT pg_size_pretty(pg_relation_size('vacuum_demo')); -- Maybe 100+ kB
-- Now, run VACUUM
VACUUM vacuum_demo;
-- Check the size again. Notice it probably didn't shrink!
-- The space was marked for reuse, not returned to the OS.
SELECT pg_size_pretty(pg_relation_size('vacuum_demo'));
The size after the VACUUM likely stayed the same. That’s expected! The space is now free for the table itself to use. Insert a new row and it might go into that freed space instead of making the file bigger.
The Autovacuum Daemon: Your Silent Guardian
You do not, and should not, have to run VACUUM manually on most tables. Postgres has a brilliant background process called autovacuum that does this for you automatically. It wakes up, checks tables for a high number of dead tuples, and does a quick cleanup. It’s one of the best features of Postgres. Your job isn’t to run VACUUM, it’s to configure autovacuum properly so it can do its job effectively.
The Specter of Transaction ID Wraparound
This is the one thing that will force even the most poorly configured Postgres database to VACUUM. Remember how MVCC needs to know which rows are visible to which transactions? It does this using a 32-bit transaction ID (XID), which means it can only ever have about 2 billion transactions before it theoretically “wraps around” from 4,294,967,295 back to 3.
If this happens, data that was very old suddenly looks very new, and your database will shut itself down to prevent data corruption. It’s a safety feature. To prevent this, VACUUM marks old rows as “frozen,” meaning they are eternally visible to all transactions. This is the one VACUUM operation that is non-optional and will happen even if autovacuum is turned off, because the alternative is the complete and utter failure of your database. Don’t mess with this.
When to Worry and When to Relax
If you see your table sizes growing indefinitely even with autovacuum running, don’t panic immediately. Remember, it’s reusing space. Worry if your queries are getting slower while your table size grows, as that’s a sign of bloat. Use the pg_stat_all_tables view to monitor the number of dead tuples.
SELECT schemaname, relname, n_live_tup, n_dead_tup,
(n_dead_tup::float / (n_live_tup + n_dead_tup) * 100) AS percent_dead
FROM pg_stat_all_tables
WHERE n_live_tup > 0
ORDER BY n_dead_tup DESC;
A high percentage of dead tuples is autovacuum’s cue to get to work. If it’s not, you’ve likely found a configuration problem, not a VACUUM problem.