21.6 Index Bloat: Causes, Detection, and REINDEX

Right, let’s talk about index bloat. This is the digital equivalent of your closet being full of clothes you haven’t worn since 2015. Your table’s logical size (the actual data you care about) is fine, but the physical size on disk (the index files) is horrifically inflated. It wastes storage, but more importantly, it murders performance because the query planner has to wade through all that useless cruft. It’s your database’s way of screaming for an intervention.

The primary culprit here is the Multi-Version Concurrency Control (MVCC) model that PostgreSQL uses to be so brilliantly robust. When you update or delete a row, PostgreSQL doesn’t immediately remove the old version. It just marks it as obsolete and adds a new version. This is great for giving every transaction a consistent view of the data without read locks. But those dead, obsolete rows—lovingly called “dead tuples”—stick around in your indexes, taking up space and offering precisely zero value. A VACUUM operation will clean up the dead tuples from the table, but it only marks the space in the index as available for reuse; it doesn’t physically compact the index file. That’s where REINDEX comes in.

How to Spot a Bloated Index

You don’t need a microscope; you need the right queries. Don’t just guess. The pg_stat_all_indexes view is your first stop. A healthy index has a high idx_scan value relative to its size. An index that’s gigantic but never used is just expensive decoration.

For the real dirt, use the pgstattuple extension. It’s not installed by default because it’s a bit heavy, but it’s the gold standard for accuracy.

-- First, enable the extension if you haven't already
CREATE EXTENSION pgstattuple;

-- Now, run it on a suspect index. This will take a while on a large index.
SELECT * FROM pgstattuple('your_index_name');

Look at the free_space and dead_tuple_count (and its percent). If you’re seeing double-digit percentages for free space, your index is basically a suburban strip mall with half the stores vacant.

A quicker, more common method is using the built-in pg_stat_all_tables and some estimation functions. This query gives you a good ballpark for table bloat, which often correlates with index bloat:

SELECT
  schemaname,
  relname,
  n_live_tup,
  n_dead_tup,
  round((n_dead_tup::numeric / (n_live_tup + n_dead_tup) * 100), 2) AS dead_tup_percent
FROM pg_stat_all_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

If dead_tup_percent is consistently above 10-15%, you’ve got a vacuuming issue that’s almost certainly causing index bloat.

The Nuclear Option: REINDEX

When your indexes are bloated beyond reason, you call in the bulldozer: REINDEX. This command does exactly what it sounds like: it drops the existing index and builds a new, pristine, compact one from scratch. It’s the only way to truly reclaim that physical space.

-- Reindex a single index
REINDEX INDEX your_bloated_index;

-- Reindex all indexes in a table (handy if the whole table is a mess)
REINDEX TABLE your_problematic_table;

-- Feel like living dangerously? Reindex everything in the database.
-- For the love of all that is holy, don't run this on a live production system without a plan.
REINDEX DATABASE your_database;

Here’s the massive, glaring, “the designers made a questionable choice here” catch: REINDEX takes an exclusive lock on the parent table. For the entire duration of the rebuild, which could be hours on a large index, no one can write to—and depending on the type of index, sometimes even read from—that table. It’s a full table lock. In a production environment, this is often a complete non-starter. It’s like fixing a leak in the main water line by shutting off water to the entire city for eight hours.

The Production-Grade Solution: CONCURRENTLY

Thankfully, PostgreSQL offers a much more civilized way to do this: REINDEX CONCURRENTLY. This is the difference between that city-wide water shutdown and a skilled engineer putting in a bypass line before doing the work.

REINDEX INDEX CONCURRENTLY your_bloated_index;

It works by creating a new, temporary index alongside the old one. It slowly builds the new index while allowing all operations to continue on the table. Once the build is complete, it does a quick swap—making the new index live and atomically dropping the old one. The trade-off? It takes significantly longer and requires more temporary disk space (since you have two copies of the index for a while). But the massive win is that it doesn’t require a long-term exclusive lock. Writes can happen throughout the process.

The one hiccup is that if you cancel a CONCURRENTLY operation or it fails, you’re left with an invalid index named pg_reindex_index_12345 or similar. You have to manually drop it. A small price to pay for avoiding an outage.

Best Practices and Pitfalls

Don’t Make a Habit of It: REINDEX is a corrective surgery, not a daily vitamin. If you’re constantly needing to reindex, your autovacuum settings are probably wrong, or your workload has an unsustainable rate of updates/deletes. Fix the root cause.
Monitor CONCURRENTLY Jobs: A concurrent reindex on a large, active table can take a very, very long time. Keep an eye on it. The longer it runs, the more chance it has of being interrupted by a transaction ID wraparound, which is a whole other level of panic we don’t need today.
The B-tree Bonus: REINDEX on a B-tree index isn’t just about space; it can also improve performance by creating a more balanced tree structure. It’s a full tune-up.
The Locking Trade-off: Remember the choice: REINDEX (fast, but locks everything) vs. REINDEX CONCURRENTLY (slow, but allows concurrent access). Your maintenance window—or lack thereof—will dictate which one you use. Plan accordingly. Never, ever run a plain REINDEX on a production table during business hours unless you’re looking for a new job.