Right, let’s get our hands dirty. You’ve probably heard the term “table bloat” thrown around. It sounds gross, and it is. It’s when your table on disk is far larger than the data it actually contains, choked with “dead” rows from updates and deletes. It wastes storage, murders performance by forcing the database to read vast empty stretches of disk (a “seq scan” becomes a scenic tour of a ghost town), and generally makes a mockery of your carefully planned infrastructure.

PostgreSQL’s VACUUM is supposed to clean this up, but it’s not magic. Sometimes it can’t keep up. Sometimes it’s configured… poorly. You can’t just guess if you have a problem; you need to measure it. And for that, my friend, you need to peek under the hood. Forget generic statistics—we’re going straight to the source with two extension-powered scalpels: pgstattuple and pg_freespacemap.

First things first, these aren’t enabled by default. You have to tell PostgreSQL, “Yes, I am an adult who wants powerful tools.” Add them to shared_preload_libraries in your postgresql.conf and restart, or just create them in your database if you’re on a modern version (>= 13) and using CREATE EXTENSION.

CREATE EXTENSION IF NOT EXISTS pgstattuple;
CREATE EXTENSION IF NOT EXISTS pg_freespacemap;

How pgstattuple Gives You the Gruesome Details

pgstattuple is the autopsy. It doesn’t estimate; it physically scans every page of the table and gives you a brutally honest report. It’s I/O intensive, so don’t run it on your 2TB fact table during peak hour. Save it for a maintenance window or for those tables you’re already suspicious of.

Run it like this:

SELECT * FROM pgstattuple('your_table_name');

Here’s what you’ll get, and why each metric matters:

  • table_len: The total, grim size of the table. This is your starting point for how bad things are.
  • tuple_count: The number of live, breathing rows. The good stuff.
  • tuple_len: The size of all those live rows. (tuple_len / table_len) gives you a solid efficiency ratio.
  • dead_tuple_count: The number of deceased rows that VACUUM hasn’t yet recycled. A high number here means VACUUM is losing the fight.
  • dead_tuple_len: The sheer mass of the dead. This is your bloat.
  • free_space: This is the space within pages that’s been marked as reusable. This is the small-scale free space.

The killer metric is the free space percentage. You can calculate it:

SELECT 
  table_len, 
  free_space,
  (free_space / table_len::float) * 100 AS free_percent
FROM pgstattuple('your_table_name');

If free_percent is sky-high, you’ve got a serious case of the bloat. A number over 20-30% is a clear sign you need to intervene, likely with a VACUUM FULL or a table rewrite.

How the Free Space Map Tells You Where the Gaps Are

While pgstattuple tells you how much free space exists, pg_freespacemap tells you where it is. The Free Space Map (FSM) is a specialized structure inside PostgreSQL that tracks which pages have enough free space to hold a new row. It’s what VACUUM updates and what INSERT statements consult.

Querying it shows you the free space on a per-page basis:

SELECT * FROM pg_freespace('your_table_name');

This returns a list of pages (blkno) and the amount of free space in each (avail). This is incredibly useful for diagnosing specific types of performance issues. If you see a huge number of pages with, say, just a few bytes of free space, it means your table is fragmented. New inserts can’t find a home and have to create new pages, leading to inefficient, spread-out data. This is often caused by a fillfactor that’s too high for a workload with lots of updates.

The Crucial Difference and When to Use Which

Here’s the designer’s choice you need to understand: pg_freespacemap only reports space that is tracked and reusable. It doesn’t see the space taken by dead tuples that haven’t been vacuumed yet. pgstattuple sees everything.

This is why you often need both:

  1. pgstattuple tells you the total bloat problem (dead_tuple_len).
  2. pg_freespacemap shows you the effectiveness of the recycling system after a VACUUM has run.

The best practice? Use pgstattuple for a one-off health check to confirm bloat. It’s the definitive diagnosis. Use pg_freespacemap for more frequent, lighter-weight checks to monitor the state of the FSM and ensure your VACUUM strategies are working after you’ve dealt with the initial bloat. Never, ever run either of these on a massive table during production traffic unless you enjoy explaining downtime to your boss. This is surgical equipment, not a sledgehammer. Use it wisely.