38.6 autovacuum: Tuning Cost Delay, Scale Factor, and Thresholds

Right, let’s talk about autovacuum. It’s the janitorial service for your database, and if you ignore its configuration, you’re going to end up living in a very messy, very slow house. It’s not a question of if, but when. The default settings are, frankly, paranoid. They’re designed to not break the flimsiest, most underpowered little instance from 2003. Your production server deserves better.

The core problem autovacuum solves is “bloat.” When you update or delete a row, Postgres doesn’t physically remove it immediately (that’d be slow); it just marks it as dead. These dead rows (“tuples”) are like the empty pizza boxes piling up in the corner. They take up space, make queries slower to scan through, and if left unchecked, can lead to a table that’s 100GB on disk but only has 10GB of actual live data. Autovacuum’s job is to clean up these boxes and reclaim the space for new pizza… or data.

autovacuum_vacuum_scale_factor & autovacuum_vacuum_threshold

These two work as a team to decide when to vacuum a table. The formula is simple: Vacuum when: number of dead tuples > autovacuum_vacuum_threshold + (autovacuum_vacuum_scale_factor * number of live tuples)

The defaults are threshold = 50 and scale_factor = 0.2. This means for a table with 10 million rows, autovacuum won’t even think about running a vacuum until you have 50 + (0.2 * 10,000,000) = 2,000,050 dead tuples. That’s absurd. You’re letting 2 million dead rows accumulate before cleanup starts! This is the number one reason people complain about table bloat.

For large, busy tables, you need to lower the scale_factor. A common and sane approach is to set a table-specific rule. Let’s say we have a user_events table that we know is high-churn.

ALTER TABLE user_events SET (autovacuum_vacuum_scale_factor = 0.01);

Now, for that same 10-million-row table, it will trigger at 50 + (0.01 * 10,000,000) = 100,050 dead tuples. A much more manageable level of mess. For massive tables, you might even drop the scale factor to 0.005 or 0.001 and rely more on the threshold.

autovacuum_vacuum_cost_delay

This is the big lever for controlling autovacuum’s “intrusiveness.” By default, it’s 20ms. This is part of Postgres’s cost-based vacuum delay system. The idea is that after autovacuum has done a certain amount of expensive I/O work (defined by other autovacuum_vacuum_cost_limit parameters), it will pause for the cost_delay period to let regular queries have access to the disk.

The problem? 20ms is an eternity. It makes autovacuum incredibly cautious, often so cautious that it can’t keep up with the bloat on a busy system. It’s like sending a janitor out with a teaspoon to clean a stadium.

The solution? Lower it dramatically. Setting it to 2ms or even 0 (which effectively disables the cost-limiting, telling the janitor to just get on with it) is common and necessary on any modern hardware with decent IOPS.

-- In postgresql.conf
autovacuum_vacuum_cost_delay = 2ms

Don’t be scared. On good hardware, this won’t cause noticeable performance degradation for your regular workload, and it will prevent the far worse problem of a never-ending, lagging autovacuum process.

Table-Specific Tuning is Your Best Friend

The global defaults are a starting point. The real magic happens when you override them per table. A tiny, static lookup_codes table doesn’t need the same aggressive vacuuming as your massive, constantly-updated audit_log table. Use ALTER TABLE to set them appropriately.

-- For a large, heavily-updated fact table:
ALTER TABLE massive_audit_log
SET (autovacuum_vacuum_scale_factor = 0.001,
      autovacuum_vacuum_threshold = 5000);

-- For a small, mostly-read dimension table:
ALTER TABLE static_config
SET (autovacuum_vacuum_scale_factor = 0.0,
      autovacuum_vacuum_threshold = 100);

The “Workers” and “max_workers” Consideration

Remember, you can have multiple autovacuum workers running concurrently (controlled by autovacuum_max_workers, default 3). They have to share the total I/O budget defined by autovacuum_vacuum_cost_limit. If you have many large tables that need vacuuming simultaneously, you might need to increase autovacuum_max_workers and the global autovacuum_vacuum_cost_limit to allow them to actually get work done in parallel without each being throttled to a crawl. Think of it as hiring more janitors and also giving them more than one mop to share.

The key is to monitor. Use queries against pg_stat_all_tables to watch the n_dead_tup count. If it’s constantly high and growing, your autovacuum isn’t keeping up. Tune it. Be proactive. Your database’s performance will thank you.