38.2 work_mem: Per-Sort and Per-Hash Operation Memory
Alright, let’s talk about work_mem. This is one of the most important, and most frequently misunderstood, knobs in the entire postgresql.conf file. In simple terms, work_mem is the maximum amount of memory a single operation can use for sorting or building hash tables before it spills over to disk. Think of it as the budget for a single worker’s “thinking space” for one of these tasks.
The crucial thing to get into your head right now is its per-operation nature. This isn’t per connection, and it’s certainly not global. A single complex query can, and often will, have multiple sort and hash operations. Each one of those little jerks gets its own work_mem budget. A query with five sorts gets five slices of the work_mem pie. This is why you can absolutely crater a server by setting this value too high. It’s a classic rookie mistake. You think, “I have 64GB of RAM, I’ll give each backend 1GB for sorts!” and then ten concurrent queries each decide to run five parallel sorts and suddenly your process table is trying to allocate 50GB of RAM and the OOM-Killer starts eyeing your processes like they’re a tasty snack.
How It Works (And How It Breaks)
When a sort or hash operation starts, the executor says, “Cool, I have work_mem bytes to work with.” It tries to do everything in memory (using a quicksort for sorts, for example). If the data it’s processing is larger than that, it has to spill to disk. It breaks the data into sorted “runs” on disk and then performs a merge sort. This is, as you can imagine, horrifically slower than doing it in RAM. Disk is the enemy of performance.
You can see this in action. Let’s run a simple sort and watch it with EXPLAIN ANALYZE.
SET work_mem = '1MB'; -- Pathetically low for demonstration purposes
EXPLAIN ANALYZE
SELECT * FROM my_big_table ORDER BY some_column;
Look at the output. If it’s a slow sort, you’ll see something like:
Sort Method: external merge Disk: 10240kB
That external merge Disk is Postgres politely telling you your work_mem setting is a joke and it had to write 10MB of data to temp files. Now, let’s give it a real budget.
SET work_mem = '32MB';
EXPLAIN ANALYZE
SELECT * FROM my_big_table ORDER BY some_column;
Now you should see:
Sort Method: quicksort Memory: 28384kB
Victory! It stayed entirely in memory and was orders of magnitude faster.
Setting It Correctly (Without Blowing Up Your Server)
So what’s the right value? There isn’t one single number. It’s a balancing act. You need to find a value that is:
- Large enough to handle 90% of the sorts/hashes your common queries generate in memory.
- Small enough that
work_mem * max_connections * average number of operations per querydoesn’t exceed your available RAM.
A good starting point for a general-purpose OLTP system is between 4MB and 32MB. For a data warehouse with huge, infrequent queries, you might go much higher (256MB or more), but you must be vigilant about concurrency.
The best practice is to set it globally to a conservative, safe value in your postgresql.conf (e.g., work_mem = 16MB) and then override it per-session for specific queries that you know need more. This is the way.
-- In your application code, for that one big report:
BEGIN;
SET LOCAL work_mem = '256MB'; -- Scoped to just this transaction
SELECT ... -- your big ugly query here;
COMMIT;
This is safe, surgical, and doesn’t risk your entire server’s stability.
The Hash Operation Wildcard
Don’t forget about hashes! This setting also governs hash tables used for IN() clauses, DISTINCT, and most importantly, Hash Joins. A hash join builds a hash table on one of the joined tables. If that table is large, it needs a lot of work_mem. If it doesn’t get it, the hash join will bail to disk and performance will look like a heart rate monitor for a patient who’s just given up.
SET work_mem = '1MB';
EXPLAIN ANALYZE
SELECT * FROM big_table a JOIN big_table b ON a.id = b.id;
-- Look for "Hash" in the plan and weep at the "Disk" usage.
SET work_mem = '64MB';
EXPLAIN ANALYZE
SELECT * FROM big_table a JOIN big_table b ON a.id = b.id;
-- Should be entirely in-memory and blissfully fast.
The takeaway? work_mem is a precision tool, not a blunt instrument. Set it low globally, boost it for the queries that deserve it, and always, always keep an eye on the EXPLAIN ANALYZE output to see if you’re winning the battle against the disk.