32.3 Partition Pruning: How the Planner Eliminates Irrelevant Partitions
Alright, let’s get into the real magic trick: partition pruning. Or, as I like to call it, “the optimizer not being a complete idiot for once.”
Here’s the deal. You’ve gone through all the effort of splitting your billion-row table into a few dozen (or hundred) smaller, more manageable partitions. This is fantastic, but it’s all for nothing if every time you run a simple WHERE query, the database still goes and checks every single partition. That would be like having a filing cabinet with labeled drawers for “A-C”, “D-F”, etc., and then tearing through every single drawer just to find “Aaron A. Aaronson’s” file. You’d fire that intern. The query planner is (usually) smarter than that intern.
Partition pruning is the process where the planner looks at your query’s WHERE clause, looks at your partition boundaries, and says, “Oh, I only need to look in these partitions. The rest of you can take a nap.” It’s the feature that makes partitioning worth the administrative overhead. Without it, you’ve just built a Rube Goldberg machine that makes your life more complicated.
How the Planner Peeks Inside Your Partitions
The planner isn’t psychic. It can’t just guess. Pruning works because the partition key is a first-class citizen in the table’s metadata. When you create a partition, the system meticulously records the bounds of each one. For a range partition on a sale_date column, it knows Partition sales_2023_12 holds dates from '2023-12-01' to '2023-12-31'.
When your query comes in, like:
SELECT * FROM sales WHERE sale_date = '2023-12-15';
The planner parses it, sees the filter on sale_date, and cross-references that value with its internal catalog of partition bounds. It instantly knows that '2023-12-15' can only exist in the sales_2023_12 partition. It then builds a query plan that completely ignores all other partitions. You can see this beauty in action by using EXPLAIN. Look for key phrases like Seq Scan on sales_2023_12 and, more importantly, the glorious line:
... Filter: (sale_date = '2023-12-15'::date)
And the absence of any mention of the other partitions. That’s pruning working perfectly.
The Limits of the Planner’s Clairvoyance
Now, don’t get too excited. The planner is smart, but it’s not a mind reader. It can only work with what you give it. If your WHERE clause doesn’t involve the partition key directly, all bets are off.
This is the most common pitfall. Imagine this query:
SELECT * FROM sales WHERE product_id = 44521;
If product_id isn’t your partition key, the planner has absolutely no idea which partition might contain product_id 44521. It has to go knocking on every single partition’s door. This will perform worse than a single, non-partitioned table because of all the overhead of opening multiple relations. You partitioned for query performance, and you just shot yourself in the foot. The rule is simple: Your common, performance-critical query filters MUST be on or aligned with your partition key.
It also struggles with some more complex expressions. For example, using a function on the partition key can blindfold the planner.
SELECT * FROM sales WHERE EXTRACT(YEAR FROM sale_date) = 2023;
While it’s obvious to you and me that this needs all of 2023’s partitions, the planner often can’t reverse-engineer the function to understand that. It’s safer to be explicit:
SELECT * FROM sales WHERE sale_date >= '2023-01-01' AND sale_date <= '2023-12-31';
This gives the planner the raw values it needs to compare against the partition bounds.
The Joining (and Pruning) Dance
This is where it gets really cool. Pruning doesn’t just happen on the main table; it can propagate through joins. Let’s say you have a sales table partitioned by sale_date, and you join it to a tiny products table.
EXPLAIN ANALYZE
SELECT s.*
FROM sales s
JOIN products p ON s.product_id = p.id
WHERE p.category = 'Electronics'
AND s.sale_date BETWEEN '2023-11-01' AND '2023-11-30';
A sophisticated planner will execute this by first scanning the products table to find all IDs in the ‘Electronics’ category. But then, crucially, before it goes to the massive sales table, it will combine that list of product IDs with the date range. It knows the date range allows it to prune down to just the November 2023 partition. So the final scan might be: “For the product IDs we found, scan only the sales_2023_11 partition for matching rows.” This combination of partition pruning and join filtering is how you achieve truly legendary query performance on massive datasets.
The key takeaway? Partition pruning is brilliant, but it’s a tool, not a guarantee. You have to structure your queries to give it a fighting chance. When you do, the performance gains aren’t just incremental; they’re revolutionary. It’s the difference of searching a single folder on your desktop versus searching the entire company’s network drive.