32.4 Index Strategy for Partitioned Tables
Right, so you’ve partitioned your table. Congratulations, you’ve graduated from mere mortal DBA to someone who gets to deal with a whole new class of problems. Partitioning is like getting a multi-tool: it’s incredibly powerful, but if you don’t know which blade to use, you’ll just end up hurting yourself. And the sharpest blade in this kit is your indexing strategy. Get it wrong, and you’ve built a Rube Goldberg machine that’s slower than the single table you started with.
The golden rule, the one I want you to write on a sticky note and put on your monitor, is this: Your partition key must be part of every unique index and primary key. The database needs this to enforce uniqueness across partitions, not just within them. If you don’t do this, the poor thing has no idea which partition to check for a duplicate value. It would have to scan every single one, turning your beautiful partition design into a horrifying table-wide lock.
Let’s say we partition a sales table by DATE on the sale_date column. Here’s how you must define the primary key:
-- This will fail spectacularly. Don't do this.
CREATE TABLE sales (
sale_id BIGSERIAL,
sale_date DATE NOT NULL,
amount NUMERIC(10,2)
) PARTITION BY RANGE (sale_date);
-- This is the way.
CREATE TABLE sales (
sale_id BIGSERIAL,
sale_date DATE NOT NULL,
amount NUMERIC(10,2),
PRIMARY KEY (sale_id, sale_date) -- Notice the partition key is included!
) PARTITION BY RANGE (sale_date);
The designers made this rule, and while it feels a bit pedantic, they’re right. Uniqueness is a global contract, and the partition key is the only way to efficiently map a row to its partition to uphold that contract.
Global vs. Local Indexes
This is where the real decision-making happens. You have two choices, and picking the wrong one will haunt your performance.
A local index is created on each individual partition. Think of it as giving each partition its own private index. When you create one on the main table, PostgreSQL automatically propagates it to all current and future partitions. This is its killer feature.
-- This creates a separate index on 'amount' for every partition
CREATE INDEX ON sales USING btree (amount) LOCAL;
The huge advantage? Maintenance is a dream. When you DROP a partition, you just… drop it. The index on that partition gets dropped with it. No massive index rebuilds on your multi-terabyte table. The downside? If you query WHERE amount > 1000 without providing a sale_date, the planner has to probe every single local index on every partition. This is called a “partition-wise index scan,” and it’s about as efficient as it sounds—which is to say, not very. It’s like having to check every book in a library’s fiction section instead of just grabbing the one you need from the shelf.
A global index is the traditional index you’re used to: a single, large index that spans every partition. PostgreSQL doesn’t natively support them in the way other DBs do. The way you fake it? You create an index on the main table that doesn’t include the partition key.
-- This creates a single index across all partitions (use with caution!)
CREATE INDEX ON sales USING btree (sale_id);
The advantage? A query for WHERE sale_id = 12345 can blast directly into this one index to find the row, regardless of what partition it’s in. The catastrophic disadvantage? Any DROP PARTITION or ATTACH PARTITION operation requires a full, blocking rebuild of this massive global index. It completely negates one of the main benefits of partitioning (fast data purging).
The Query Planner and Partition Pruning
Your index strategy is useless if the query planner doesn’t first perform partition pruning. This is the magic trick where the planner looks at your WHERE clause, realizes only partitions X, Y, and Z could possibly contain the data, and ignores all the others.
This is why you always, always filter on your partition key if you can. An index on sale_date is almost entirely redundant because we’re already using it to prune. The real power comes from indexing other columns within the pruned set.
-- Good: Uses partition pruning, then uses a local index on 'customer_id'
SELECT * FROM sales
WHERE sale_date BETWEEN '2023-01-01' AND '2023-01-31'
AND customer_id = 44501;
-- Terrible: Can't prune, so must scan every local index on every partition
SELECT * FROM sales WHERE customer_id = 44501;
The first query is lightning. The second query will make you cry. The moral of the story is that your application code must be as partition-aware as your database design.
Best Practices and Pitfalls
Index What’s Left: After pruning, you’re left with a much smaller set of data. Index the columns you commonly filter on within that partition. On a
salestable partitioned by month, you might indexcustomer_idlocally. You’re indexing for the query, not for the table.Beware of OR: The planner often struggles to prune with
ORconditions.WHERE sale_date = '2023-01-01' OR customer_id = 55might prevent pruning onsale_date. Rewrite these queries with aUNION ALLif performance is critical.Constraints Are Your Friend: Define
CHECKconstraints on your child tables. Besides being good practice, they give the planner more information to work with during pruning.The Primary Key Rule is Non-Negotiable: I said it before, I’ll say it again. You will forget this once. You will then spend three hours debugging why your insert is timing out before you remember. Don’t be me.
Partitioning isn’t a “set it and forget it” operation. It’s a contract between you, your application, and the database. You promise to structure your queries wisely, and in return, it gives you performance and manageability that flat tables can only dream of. Hold up your end of the bargain.