21.2 Partial Indexes: Indexing Only a Subset of Rows

Right, so you’ve got a table. It’s a big table. A real chonker. And you’ve wisely decided to index a column to speed things up. But here’s the thing: do you really need to index every single row? Often, the most frequent and performance-critical queries are only interested in a specific subset of your data. Indexing every row for that is like building a library catalog that includes every single book, but your patrons only ever ask for the sci-fi paperbacks published after 1985. It’s wasteful. It bloats your index, slows down writes, and frankly, it’s a bit gauche.

This is where partial indexes come in to save the day, your disk I/O, and your dignity. A partial index is precisely what it sounds like: an index built on only a part of your table, defined by a WHERE clause. It’s the most practical form of indexing since sliced bread.

The Obvious Use Case: Filtering Out the Noise

The classic example is a status column. Imagine an orders table with a status column that can be ('pending', 'processing', 'shipped', 'cancelled'). Your application is constantly polling for orders that are 'pending' to process them. The vast majority of your orders are already 'shipped', but every time you run SELECT * FROM orders WHERE status = 'pending', the database might do a sequential scan or, if you have a full index on status, it still has to traverse through countless irrelevant 'shipped' entries.

The solution is brutally elegant:

CREATE INDEX idx_orders_pending ON orders (status)
WHERE status = 'pending';

Now, this index only contains rows where status is 'pending'. It’s tiny, hyper-efficient, and when that frequent query runs, it zooms directly to the few rows it cares about. The database is smart enough to know that if your query’s WHERE clause matches the index’s WHERE clause, this lean, mean index is the perfect tool for the job.

Why This is Black Magic (And How the Planner Uses It)

You might think, “Won’t the query planner get confused and not use my fancy index if I ask for status IN ('pending', 'processing')?” And you’d be right to worry! The query planner isn’t psychic. It will only use idx_orders_pending for queries where it can prove that all the rows in the index must satisfy the query condition.

This is a crucial detail everyone misses. The index condition is a hard guarantee. The planner knows that idx_orders_pending contains only and exactly status = 'pending' rows. So:

SELECT * FROM orders WHERE status = 'pending'; -- Uses the index. Perfect.
SELECT * FROM orders WHERE status = 'shipped'; -- Cannot use the index. Correct.
SELECT * FROM orders WHERE status IN ('pending', 'processing'); -- Will not use this index.

Why the last one? Because the index doesn’t have any 'processing' rows! It would be incomplete. For that query, you’d need a different index. This isn’t a flaw; it’s the rigorous logic that makes partial indexes reliable. You’re not building a general-purpose tool; you’re building a scalpel for a very specific operation.

Beyond Equality: The Real Power Play

While the status field is the textbook example, this is where we get into the genuinely clever stuff. Partial indexes shine for complex conditions.

Use Case: Enforcing Uniqueness on a Subset. This one is so useful it feels like cheating. You have a users table. A user can have multiple email addresses, but only one can be their primary email. How do you enforce that at the database level? A regular unique constraint on (user_id, email) won’t work. The answer:

CREATE UNIQUE INDEX idx_users_primary_email ON users (user_id)
WHERE is_primary = true;

Boom. The database will now throw an error if you ever try to set two emails as primary for the same user. It’s a constraint that only applies to a subset of data, which is something you simply cannot do with a standard UNIQUE constraint. Elegant, right?

Use Case: Indexing Based on a Relationship. Let’s say you have a massive events table, but your application is almost exclusively querying events that happened in the last 30 days. Indexing all billion historical events is insane.

CREATE INDEX idx_events_recent ON events (created_at)
WHERE created_at > CURRENT_DATE - INTERVAL '30 days';

The index automatically maintains itself. As time moves forward, old events fall out of the index condition and are no longer indexed, while new events are added. It’s self-cleaning.

The Pitfalls: Where This Goes Sideways

The Moving Target: The most common mistake is using a volatile function like CURRENT_TIMESTAMP in the index definition. Don’t. The index condition is evaluated when the row is inserted or updated. A row inserted with created_at of today will be indexed. But tomorrow, the condition created_at > CURRENT_DATE - INTERVAL '30 days' is still CURRENT_DATE from when the row was inserted. The index does not automatically re-check the condition for all rows every day. The value is frozen in time for each row. For a rolling window, you need a condition based on an immutable column, like the date itself.
Planner Confusion: As we touched on, the query planner must be able to prove the index is applicable. If your query uses a function or a calculation that the planner can’t match deterministically to the index’s condition, it’ll chicken out and not use your beautiful index. Keep the conditions simple and predictable.
The False Sense of Security: You create idx_orders_pending and then write a query: SELECT * FROM orders WHERE status = 'pending' AND customer_id = 123;. Will it use the index? Maybe. It can use it to find the pending orders, but then it has to filter those for the customer. If you’re always filtering by customer_id as well, your partial index should probably include it: CREATE INDEX idx_orders_pending_customer ON orders (customer_id) WHERE status = 'pending';. Think about your entire query predicate.

The takeaway? Partial indexes are not a niche feature for edge cases. They are a fundamental tool for writing performant, storage-efficient schemas. Use them to ignore what you don’t need. Your database will thank you.