29.6 Indexes on Materialized Views for Fast Reads
Right, so you’ve built a materialized view. Congratulations, you’ve essentially told your database, “I’m tired of you recalculating this complex join every five minutes; just save the answer to a table and let me read from that.” It’s a brilliant move for read performance. But here’s the thing: a materialized view is, at its core, a table. And what do we do with tables that we query heavily? We index them. Leaving a materialized view unindexed is like buying a sports car and then putting cheap, bald tires on it. You’re not getting the performance you paid for with all that expensive REFRESH computation.
The magic of an index on a materialized view is that it’s precomputed. You’re not just speeding up a query; you’re speeding up a query against a dataset that’s already a cached result. It’s performance squared.
The No-Brainer: Indexing Your Primary Key
Most materialized views should have a primary key. It’s often the most logical unique identifier for the precomputed row. If your materialized view is built from a GROUP BY or includes a DISTINCT, you already have a natural key. Declaring it as a primary key gives you a unique, non-null, automatically indexed constraint. It’s the first thing you should do.
CREATE MATERIALIZED VIEW public.order_summary AS
SELECT
customer_id,
date_trunc('month', order_date) as order_month,
COUNT(*) as order_count,
SUM(order_total) as total_revenue
FROM orders
GROUP BY customer_id, date_trunc('month', order_date);
-- The moment of truth: add the primary key
ALTER MATERIALIZED VIEW order_summary ADD CONSTRAINT pk_order_summary PRIMARY KEY (customer_id, order_month);
This creates a unique B-tree index on (customer_id, order_month), which is perfect for point queries looking for a specific customer’s activity in a specific month. The database can zip right to that row.
Indexing for Your Actual Query Patterns
The primary key is the easy win. The real art is in looking at how you actually query this materialized view. Don’t just throw indexes at it; that will only slow down your REFRESH. Be a sniper, not a shotgunner.
Are you always filtering on total_revenue to find your biggest customers? That’s a job for a B-tree index. Are you searching for orders within a date range? Another B-tree, probably on order_month. The rules are the same as for regular tables.
-- Example: We often need to find top spenders in a given month.
CREATE INDEX idx_order_summary_month_revenue ON order_summary USING btree (order_month, total_revenue DESC);
-- This index would make this query scream:
SELECT * FROM order_summary
WHERE order_month = '2023-10-01'
ORDER BY total_revenue DESC
LIMIT 10;
The index is built to match the exact WHERE and ORDER BY clause. The database can jump to October 2023 and then just read the top 10 rows right off the index in the correct order. It’s beautiful.
The Gotcha: Indexes and the REFRESH
Here’s the part the manual often glosses over: indexes are a double-edged sword. Every single time you run REFRESH MATERIALIZED VIEW, the database isn’t just repopulating the table data. It’s also rebuilding every… single… index on that view from scratch.
Think about that. If you have five elaborate indexes, a REFRESH does the work of repopulating the entire dataset plus the work of building five indexes on that new dataset. This can turn a moderately expensive refresh into a painfully long one.
The best practice? Index strategically. Add indexes one by one, monitoring their impact on query speed versus refresh time. If you have a massive materialized view that needs to be refreshed frequently, you might need to be very conservative with your indexing. Sometimes, a slightly slower read that happens 10,000 times a day is better than a REFRESH that times out and leaves your data stale.
The CONCURRENTLY Trade-Off
You can use REFRESH MATERIALIZED VIEW CONCURRENTLY to allow reads to continue while the view refreshes. This is fantastic for uptime. But it has a major implication for indexes: to use CONCURRENTLY, the materialized view must have at least one unique index.
Why? The concurrent refresh works by diffing the new data with the old data. It creates a temporary delta file, and then it applies those changes (inserts, updates, deletes) to the existing materialized view. To perform updates and deletes correctly, it must be able to uniquely identify each row. Without a unique index, CONCURRENTLY simply won’t work. It’s not a suggestion; it’s a hard requirement. So if you plan on using concurrent refreshes (and you probably should in any production environment), that primary key isn’t just a good idea—it’s mandatory.
The trade-off is that the concurrent refresh is slower and more expensive than a full refresh because of all this extra work to compute the diff. You’re trading raw speed for availability. It’s almost always a trade worth making.