29.4 CREATE MATERIALIZED VIEW: Caching Query Results
Right, so you’ve got a query. It’s a big one. It’s joining half your database, doing some aggregates, the works. It’s slow, and you’re running it all the time. You could cache the results in an application layer somewhere, but that’s a hassle. You want the database to handle it. Enter the Materialized View, your database’s built-in, “I’ll remember this for you” feature.
A Materialized View (let’s call it an MV to save us both time) is the lovechild of a view and a table. A regular VIEW is just a saved query; every time you select from it, it runs the underlying query. An MV is different: it runs the query once, stores the actual results on disk like a table, and gives you that data back instantly. It’s a complete snapshot of the query result at the moment you last refreshed it.
Think of it this way: a view is a set of instructions for how to make a sandwich. A materialized view is the actual, pre-made sandwich sitting in the fridge. One requires work every time you’re hungry; the other is just ready to go.
The Basic Syntax: It’s Pretty Simple, Really
Here’s the gist. You take your painfully slow query and you give it a name.
CREATE MATERIALIZED VIEW public.monthly_sales_summary AS
SELECT
date_trunc('month', sale_date) as sales_month,
region,
sum(amount) as total_sales,
count(*) as number_of_orders
FROM
sales
GROUP BY
sales_month, region;
Boom. The database cranks away, executes that SELECT, and persists the rows into monthly_sales_summary. You can now query it like any other table:
SELECT * FROM monthly_sales_summary
WHERE region = 'North America'
ORDER BY sales_month DESC;
This will be blisteringly fast compared to the original query, because you’re just reading from a pre-computed pile of data. Magic.
The Big Catch: It Gets Stale
Here’s the part everyone forgets until it bites them: that MV is now a snapshot. While your main sales table is getting new orders added every minute, your monthly_sales_summary MV is frozen in time, happily serving up yesterday’s news.
This is the fundamental trade-off. You’re exchanging data freshness for speed. This is perfect for dashboarding, reporting, or any other scenario where you can tolerate slightly old data. It’s terrible for your real-time “check if this product is in stock” feature.
So, how do you update it? You refresh it.
REFRESH MATERIALIZED VIEW monthly_sales_summary;
This re-runs the original query and replaces the entire contents of the MV with the new results. It’s an atomic operation, but it’s typically a heavyweight one. It locks the MV for the duration of the refresh (though CONCURRENTLY can help, more on that in a sec), and if your underlying query is expensive, the refresh will be too.
REFRESH CONCURRENTLY: Avoiding the Lockdown
A plain REFRESH locks the materialized view. Nothing can read from it while it’s being updated. For a large MV, this can mean a multi-minute outage for your dashboard. Not great.
PostgreSQL, being the brilliant mess it is, offers a solution: REFRESH MATERIALIZED VIEW CONCURRENTLY.
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_sales_summary;
This magic incantation builds the new snapshot in the background, and when it’s ready, it does a nifty swap-and-drop with the old data, all while allowing reads to continue uninterrupted. It’s fantastic.
So why wouldn’t you always use it? Of course there’s a catch. Two, actually.
- It’s slower. There’s more overhead in building the new dataset concurrently and comparing it to the old one.
- It requires a Unique Key. The
CONCURRENTLYrefresh needs a way to uniquely identify each row in the MV (so it knows what to add, remove, or update). Your MV must have at least one column (or combination of columns) marked asUNIQUE. Our example above doesn’t have one, so a concurrent refresh would fail. We’d need to define it with a unique key from the start.
CREATE MATERIALIZED VIEW public.monthly_sales_summary AS
SELECT
date_trunc('month', sale_date) as sales_month,
region,
sum(amount) as total_sales,
count(*) as number_of_orders
FROM
sales
GROUP BY
sales_month, region
WITH NO DATA; -- Don't build it yet!
-- Create a unique index to enable future concurrent refreshes
CREATE UNIQUE INDEX ON monthly_sales_summary (sales_month, region);
-- Now populate it for the first time
REFRESH MATERIALIZED VIEW monthly_sales_summary;
Indexes, Storage, and The Kitchen Sink
Since an MV is stored as a table, you can (and absolutely should) add indexes to it. You’re building this thing for speed, so index the columns you’ll be filtering on in your WHERE and JOIN clauses. You can also use WITH to set storage parameters, just like a table, if you need to tweak things like fillfactor.
The WITH NO DATA clause in the creation syntax is a pro move. It lets you define the MV, build its indexes before you populate it with data, which can often be much faster than creating indexes on a giant existing dataset.
When To Use This Power (And When To Run Away)
Use an MV when:
- Your query is slow and read-heavy.
- The data doesn’t need to be real-time (e.g., end-of-day reports, dashboards).
- The source tables aren’t updated so frequently that a refresh would be constant chaos.
Avoid it like the plague for:
- Real-time, up-to-the-second data needs.
- Heavily written-to tables where you’d have to refresh the MV every five seconds, completely negating the performance benefit.
- Simple queries that are already fast. You’re just adding complexity for no reason.
The materialized view is a surgical tool. It solves a very specific performance problem caused by expensive, repetitive reads. Wield it wisely, refresh it conscientiously, and enjoy the glorious speed.