29.5 REFRESH MATERIALIZED VIEW: Full and Concurrent Refresh
Right, so you’ve built a Materialized View. Good for you. You’ve traded a bit of disk space for a massive speed boost, and you’re feeling pretty clever. But now you’ve hit the inevitable snag: the data in your source tables has changed, and your materialized view is now a beautiful, perfectly indexed monument to a past reality. It’s lying to you. This is where REFRESH MATERIALIZED VIEW comes in—the command that brings your cached snapshot back into alignment with the cold, hard truth.
The first thing to understand is that you have two main refresh strategies, and the choice between them is a classic trade-off between convenience and availability. You need to pick your poison.
The Full Refresh: The Sledgehammer
This is the straightforward, no-nonsense option. You run:
REFRESH MATERIALIZED VIEW my_materialized_view;
Behind the scenes, PostgreSQL does the following, in order:
- Locks the entire view exclusively. Nothing can read from it while this is happening.
- Completely truncates the table that underpins the view. It empties it out.
- Re-runs the entire
SELECTquery that defines the view, from scratch. - Repopulates the table with the brand-new result set.
- Releases the lock.
It’s simple, brutal, and effective. The major downside is the lock. For the entire duration of the refresh—which could be a long time if your query is complex and your dataset is large—the view is completely unavailable for reading. Any SELECT query trying to access it will be blocked, waiting patiently for that lock to be released. This is a non-starter for many production applications.
You use this when you can afford downtime, when the view is small and refreshes quickly, or during off-hours maintenance windows. It’s also your only option if your view is unpopulated or if you’ve created it with the WITH NO DATA option.
Concurrent Refresh: The Surgical Scalpel
This is where things get interesting. To allow reads to continue during the refresh, you use the CONCURRENTLY option:
REFRESH MATERIALIZED VIEW CONCURRENTLY my_materialized_view;
The magic here is that PostgreSQL avoids the long-lasting exclusive lock. Here’s the clever, multi-step dance it performs:
- It creates a temporary, behind-the-scenes copy of the materialized view with a name like
".new_12345_mv". - It populates this temporary copy by running the defining query, just like a full refresh would.
- It then takes a brief exclusive lock on the view (yes, a lock is still involved, but it’s much shorter).
- During this brief lock, it performs a transaction-safe swap: it compares the new temporary data with the old data, applies any differences, and updates the indexes. It’s essentially a super-efficient sync operation.
- The lock is released, and the temporary table is dropped.
The huge advantage is that for the vast majority of the operation—the time-consuming part of re-running the query—readers can still query the old version of the data. They only get blocked for the tiny instant where the sync happens.
But this power comes with significant strings attached, which is where everyone gets tripped up.
The Gotchas of CONCURRENTLY
First, and this is the big one, a concurrent refresh requires a unique constraint on the materialized view. The sync operation needs a way to compare rows between the old data and the new data to figure out what to insert, update, or delete. Without a unique index, it has no reliable way to tell if a row is the “same” row. If you try without one, you’ll be greeted with a wonderfully clear error: "REFRESH MATERIALIZED VIEW CONCURRENTLY cannot be executed on a materialized view without a unique index."
Let’s fix that. If you created your view without one, you need to add it:
CREATE UNIQUE INDEX ON my_materialized_view (id);
-- Now you can refresh concurrently
REFRESH MATERIALIZED VIEW CONCURRENTLY my_materialized_view;
Second, it’s more expensive. You’re essentially running the query twice (once for the temp copy) and then doing a whole data diffing operation. It uses more CPU, more I/O, and more temporary disk space. It’s objectively less efficient than a full refresh; you’re paying a performance tax for the privilege of availability.
Finally, you can’t just spam this command. If one concurrent refresh is running and you start another, the second one will wait for the first. But if you try to run a regular REFRESH MATERIALIZED VIEW (without CONCURRENTLY) while a concurrent refresh is running, it will wait—and then it will break the subsequent concurrent refresh. The state of the internal row-version tracking gets messed up. The rule of thumb is: pick one refresh type and stick with it for a given materialized view. Don’t mix them.
So, Which One Do I Use?
Your decision tree is simple:
- Can you afford a maintenance window? → Use a full refresh. It’s faster and simpler.
- Does your application require 24/7 read access to the view? → You must use
CONCURRENTLY. - Is your view small and fast to refresh? → A full refresh is probably fine, even if you can do concurrent.
- Is your view large and expensive to refresh? → You almost certainly need
CONCURRENTLYto avoid unacceptable downtime.
Think of it as a spectrum between raw speed and user accessibility. There’s no free lunch, but knowing the mechanics means you can choose the right trade-off for your specific trench.