21.5 Concurrent Index Builds: CREATE INDEX CONCURRENTLY

Right, let’s talk about CREATE INDEX CONCURRENTLY. This is the command you use when you’d rather not bring your entire application to a grinding, screeching halt while you build an index on a production table. It’s PostgreSQL’s way of saying, “I can chew gum and walk at the same time,” and for the most part, it’s shockingly good at it.

The standard CREATE INDEX command takes a table lock that blocks all writes (INSERTs, UPDATEs, DELETEs) for the entire duration of the build. On a large table, that duration can be measured in minutes or even hours. That’s a non-starter for most systems. CREATE INDEX CONCURRENTLY (let’s call it CIC because I’m tired of typing it) avoids this by taking a much weaker lock and working in multiple passes, allowing normal operations to continue almost uninterrupted.

How It Actually Works (The Gory Details)

Don’t just wave a magic wand; understand the incantation. CIC works in multiple phases, and knowing them explains why it sometimes feels slow or gets stuck.

First Pass: The command kicks off. It tells the database to build the index in a special “INVALID” state. It does this by taking a ShareLock (which blocks other DDL, but not writes) and then gets to work scanning the table. While it’s scanning, other transactions are merrily writing to the table. This means the initial scan is already a bit out of date by the time it finishes. No problem, that’s why we have more passes.

Second Pass: This is the clever, yet brutal, part. PostgreSQL now needs to catch up on all the changes that happened during the first scan. It does this by scanning the table’s change log (the visibility map and the heap) to find all the tuples that were modified since the first scan began. It’s effectively playing a brutal game of catch-up. This is often the longest phase because if the table is very busy, there’s a lot to catch up on.

Final Pass: Here, it takes a brief exclusive lock on the index itself (not the table!) to flip it from “INVALID” to “VALID.” This lock is so short you’ll rarely see it cause a blip. Once valid, the index is ready for the query planner to use.

The Code: It Couldn’t Be Simpler

Using it is trivial. You just add the magic word.

-- The "just shut up and do it" method:
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);

-- You can include all the usual options, too:
CREATE INDEX CONCURRENTLY idx_orders_created_at ON orders(created_at DESC) WITH (fillfactor = 90);

The key difference is that this command returns almost immediately to your session, even though the work continues in the background. You can go get a coffee while it runs.

Where This Whole “Concurrent” Thing Falls Apart

It’s not all rainbows and unicorns. CIC has some very important trade-offs and failure modes you absolutely must know.

It’s Slower: A CIC build will almost always take more total wall-clock time and consume more CPU than a standard blocking build. It’s doing more work (multiple scans) and has to manage the change catch-up. You’re trading total runtime for availability.

It Can Fail Spectacularly (and Silently): This is the big one. If a problem occurs during the build—like a uniqueness violation, a deadlock, or you cancel the command—the index isn’t just gone. It’s left behind as an INVALID index. It’s a zombie: it consumes disk space, it shows up in \di, but it is completely useless. The query planner ignores it, and it just sits there, taunting you.

You must check for these after running a CIC. If it fails, you need to clean up the mess yourself:

-- Check for invalid indexes *immediately* after your CIC command finishes:
SELECT schemaname, indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'public'
AND indexname = 'idx_users_email';

-- Look in pg_stat_progress_create_index to see if it's still running or hung.
-- If your command failed or was cancelled, DROP the invalid index:
DROP INDEX CONCURRENTLY idx_users_email; -- Note: you even drop it concurrently!

Yes, you have to drop it concurrently too. The designers were nothing if not consistent. Trying to do a regular DROP INDEX on an invalid index might just block, waiting for the failed transaction to complete. It’s a whole thing.

It Can Be Starved: On an extremely busy table with a high volume of constant updates, the second pass can theoretically never finish because it’s trying to catch a train that’s constantly accelerating. In practice, this is rare, but it’s why you might see a CIC build taking an unexpectedly long time on a hot table.

Best Practices: Don’t Be a Hero

Do It During Off-Peak Anyway: Just because you can build an index during peak traffic doesn’t mean you should. The extra load from the multiple scans and the change-catching can impact performance for other queries. Use CIC to avoid downtime, but still try to run it during a quieter period.
Watch for Lock Conflicts: While CIC avoids heavy table locks, it still needs a ShareLock. This will block other DDL commands like ALTER TABLE, DROP TABLE, or even another CREATE INDEX CONCURRENTLY on the same table. Schedule schema changes carefully.
Don’t Wrap It in a Transaction: This is crucial. CREATE INDEX CONCURRENTLY cannot run inside a transaction block. If you try, PostgreSQL will tell you to get lost. It makes sense—the whole command is a multi-transaction process itself.

BEGIN;
CREATE INDEX CONCURRENTLY idx_bad_idea ON some_table(column); -- ERROR:  CREATE INDEX CONCURRENTLY cannot run inside a transaction block
COMMIT;

So, there you have it. CREATE INDEX CONCURRENTLY is an indispensable tool for anyone managing a live Postgres database. It’s a brilliant feat of engineering that lets you keep the lights on. Just remember it’s a precision instrument, not a blunt hammer. Respect its quirks, always check for invalid indexes, and never assume the command you fired off two hours ago actually succeeded. Trust, but verify.