35.3 pg_restore: Selective and Parallel Restore

Right, so you’ve got a backup. Congratulations. That puts you ahead of roughly half the people I’ve met in this industry. But a backup is just a latent disaster until you prove you can use it. That’s where pg_restore comes in. Think of pg_dump as you carefully packing your entire house into labeled boxes. pg_restore is you, hopefully not in a panic, unpacking it. And unlike a real move, you get to be incredibly choosy about what comes out of the truck and in what order.

The first thing to understand is that pg_restore is the counterpart to pg_dump’s custom -Fc format. If you dumped to a plain SQL file (-Fp), you’re stuck feeding it to psql and watching it babble SQL at your database in a single, agonizingly slow thread. The custom format is a tarball for your database, and pg_restore is the tar command that can manipulate it. It knows the structure of the archive, which lets us do some truly powerful things.

Why the Custom Format is a Game-Changer

The magic of the -Fc format isn’t that it’s smaller (though it is, because it’s compressed), but that it’s structured. The archive contains a table of contents listing every object—every table, every index, every function—along with its definition and its data. Because pg_restore can read this TOC, it can perform surgical strikes on your data. You can:

Restore just one table’s data, or just its schema.
Restore just the triggers, but not the functions they call.
Exclude that one massive, pointless logging table that the previous dev thought was a good idea.
Juggle ownership of objects on the way in.

Trying to do this with a plain SQL dump is like trying to remove one specific ingredient from a baked cake. With the custom format, you’re working with the raw batter.

The Art of the Selective Restore

Let’s say you have a monster of a database named prod, but you only need to restore the schema for a single table, public.users, and its data into your staging database staging.

# First, just see what's in the backup. Always do this first.
pg_restore -l /backups/prod-2023-10-27.dump > list.txt

# This creates a list file. Open it. You'll see things like:
# ;;
# ; Archive created at [timestamp]
# ;     dbname: prod
# ;     TOC Entries: 1420
# ;     Compression: -1
# ;     Dump version: 1.14-0
# ;     Format: CUSTOM
# ;
# ; Selected TOC Entries:
# ;
# 3873; 1262 16384 DATABASE - prod postgres
# 2; 3079 12518 EXTENSION - plpgsql
# 3; 3079 12519 COMMENT - EXTENSION plpgsql
# ... thousands more lines ...

# Now, let's create a new list that ONLY includes the 'users' table.
# The ';' at the start of a line comments it out, meaning it won't be restored.
pg_restore -l /backups/prod-2023-10-27.dump | grep -E '(TABLE DATA public users|TABLE public users)' > users_only.list

# Now, perform the restore using this list.
pg_restore -d staging \
           -L users_only.list \
           /backups/prod-2023-10-27.dump

Maybe you want to restore *everything except the audit_log table because it’s 500GB of sadness. Easy.

pg_restore -l /backups/prod.dump | grep -v 'TABLE DATA public audit_log' > no_audit.list
# Also exclude its schema if you don't want the empty table either!
pg_restore -l /backups/prod.dump | grep -v 'TABLE public audit_log' > no_audit.list

pg_restore -d staging -L no_audit.list /backups/prod.dump

Cracking the Whip with Parallel Restore

Here’s the best part: pg_restore can use multiple jobs to parallelize the restore. This is primarily a massive win for restoring data. The data for each table is a separate chunk in the archive, so pg_restore can have multiple worker processes each loading a different table’s data simultaneously. For a large database, this can cut restore times from hours to minutes.

The rule of thumb is to set the -j (jobs) flag to the number of CPU cores on your target machine. Don’t just set it to 100 and expect magic; you’ll drown the system in context switches.

# This will use 4 parallel workers to load data.
pg_restore -d new_db -j 4 -Fc /backups/prod.dump

Crucial Pitfall: Parallel restore is fantastic for data, but the schema is always restored by a single process first. This is the right thing to do, as it prevents workers from trying to create the same object simultaneously and blowing up the whole operation. The -j flag only affects data loading. Also, be aware that if you have pre-data and post-data items (like indexes and FK constraints), they are restored in separate, single-threaded phases. You’ll see pg_restore create all the tables (fast), then blast data into them in parallel (very fast), and then slowly build all the indexes one-by-one (painfully slow). There’s no way around that last part; it’s just how it works.

The –no-owner Flag: Your Best Friend for Cross-Environment Restores

This is the number one “gotcha” that trips people up. You dumped from a production database where everything is owned by prod_superuser. You’re trying to restore to a staging server where that role doesn’t, and shouldn’t, exist. If you don’t use --no-owner, the restore will fail spectacularly when it tries to assign ownership to a role it can’t find.

The --no-owner flag tells pg_restore to make the connected user (usually postgres or your admin user) the owner of all objects in the restored database. It’s almost always what you want when moving between environments.

# Do this to avoid a ownership permission nightmare.
pg_restore -d staging --no-owner -j 4 -Fc /backups/prod.dump

Combine this with --no-privileges if you’re also moving between systems with different role structures and just want the bare objects.

So, to wrap it up: always use the custom format (-Fc). Use -l to see what you’re working with. Use -L to be surgical. Use -j to get your weekend back. And for the love of all that is holy, use --no-owner unless you’re restoring to an identical clone of your source system. Now go practice. I’ll wait.