35.1 pg_dump: Logical Backups in SQL and Custom Formats
Alright, let’s talk about pg_dump. This is your Swiss Army knife for logical backups. It doesn’t copy the data files directly; instead, it connects to the database like any other client and dumps out the SQL commands needed to reconstruct your database—schema, data, roles, the whole shebang—into a text file. It’s perfect for moving between major versions, migrating to different hardware, or just having a nice, human-readable SQL script to cry over when things go wrong.
The beauty of a logical backup is its portability. That .sql file is basically a recipe. You can feed it to psql to rebuild the database on any machine that runs PostgreSQL, regardless of architecture. You can even open it in a text editor and surgically extract that one table you accidentally dropped. Try doing that with a raw file system copy.
The Two Major Flavors: Plain SQL and Custom
You’ve got two main output formats, and picking the right one is your first critical choice.
The --format=plain option (which is the default) gives you that massive, glorious SQL file. It’s simple and universally usable. The --format=custom option, on the other hand, creates a highly compressed, proprietary-format archive that only pg_restore can read. It’s not human-readable, but it’s faster and enables some killer restore options you just don’t get with plain SQL.
Here’s the basic incantation for a plain SQL dump of a single database:
pg_dump -U your_username -h localhost -p 5432 my_database > my_database_backup.sql
Pro tip: Never, ever run this as the postgres OS user without specifying a -U. It will “work,” but it’ll connect via peer authentication, and the resulting SQL file will be owned by root if you used sudo, creating a hilarious and utterly preventable permission nightmare for later. I speak from experience.
For the custom format, which I almost always prefer for anything serious:
pg_dump -U your_username -Fc my_database > my_database_backup.custom
Notice the .custom extension? That’s just a convention; it’s not required. But your future self will thank you for not leaving a bunch of unidentifiable binary files lying around.
Why You Might Actually Want the Custom Format
The custom format isn’t just about smaller file sizes (though it is much smaller). Its superpower is flexibility during restore. With a plain SQL dump, you’re replaying a script. It’s all or nothing, in a fixed order. With pg_restore and a custom file, you can:
- Restore just one table.
- Restore just the schema without the data (or vice-versa, with some flags).
- Change the order of object restoration, which is a godsend for untangling circular dependencies.
- Do it in parallel.
Trying to restore a single table from a 100GB plain SQL file is a special kind of hell. You’d have to use grep and sed like some kind of medieval scribe. With a custom file, it’s one command:
pg_restore -U your_username -t my_important_table -d my_new_database my_database_backup.custom
The Jobs Flag: Use It. Seriously.
This is the biggest performance win you’ll get for free. The -j or --jobs flag allows pg_dump to dump multiple tables in parallel. It only works with the directory format (-Fd) or the custom format (-Fc). It’s a game-changer for large databases.
pg_dump -U your_username -Fc -j 4 my_database > my_database_backup.custom
How many jobs should you use? A good rule of thumb is one more than the number of CPU cores you have dedicated to this task, but start with 4 and see how your system responds. The database will sound like it’s angry at you, but that’s just the sound of efficiency.
The Schema-Only Dance
One of the most common tasks is getting just the schema, usually to set up a dev environment. The -s or --schema-only flag is your friend here.
pg_dump -U your_username -s my_database > my_database_schema.sql
But be warned: by default, this does not include things like GRANT permissions and comments. PostgreSQL, in its infinite wisdom, classifies these as “blobs” (I wish I was joking). To get the full picture, you need to add --verbose which, for some bizarre historical reason, is the magic switch that tells it to include these ownership and ACL statements. It’s absurd, but you just have to know it.
pg_dump -U your_username -s --verbose my_database > my_database_schema_full.sql
The One Weird Pitfall: Serial Columns
Here’s a classic “gotcha” that has burned me more than once. Imagine a table with a SERIAL primary key. pg_dump is smart. It won’t dump the CREATE SEQUENCE and then manually set the nextval(); it’ll just dump the data and then reset the sequence’s last value to the MAX(id) on restore. This is usually perfect.
But what if you’re restoring a subset of data? What if you’re restoring just a few rows and you want to INSERT new data later? The restored sequence will be set to the max of the inserted data, which could be 10, but the actual max ID in the dumped data was 10,000. This is a problem waiting to happen. The solution is to use the --inserts flag, which dumps data as explicit INSERT statements with the actual values, including the ID. It makes the dump larger and slower to restore, but it preserves those specific values perfectly. Know your use case.