34.1 WAL-Based Replication: The Fundamentals
Right, let’s talk about the beating heart of any serious PostgreSQL setup: replication. Forget the marketing fluff; this is how you turn a single, lonely database server into a resilient, scalable system. And it all starts with the Write-Ahead Log, or WAL. If your database is a novel, the WAL is the continuous, unbroken stream of every single edit the author ever made. We don’t just copy the final book; we stream the entire writing process. This is the fundamental concept behind WAL-based replication, and it’s brilliantly simple and robust.
How the WAL Becomes a Data Stream
Think of the WAL not as a static file but as an endless, append-only stream of change events. Every INSERT, UPDATE, or DELETE gets broken down into a low-level description of what blocks on disk were changed and how. The primary server’s job is to just keep writing these WAL records. The replica’s job is to suck them up and relentlessly replay them in the exact same order.
This is why we call it physical replication. We’re not replaying the SQL statement UPDATE users SET status = 'awesome' WHERE id = 42;. We’re replaying the instruction: “At location X in the table’s data file, change bytes 1234-5678 to look like this.” The replica’s data files become a byte-for-byte identical copy of the primary’s. This has a huge advantage: it’s incredibly reliable and fast. There’s no ambiguity. The replica is a perfect clone.
The Magical pg_hba.conf and a Replication User
Before a replica can even dream of connecting, the primary needs to know it’s allowed to exist. This is a classic “it’s always the firewall” moment. You need to configure the pg_hba.conf file on the primary to allow replication connections from your replica server, and you need a dedicated user for the job.
Don’t use your superuser. Create a user whose sole purpose is to be a glorified photocopier.
-- On the Primary Server
CREATE USER replicator WITH REPLICATION LOGIN ENCRYPTED PASSWORD 'a_strong_password_please';
Then, in pg_hba.conf, add a line that allows this user from your replica’s network:
# pg_hba.conf on the Primary
host replication replicator 192.168.1.100/32 scram-sha-256
This tells PostgreSQL: “Allow connections for the replication database (it’s a pseudo-database), user replicator, from the IP 192.168.1.100, using the secure password method.” Reload the config (pg_ctl reload or SELECT pg_reload_conf();) and you’re set.
The recovery.conf (or postgresql.auto.conf) That Makes It Work
Here’s where the designers, in their infinite wisdom, changed the rules on us. In PostgreSQL 12 and above, the beloved recovery.conf file was merged into postgresql.conf. It’s cleaner, but I’ll admit I miss the old, dedicated file. The replica’s configuration needs to point to the primary and say, “I am the clone, feed me.”
You’ll create a standby.signal file in the replica’s data directory to tell it to boot into standby mode. Then, in postgresql.conf (or better, postgresql.auto.conf to keep your settings separate), you set the connection information:
# In the Replica's postgresql.auto.conf
primary_conninfo = 'host=primary-server-host port=5432 user=replicator password=a_strong_password_please'
This is the single most important line. It’s the replica’s GPS coordinates for finding its master. The restore_command is also critical for initial catch-up and for archiving, but for a simple streaming setup, the primary_conninfo is the star of the show.
Why This Is So Darn Resilient
The beauty of this streaming approach is its statelessness. The replica just asks: “What’s the next WAL record?” If the replica falls behind, gets disconnected for an hour, or crashes, it just wakes up, reconnects, and says, “Okay, where were we?” It will then stream and apply all the WAL it missed. The primary keeps a small history of past WAL files (controlled by wal_keep_size) to support this. As long as the replica reconnects before the primary cycles over and deletes the old WAL it needs, it can catch up automatically. This makes it remarkably tolerant of network flakiness.
The Obvious Pitfall: WAL Bloat and wal_keep_size
Here’s the first “questionable choice” you need to manage. The primary is a ruthless minimalist. Its default setting is to only keep enough WAL files to ensure its own integrity, not to babysit lagging replicas. The wal_keep_size parameter defaults to a measly 0 (in older versions, it was wal_keep_segments with a default of maybe 10). If your replica is down for a few hours and the primary is very busy, it will merrily delete the old WAL files the replica needs to catch up. When the replica reconnects, it will panic because its required starting point is gone.
The solution is to either:
- Increase
wal_keep_sizeon the primary to a value large enough to cover any expected replica downtime (e.g.,wal_keep_size = 10GB), or - (The better practice) Integrate a WAL archiving system where the primary archives WAL files to a durable storage like S3 or a network drive, and configure the replica’s
restore_commandto fetch them from there as a backup. This is your safety net. Relying solely onwal_keep_sizeis like trusting a goldfish to remember a secret.