34.4 Logical Replication: Publications and Subscriptions
Right, so you’ve outgrown streaming replication. You need to replicate only a subset of your data, or maybe you’re doing a major version upgrade without downtime. Welcome to logical replication, the grown-up version of “just copy the whole data directory.” Instead of blindly shipping every bit and byte, it streams a log of the actual data-changing operations (INSERT, UPDATE, DELETE) from one database to another. It’s smarter, more flexible, and consequently, a bit more hands-on.
The core of this system is built on two concepts: the publication (what you publish) and the subscription (what you consume). It’s a publisher-subscriber model, and honestly, it’s one of the cleanest abstractions PostgreSQL has added in years.
Publications: What You’re Selling
Think of a publication as a curated list of tables you want to ship out. You create it on the source database (we call this the publisher) and add tables to it. The critical thing to understand here is that a publication is transaction-aware. If you update five tables in a single transaction, the entire set of changes is applied on the subscriber in one transaction, maintaining consistency. This isn’t some janky trigger-based system; it’s built into the core.
Here’s how you set one up. Let’s say you have an orders table and an order_items table, and you want to ship them together.
-- On your publisher node (the source database)
CREATE PUBLICATION pub_orders FOR TABLE orders, order_items;
Boom. You’re now a publisher. You can also get lazy and publish everything with FOR ALL TABLES, but I wouldn’t recommend it unless you have a very specific reason. Be explicit. It saves you from accidentally replicating that admin_users table to your analytics subscriber.
Now, the most common “gotcha” right out of the gate: the tables in your publication must have REPLICA IDENTITY defined. This is how PostgreSQL knows how to identify old rows for UPDATE and DELETE operations. If you just do a standard CREATE TABLE, it defaults to REPLICA IDENTITY DEFAULT, which uses the primary key. That’s usually fine. But if you try to replicate a table without a primary key and you issue a DELETE or UPDATE, the subscriber will have a full-on meltdown because it won’t know which row to target.
You can check and set it:
-- Check the current replica identity for a table
SELECT relreplident FROM pg_class WHERE oid = 'orders'::regclass;
-- Returns 'd' for DEFAULT, 'f' for FULL, etc.
-- If you have a table without a PK but with a unique key, you can use that
ALTER TABLE some_table REPLICA IDENTITY USING INDEX some_unique_index;
-- The nuclear (and performance-impacting) option for tables with no unique identifiers
ALTER TABLE some_awful_table REPLICA IDENTITY FULL;
Using FULL makes it use the entire old row for identification, which is expensive. Just add a primary key. Seriously.
Subscriptions: Tuning In
The subscription is the other half of the equation. You create this on the target database (the subscriber). Its job is to connect to the publisher, grab the changes from the publication, and apply them locally.
Here’s the magic:
-- On your subscriber node (the target database)
-- First, ensure the tables exist. Their structure must match *exactly*.
CREATE SUBSCRIPTION sub_orders
CONNECTION 'host=publisher_host dbname=your_db user=rep_user password=secret'
PUBLICATION pub_orders;
This creates a subscription, but it also kicks off an initial sync. This is a crucial detail: by default, PostgreSQL does a copy of the existing table data on the publisher. It doesn’t just start streaming from this moment forward. It makes the subscriber consistent and then starts the continuous streaming. You’ll see a temporary CREATE TABLE ... AS SELECT process on the publisher while this happens for each table.
Now, for the love of all that is holy, do not use a superuser for the replication connection. Create a dedicated user with just the permissions it needs:
CREATE ROLE rep_user WITH LOGIN REPLICATION PASSWORD 'secret';
GRANT SELECT ON ALL TABLES IN SCHEMA public TO rep_user;
-- Also ensure the user has USAGE privilege on the schema
The Nitty-Gritty: What to Watch For
This isn’t magic fairy dust. You have to manage this system.
- Schema Changes: This is the big one. Logical replication does not replicate DDL (like
ALTER TABLE). If you add a column to the table on the publisher, you must add it to the subscriber manually and in the same transaction. If you don’t, the subscription will break the moment it tries to replicate a row with the new column. It’s a pain point, and you need a process for it. - Performance: The apply process on the subscriber is single-threaded. A single large transaction or a very busy table can cause replication lag. Monitor
pg_stat_subscription. - Monitoring: Don’t just set it and forget it. Check the health of your subscriptions.Look for
SELECT * FROM pg_stat_subscription;apply_lag. If it’s growing, something’s wrong. - Conflict Management: If an application writes directly to the subscriber, it can create a conflict (e.g., trying to INSERT a row with a primary key that already exists from the publisher). The subscription will stop. You can set
max_logical_replication_workersand adjustlogical_replication_conflictsto auto-resolve some, but it’s better to just treat the subscriber as read-only unless you truly know what you’re doing.
Logical replication is incredibly powerful, but it trades the “it just works” simplicity of streaming replication for a “you are now the manager” level of control. Use it when you need its flexibility, but respect its complexity.