4.5 pg_lsn: Log Sequence Numbers for WAL Positions

Right, so you’ve decided to get serious about PostgreSQL’s Write-Ahead Log (WAL). Good for you. It’s the secret sauce behind almost every “how does Postgres do that?!” feature, from crash recovery to replication. And to understand the WAL, you need to understand its address system: the Log Sequence Number, or pg_lsn. This isn’t your average integer; it’s a data type specifically designed to talk about points in the transaction log, and it’s far more elegant than just throwing a BIGINT at the problem.

A pg_lsn value represents a specific byte position in the WAL. It’s displayed as a pair of hexadecimal numbers separated by a slash, like 1/345F82E0. The first number is the log file segment and the second is the byte offset within that segment. This format isn’t just for show; it’s a human-readable clue about how the WAL is actually structured under the hood. But don’t get too attached to the format—PostgreSQL is smart enough to let you compare, subtract, and work with these values directly.

Why We Have a Special Type for This

You might be thinking, “Couldn’t I just store this as a BIGINT or a TEXT field?” Well, you could, but you’d be making your life infinitely harder and your code significantly uglier. By making it a first-class data type, Postgres gives you built-in operators, functions, and sanity checks. The most important thing it prevents is the classic “apples and oranges” error: you can’t accidentally compare a timestamp to a WAL position. The type system slaps your hand before you can do something truly disastrous. It also handles all the messy business of the underlying 64-bit comparison and arithmetic for you, which is non-trivial when you’re dealing with a string that looks like 0/FFFFFFFF.

How to Work With pg_lsn

The beauty of pg_lsn is that it follows common sense. You can compare two LSNs with the standard operators (<, >, =, etc.) to see which event happened first in the log. You can even subtract them to get the amount of WAL data (in bytes) generated between two points—a crucial metric for monitoring replication lag.

Let’s see some real SQL. First, let’s find out our current write position in the WAL. This is the most important LSN you’ll constantly be checking.

SELECT pg_current_wal_insert_lsn() AS current_insert_lsn;

 current_insert_lsn 
--------------------
 1/345F82E0

Now, let’s see how far our replica(s) have caught up. This is the magic behind monitoring replication health.

SELECT pg_current_wal_insert_lsn() - pg_last_wal_receive_lsn() AS receive_lag_bytes,
       pg_current_wal_insert_lsn() - pg_last_wal_replay_lsn() AS replay_lag_bytes,
       pg_last_wal_replay_lsn() AS last_replay_lsn;

 receive_lag_bytes | replay_lag_bytes | last_replay_lsn 
-------------------+------------------+-----------------
                0 |                0 | 1/345F82E0

In this ideal scenario, both receive_lag_bytes and replay_lag_bytes are 0, meaning the replica has received and applied every byte the primary has produced. If you see those numbers climbing, you know you’re falling behind.

The Biggest Pitfall: WAL Wraparound

Here’s the part everyone forgets until 3 AM when their replica breaks: LSNs are not forever. The WAL is a circular buffer, not an infinite tape. Once it fills up, it starts overwriting the oldest segments. This is called wraparound. If you have an old pg_lsn value—say, you stored it in a monitoring table a month ago—and you try to subtract the current LSN from it, you’ll get a negative number or a complete nonsense value because the 64-bit counter has wrapped around to zero.

This is not a design flaw; it’s a physical reality. The takeaway is simple: pg_lsn values and their arithmetic are only meaningful for a relatively short period of time. You can’t store an LSN from six months ago and expect to do useful math with it today. Your monitoring and alerting logic must be designed to handle this. If you’re calculating replication lag, you’re only ever comparing two very recent LSNs.

Best Practices and Clever Tricks

Beyond just monitoring lag, pg_lsn is your key to unlocking powerful features. Want to take a base backup and know exactly which WAL files you need to archive for a successful restore? pg_start_backup() and pg_stop_backup() return LSNs that tell you the exact range of WAL required.

You can also use it for point-in-time recovery (PITR). Your recovery.conf (or postgresql.conf in v12+) uses pg_lsn syntax to tell the server where to stop replaying:

-- This is what you'd put in your recovery target, not in a running DB.
-- recovery_target_lsn = '1/345F82E0'

The most important practice is to treat pg_lsn as the first-class type it is. Store it in your tables as pg_lsn, not as TEXT. This gives you type safety, proper indexing, and the ability to use all the built-in functions without constant casting. It’s a small choice that makes your code more robust and a lot more readable. So go on, use it. It’s one of those Postgres features that feels a bit obscure at first but quickly becomes indispensable.