1.2 The Postmaster and Backend Processes: How Connections Are Served

Right, let’s pull back the curtain on how PostgreSQL actually handles you knocking on its door. This isn’t some monolithic application that does everything itself. Oh no, that would be too simple, and frankly, a single point of failure. Instead, it uses a brilliant, time-tested model of delegation: a benevolent manager (the Postmaster) and a legion of specialized workers (backend processes). Understanding this isn’t academic; it’s the key to diagnosing performance issues, connection problems, and understanding what the hell pg_stat_activity is actually showing you.

When you start PostgreSQL (via pg_ctl start or however your OS does it), the very first process that springs to life is the Postmaster. Think of it as the receptionist, security guard, and HR manager all rolled into one. Its job is not to do any actual query processing. Its job is to manage the estate. It boots up, allocates the shared memory—a critical chunk of RAM that every backend process will need to access—and then opens its front door: the network port, usually 5432. Then it just sits there, listening, waiting for a connection request.

The Handshake: From Connection String to Backend Process

So, you fire up psql or your application sends a connection string. A network packet shoots off to port 5432 on the server. The Postmaster hears the knock.

“Ah, a new client!” it says. But the Postmaster has a strict rule: it never does the client’s work itself. That would be a terrible bottleneck. Instead, it performs a rapid fork(). On Unix-like systems, this is its primary move. It creates a brand new, nearly identical copy of itself. This new process is your dedicated backend process. Windows has no fork(), so there the Postmaster launches a fresh postgres process and hands it the state it needs. PostgreSQL does not switch to threads on any platform, and the conceptual model is identical: a dedicated process is created for you. This is why you’ll see many postgres processes in your system’s process list. The original Postmaster is the parent of all of them.

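This new backend process is now your personal servant. It inherits the connection socket from the Postmaster and takes over all communication with your client. The Postmaster, blissfully free of your demands, goes right back to listening for the next connection request. This model is fantastic for isolation: each backend has its own private address space, and an ordinary error in your session (a failed query, a cancelled statement) touches nobody else. It is not a free pass, though. If your backend crashes spectacularly because of a buggy C function you wrote, the Postmaster itself survives, but it can no longer trust the contents of shared memory. It logs the crash, terminates every other backend, runs crash recovery, and only then starts accepting connections again. Everyone gets disconnected and has to reconnect, but the data stays consistent.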
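To make the accept-then-fork dance concrete, here is a minimal Python sketch. It is a toy under stated assumptions, not PostgreSQL's code: the real server is written in C and does far more per connection (authentication, attaching to shared memory, signal handling). The names fork_backend, reap_finished_children and serve_forever, and the port 5433, are made up for illustration, and the sketch only runs on Unix-like systems because os.fork() does not exist on Windows.

```python
import os
import socket


def fork_backend(conn, listener=None):
    """Fork a child to serve one accepted connection; return the child's PID."""
    pid = os.fork()
    if pid == 0:
        # Child: the "backend". It inherited the connected socket.
        status = 0
        try:
            if listener is not None:
                listener.close()  # a backend never accepts new clients
            conn.sendall(f"served by backend pid {os.getpid()}\n".encode())
            conn.close()
        except OSError:
            status = 1
        os._exit(status)  # exit immediately, skipping the parent's cleanup code
    # Parent: the "postmaster". Drop its copy of the socket and return.
    conn.close()
    return pid


def reap_finished_children():
    """Collect exit statuses of finished children without blocking."""
    try:
        while os.waitpid(-1, os.WNOHANG)[0] != 0:
            pass
    except ChildProcessError:
        pass  # no children left to wait for


def serve_forever(host="127.0.0.1", port=5433):
    """Accept loop: one forked child per client. Hypothetical host and port."""
    with socket.create_server((host, port)) as listener:
        while True:
            conn, _addr = listener.accept()
            fork_backend(conn, listener)
            reap_finished_children()
```

Calling serve_forever() and connecting with a tool such as nc should print a different PID for each connection. Notice that the parent does almost nothing per client: accept, fork, close its copy of the socket, loop. That is the whole point of the design.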

You can see this in action. Fire up a psql session and find its process ID.

SELECT pg_backend_pid();

 pg_backend_pid 
----------------
          12345
(1 row)

Now, hop onto your server and look at that process. It’s a full-fledged OS process.

ps aux | grep 12345
postgres  12345  0.0  0.5 302104 11000 ?        Ss   14:30   0:00 postgres: myuser mydb [local] idle

The Shared Memory Heart

Every one of these backend processes is its own separate entity in the OS, which is great for isolation. But they absolutely need to talk to each other and, more importantly, have a single source of truth about the state of the database. This is where Shared Memory and the Buffer Cache come in.

The Postmaster allocates this shared memory at startup. It’s a central pool everyone dips into. The most important part is the Buffer Cache, sized by the shared_buffers setting. When your backend process needs to read a table or index block from disk, it doesn’t just read it into its own private memory. It reads it into a slot in the shared buffer cache. The genius here is that if another backend process needs the exact same block a millisecond later, it can find it already in memory, saving a painfully slow disk read. This is a big part of the secret to performance: hot data is read from disk once and then served from RAM to every session. The backends coordinate access to this shared data using lightweight locks, spinlocks, and buffer pins to prevent corruption.
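The read-through behaviour described above can be sketched in a few lines of Python. This is a toy model only: ToyBufferCache and its read_from_disk callback are hypothetical names, a threading.Lock stands in for PostgreSQL's internal locking, and there is no fixed size or eviction, both of which the real buffer cache has.

```python
import threading


class ToyBufferCache:
    """Toy model of a shared buffer pool keyed by (relation, block number)."""

    def __init__(self, read_from_disk):
        self._read_from_disk = read_from_disk  # callback that does the slow read
        self._pages = {}
        self._lock = threading.Lock()  # stand-in for PostgreSQL's internal locks
        self.hits = 0
        self.misses = 0

    def read_block(self, relation, block_no):
        """Return the page, reading from disk only if it is not cached yet."""
        key = (relation, block_no)
        with self._lock:  # simplification: the lock is held across the read
            if key in self._pages:
                self.hits += 1
            else:
                self.misses += 1
                self._pages[key] = self._read_from_disk(relation, block_no)
            return self._pages[key]


if __name__ == "__main__":
    def slow_read(relation, block_no):
        return f"contents of {relation} block {block_no}"

    cache = ToyBufferCache(slow_read)
    cache.read_block("orders", 7)  # first reader pays for the disk read
    cache.read_block("orders", 7)  # second reader finds the page in memory
    print(cache.hits, cache.misses)
```

The second call never reaches slow_read, which is exactly the saving one backend gets when another has already pulled the block in.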

This is also why abruptly killing the server (with kill -9 on the Postmaster) is a terrible idea. A SIGKILL gives the Postmaster no chance to signal its children, release shared memory and semaphores, or write a clean shutdown checkpoint. You can end up with orphaned backends that have to be killed by hand, and PostgreSQL has to replay the Write-Ahead Log (WAL) on the next startup to guarantee your data is consistent. Always use pg_ctl stop -m fast.

Tuning for the Real World: `max_connections` and the Forking Tax

This forking model is robust, but it’s not free. Each backend process is a full OS process with its own private memory, and each sort or hash operation it runs can claim up to work_mem on top of that. This is why the max_connections setting is so critical, and why raising it far above the default of 100 deserves real thought before you do it in production.

Let’s say you naively set max_connections = 1000. The Postmaster will happily try to comply. But when you get 500 simultaneous connections, you now have 500 processes. The OS overhead of context switching between them becomes a nightmare. More importantly, if you have a work_mem setting of 4MB, you’ve just given PostgreSQL the potential to use 500 * 4MB = 2000MB, roughly 2GB, if every connection does one big sort at once. And that is the floor, not the ceiling: work_mem is a limit per sort or hash operation, not per connection, so a single complex query can claim it several times over. Your server will enthusiastically swap itself to death.
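The arithmetic is worth keeping in a form you can poke at. A back-of-the-envelope sketch in Python, where worst_case_work_mem_mb and its ops_per_query parameter are hypothetical names for illustration and the result is a rough ceiling rather than a prediction of real usage:

```python
def worst_case_work_mem_mb(connections, work_mem_mb, ops_per_query=1):
    """Rough upper bound on work_mem usage, in MB.

    Assumes every connection runs one query at the same moment and each
    query performs ops_per_query sort or hash operations, each of which
    uses its full work_mem allowance.
    """
    return connections * work_mem_mb * ops_per_query


if __name__ == "__main__":
    # 500 connections, 4MB work_mem, one sort each.
    print(worst_case_work_mem_mb(500, 4))
    # Same load, but each query runs three sorts or hashes.
    print(worst_case_work_mem_mb(500, 4, ops_per_query=3))
```

The number ignores shared memory and each backend's baseline footprint, so treat it as the extra memory you are exposing yourself to, not the total.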

This is the prime reason connection pooling is non-negotiable for any serious web application. A pooler like PgBouncer acts as a “connection concentrator,” holding a small, efficient set of backend processes open to the database and multiplexing thousands of short-lived client connections over them. It saves the day by making the forking model scale.
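The multiplexing idea can be sketched with a queue of server connections that clients borrow for the length of one transaction. This is a toy, not how PgBouncer is implemented: ToyPool and run_clients are hypothetical names, the "connections" are just strings, and threads stand in for client sessions.

```python
import queue
import threading
from contextlib import contextmanager


class ToyPool:
    """Toy pool: a fixed set of server connections shared by many clients."""

    def __init__(self, size):
        self._idle = queue.Queue()
        for n in range(size):
            # Placeholder for a real, already-open backend connection.
            self._idle.put(f"server-conn-{n}")

    @contextmanager
    def transaction(self):
        conn = self._idle.get()  # blocks until a server connection is free
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back for the next client


def run_clients(pool, client_count):
    """Run client_count short-lived clients; return the connection each used."""
    used = []
    used_lock = threading.Lock()

    def client():
        with pool.transaction() as conn:
            with used_lock:
                used.append(conn)

    threads = [threading.Thread(target=client) for _ in range(client_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return used


if __name__ == "__main__":
    used = run_clients(ToyPool(3), 50)
    print(len(used), "clients served over", len(set(used)), "server connections")
```

Fifty clients get served, but the database only ever sees at most three backends. Clients that arrive while all three are busy simply wait in the queue instead of forcing a new fork.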

The design is a masterpiece of Unix philosophy: do one thing well, and compose processes to build complex systems. It’s not the absolute fastest model for every scenario, but its stability and clarity are why PostgreSQL can run for years without a restart. Respect the Postmaster. It’s the quiet genius keeping the whole show running.
