1.4 Storage Layout: Data Directory, Tablespaces, and Relation Files

Right, let’s pull back the curtain on where PostgreSQL actually lives. Forget the abstractions for a moment; we’re going to talk about files on a disk. This isn’t some proprietary black box—it’s a meticulously organized, if occasionally quirky, file system structure. Knowing your way around this is what separates someone who just uses PostgreSQL from someone who truly operates it. When things go sideways (and they will), this knowledge is your first and best tool.

The heart of everything is the Data Directory (often referred to as PGDATA). You told PostgreSQL where this was when you ran initdb, and it’s the root of all your database’s universe. Everything—your tables, indexes, configuration, transaction logs—starts here. Let’s find it. If you don’t know where yours is, ask the big guy himself:

SHOW data_directory;

       data_directory        
-----------------------------
 /var/lib/postgresql/16/main

Inside this directory, you’ll find a controlled chaos of subdirectories. The base directory is the main event, holding all your databases. global contains cluster-wide tables, like the list of databases themselves. pg_wal (or pg_xlog in older versions) holds the Write-Ahead Log, the absolute non-negotiable ledger of everything that’s happened. Lose this and you’re in a world of pain. pg_log is where your operational logs live (if configured), which is your first stop for debugging why your connection is failing at 2 AM.

The Oddball: Tablespaces

Now, why would you ever want to put database files somewhere other than the PGDATA directory? Three reasons: you’re running out of space on one disk, you want to optimize performance by putting busy indexes on faster storage, or you’re a masochist for complexity.

Enter Tablespaces. A tablespace is essentially an symbolic link that PostgreSQL manages, pointing to any location on the file system the server process can access. You create one like this:

CREATE TABLESPACE fastspace LOCATION '/mnt/ssd_array/postgres_data';

The genius (or madness) is that from within PostgreSQL, you just reference fastspace. The physical location is abstracted away. You can then assign a table or index to live there:

CREATE TABLE important_stuff (id serial, data text) TABLESPACE fastspace;
CREATE INDEX ON important_stuff USING brin (data) TABLESPACE even_faster_space;

The pitfall? If you ever move your database cluster or back it up, you must remember to include these external locations. pg_dump does not back up the physical files in your tablespaces; it just records the DDL command to create them. Your filesystem-level backup strategy just got more complicated. You asked for it.

How Relations Become Files

Let’s get down to the atomic level: how a single table becomes a file. Inside the base directory, there’s a subdirectory for each database, named after its OID (Object ID). You can find your database’s OID with:

SELECT oid, datname FROM pg_database;

  oid  |  datname
-------+------------
 16384 | myapp
     1 | template1
 12345 | postgres

So, the data for database myapp lives in PGDATA/base/16384/. Now, inside that, each table and index (known collectively as “relations”) is stored as a file named after its filenode number. Find a table’s filenode:

SELECT pg_relation_filepath('important_stuff');

 pg_relation_filepath
----------------------
 pg_tblspc/16385/PG_16_202402091/16384/16401

Wait, that looks insane. Let’s break it down:

pg_tblspc/16385/: This means the table is in the tablespace with OID 16385.
PG_16_202402091/: This is a version-specific identifier, ensuring you don’t mix files from different PostgreSQL major versions. A fantastic design choice that saves your bacon during upgrades.
16384/: The database OID.
16401/: The relation’s filenode.

If the table was in the default tablespace, it would just be something like base/16384/16401. But a table isn’t just one file. Once it grows beyond 1GB, PostgreSQL creates a second file called 16401.1, then 16401.2, and so on. This is why you’ll never see a 100GB single file for a massive table; it’s cleverly broken up to avoid filesystem limitations. It’s a simple, robust solution that works everywhere.

The best practice here is to leave these files alone. Don’t rm them, don’t chmod them, don’t try to edit them with vim. Let PostgreSQL manage its own house. Your job is to understand the address it lives at, so when you need to check disk usage, monitor I/O, or configure your storage, you know exactly what you’re looking at. This isn’t just academic knowledge; it’s the foundation of everything that comes next.