32.1 Declarative Partitioning: PARTITION BY Range, List, Hash
Alright, let’s get our hands dirty with declarative partitioning. This is where PostgreSQL stopped making us write triggers and rules to manually shove data into child tables and finally gave us a proper, first-class way to declare our partitioning logic right in the CREATE TABLE statement. It’s a massive quality-of-life improvement, and you should never use the old inheritance-based method for new projects unless you enjoy pain.
The core of this is the PARTITION BY clause. You’re telling the database, “Hey, here’s the master plan for how I want this giant table split up.” You then create child tables that are partitions of this parent, each one holding a specific subset of the data based on the method you chose. The parent table becomes an empty shell—a logical concept—and the actual data lives in the partitions. It’s like a filing cabinet (the parent table) with labeled folders inside (the partitions); you never just shove a document into the cabinet itself.
The Three Flavors of PARTITION BY
You’ve got three main choices here, and picking the right one is crucial. Get it wrong, and you’ll have a beautifully partitioned table that performs like a stuck pig.
Range Partitioning is your go-to for anything temporal or sequential. Think dates, timestamps, or maybe even numeric IDs that just keep going up. You define partitions for specific, continuous ranges like FROM ('2023-01-01') TO ('2023-02-01'). A row goes into the partition where its partition key value falls within that range. It’s intuitive and perfect for time-series data where you’re often querying by a time range and want to quickly exclude entire chunks of data (a concept known as partition pruning).
List Partitioning is for when you have discrete, categorical values. Imagine a country column or a region code. You explicitly define which values belong in which partition. A row goes into the partition that lists its key value. It’s fantastic for when your data is naturally grouped into buckets that don’t form a continuous spectrum.
Hash Partitioning is the wildcard. You give it a column, and PostgreSQL hashes the value in that column to decide which partition it belongs to. The goal here isn’t logical grouping for querying, but rather to distribute the data evenly across a fixed number of partitions. You’d use this when you have no obvious range or list key and you just want to break a huge table into smaller, more manageable pieces for parallel operations. The distribution is random-ish, so your queries will almost always have to hit multiple partitions.
Code Examples: From Theory to Practice
Let’s build a table for sensor readings. We’re logging a timestamp and a value. This screams “range partitioning.”
-- First, create the parent table. Notice it's empty; it's just the blueprint.
CREATE TABLE sensor_data (
sensor_id int,
collected_at timestamptz NOT NULL,
reading decimal NOT NULL
) PARTITION BY RANGE (collected_at);
Now, let’s create partitions for each month. This is where the actual data will live.
-- Partition for January 2023
CREATE TABLE sensor_data_2023_01
PARTITION OF sensor_data
FOR VALUES FROM ('2023-01-01') TO ('2023-02-01');
-- Partition for February 2023
CREATE TABLE sensor_data_2023_02
PARTITION OF sensor_data
FOR VALUES FROM ('2023-02-01') TO ('2023-03-01');
See how the TO value is exclusive? A timestamp of ‘2023-02-01 00:00:00’ goes into the February partition, not the January one. This is a classic “gotcha” moment. You must ensure your ranges are continuous and non-overlapping, or PostgreSQL will quite rightly throw an error. Gaps are allowed, but if you insert a row that doesn’t fit into any existing partition, it will be rejected. Which brings us to…
The Default Partition: A Safety Net (Use Sparingly)
What happens if you insert a row with a timestamp of ‘2024-01-01’ and you haven’t created a 2024 partition yet? By default, it’s an error. You can create a DEFAULT partition to catch these outliers.
CREATE TABLE sensor_data_default
PARTITION OF sensor_data
DEFAULT;
This is useful as a temporary staging area, but be warned: it becomes a performance nightmare if it gets large. The planner has to check this partition for every query, even if it’s obvious the data can’t be there. It’s like a junk drawer; okay for bits and bobs, but if you start throwing everything in there, finding anything becomes a nightmare. Your goal should be to never need it. Manage your partitions proactively with a script.
Indexes and Keys: The Fine Print
Here’s a critical detail that’s easy to miss: You must create indexes on each partition individually. Defining a primary key or index on the parent table is a nice declaration of intent, but it doesn’t automatically propagate down to the children. This is one of those “questionable choices” I mentioned—it’s a bit of a footgun.
-- This creates a key on the parent, but NOT on the existing partitions.
ALTER TABLE sensor_data ADD PRIMARY KEY (sensor_id, collected_at);
-- You must do this for each partition you've already created.
CREATE INDEX ON sensor_data_2023_01 (sourced_at);
CREATE INDEX ON sensor_data_2023_02 (sourced_at);
-- For new partitions, you can define indexes at creation time.
CREATE TABLE sensor_data_2023_03 PARTITION OF sensor_data (
PRIMARY KEY (sensor_id, collected_at) -- Define PK here for this partition
) FOR VALUES FROM ('2023-03-01') TO ('2023-04-01');
Also, note that a unique constraint or primary key must include all columns of the partition key. Why? Because uniqueness can only be enforced per partition. The database can’t guarantee a unique sensor_id across all partitions unless it checks them all, which would defeat the purpose. So including the partition key (collected_at) in the PK ensures the uniqueness is enforceable partition-local.
Why This Rocks and When It Doesn’t
When it works, it’s magic. A query for WHERE collected_at BETWEEN '2023-01-15' AND '2023-01-20' will only scan the sensor_data_2023_01 partition. The planner prunes away all others. This is the killer feature.
But it’s not a silver bullet. Partitioning isn’t a substitute for a good index. If you’re constantly querying WHERE sensor_id = 123 with no date filter, the planner will have to hit every single partition (this is called partition-wise append), which is likely slower than just querying one big table with an index on sensor_id. You partitioned for a reason—make sure your query patterns align with that reason.
The administrative overhead is real. You need a maintenance script (e.g., a cron job) to create new partitions for the future and to detach/archive old ones. If you don’t, your application will start throwing errors when it tries to insert data into a time range that doesn’t exist yet. PostgreSQL manages the data for you, but it won’t manage the partition creation—that’s on you. So automate it early, or you’ll be getting paged at 3 a.m. when your DEFAULT partition fills up.