11.1 REFERENCES: Declaring a Foreign Key
Alright, let’s get our hands dirty with the REFERENCES clause. This is where you stop just having tables and start having a database—a system of interconnected, relational data. It’s the single most important line of code for enforcing what we call “referential integrity.” Fancy term, simple meaning: it guarantees that the links between your tables make sense. No orphaned records, no phantom IDs, no nonsense.
Think of it like this: you have a users table and an orders table. It would be utterly absurd for an order to be attributed to a user_id that doesn’t exist. That order would be a digital ghost, haunting your reports and causing errors. The foreign key constraint is the supernatural bounty hunter that prevents this. It says, “An order must reference a valid, existing user. No exceptions.”
Here’s the basic syntax. You can declare it inline with the column if it’s a single-column key, which is often the case.
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
order_date TIMESTAMP NOT NULL DEFAULT NOW(),
user_id INTEGER NOT NULL REFERENCES users(user_id)
);
That simple REFERENCES users(user_id) on the user_id column is doing all the heavy lifting. I’m telling PostgreSQL: “Hey, every value you ever see in this user_id column must match a value in the user_id column of the users table.” PostgreSQL, being the magnificently obedient (if sometimes pedantic) system that it is, will now enforce this rule with extreme prejudice.
The Two Ways to Define Them
You just saw the inline method. The other way, which is mandatory for composite foreign keys (keys made up of multiple columns) and often clearer for single ones, is the table-constraint style. It comes after you’ve defined all your columns.
CREATE TABLE order_items (
order_item_id SERIAL PRIMARY KEY,
order_id INTEGER NOT NULL,
product_id INTEGER NOT NULL,
quantity INTEGER NOT NULL CHECK (quantity > 0),
-- This is a composite foreign key. It ensures this specific
-- combination of order_id and product_id exists in the parent table.
CONSTRAINT fk_order_product
FOREIGN KEY (order_id, product_id)
REFERENCES orders(order_id, product_id)
);
Why use this second way? Clarity. It names the constraint (fk_order_product), which is a gift you will give your future self when you need to alter or drop it. It also keeps your column definitions a bit less cluttered. My rule of thumb: use inline for simple, single-column keys, and the table constraint for everything else.
What Happens on Delete? This is Crucial.
Here’s where the designers gave us options, and where most people mess up. By default, if you try to delete a user who has orders, PostgreSQL will stop you. Loudly. This is called RESTRICT, and it’s the sane default. But sometimes you want the parent row’s deletion to cascade.
This is the part you absolutely must think about before you create the table. Your choices are:
ON DELETE RESTRICT(The Default): “Nope, can’t delete that user. Go delete their orders first.” This is the safe, “yell at the developer” option.ON DELETE CASCADE: “Delete the user? Okay, and delete all their orders, and all their order items, and everything else that ever pointed to them.” This is the “take the whole building down with them” option. Powerful, but dangerous. Use it with extreme caution on user data.ON DELETE SET NULL: “Delete the user? Okay, I’ll set theuser_idon all their orders toNULL.” This turns those orders into orphans. They exist, but belong to no one. This is often worse thanRESTRICT.ON DELETE SET DEFAULT: Similar toSET NULL, but tries to set the column to itsDEFAULTvalue. This is rarely useful unless you’ve set a thoughtful default, which almost no one does.
The best practice? Stick with RESTRICT unless you have a damn good reason not to. Your data’s integrity is more important than convenience. You define it like so:
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
user_id INTEGER NOT NULL REFERENCES users(user_id) ON DELETE RESTRICT -- explicit is better than implicit
);
Or, for the table-constraint style:
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
user_id INTEGER NOT NULL,
CONSTRAINT fk_orders_user
FOREIGN KEY (user_id)
REFERENCES users(user_id)
ON DELETE CASCADE -- I said careful with this!
);
The Hidden Cost: Locks and Indexes
Nothing’s free. Enforcing this relationship requires work. When you insert a new order, PostgreSQL must check the users table to confirm the user_id exists. This acquires a lightweight lock on the referenced row in users to prevent someone from deleting it while your transaction is in flight. It’s fast, but it’s a thing.
More importantly, to make this check blazingly fast, the referenced column in the parent table (users.user_id in our example) must be indexed. If it’s a primary key, which it almost always is, you’re golden—PKs are automatically indexed. But if you’re referencing a column that isn’t a primary key or unique? First of all, why? That’s a bit weird. But secondly, you must create an index on that column yourself. The database will not do it for you, and your foreign key checks will be painfully, tragically slow. This is a common “it worked in dev but exploded in prod” pitfall. Don’t reference unindexed columns. Just don’t.