13.2 Joins in UPDATE: Updating Rows Based on Another Table
Right, so you’ve mastered UPDATE for a single table. You feel powerful. Good. Now let’s graduate to the real world, where data is messy, relationships are complicated, and you need to update one table based on the contents of another. This is where UPDATE...FROM joins come in, and it’s also where you can accidentally nuke an entire table if you’re not careful. Don’t worry, I’ll show you how to avoid that.
The core idea is simple: you’re UPDATEing one table (let’s call it the target), but you’re using columns from another table (the source) in your WHERE clause to figure out which rows to update. The syntax isn’t the standard ANSI SQL you might be used to from SELECT statements; it’s a PostgreSQL-ism that is brilliantly clear once you get it.
The Basic Syntax: It’s Backwards Until It Isn’t
Here’s the canonical example. Imagine your company finally gives in and standardizes all price increases. You need to bump the price of every product by 10%, but only if that product is in the “Electronics” category, and you need to get that category info from the categories table.
UPDATE products
SET price = price * 1.10
FROM categories
WHERE products.category_id = categories.id
AND categories.name = 'Electronics';
Let’s break down why this works. After the UPDATE products and SET clauses, the FROM categories introduces the second table. This creates an implicit inner join between products and categories. The magic happens in the WHERE clause: it both defines the join condition (products.category_id = categories.id) and the filter for the update (categories.name = 'Electronics').
The mental model is this: First, PostgreSQL conceptually creates a derived table from the join between products and categories. Then, it updates every row in the target table (products) that has a match in this derived table. It’s crucial to understand that you’re always updating every matching row in the table you specified after UPDATE.
The Cardinal Sin: Forgetting the Join Condition
This is the part that will get you. Look at this absolute disaster waiting to happen:
-- DANGER! DO NOT RUN!
UPDATE products
SET price = 0.99
FROM categories
WHERE categories.name = 'Clearance';
You think you’re setting all clearance items to $0.99. What you’ve actually told PostgreSQL is: “For every row in the categories table where the name is ‘Clearance’, update every single row in the products table and set its price to $0.99.”
You just made everything in your store cost $0.99. You’ve created a very chaotic dollar store. Congratulations. The FROM clause without a proper join condition in the WHERE clause creates a Cartesian product between the filtered categories rows and the entire products table. Always, always, always include a condition that links the two tables.
Updating with a Subquery? Usually Overkill.
You might see people do this with a subquery instead:
UPDATE products
SET price = price * 1.10
WHERE category_id IN (
SELECT id FROM categories WHERE name = 'Electronics'
);
This is perfectly valid ANSI SQL and is often more portable. But in PostgreSQL, the UPDATE...FROM with a join is frequently more readable and can be significantly faster, especially with large datasets, because the planner can optimize the join better than it can optimize a WHERE...IN clause with a subquery. Use the join version. It’s the idiomatic way.
What About Outer Joins?
Sometimes you don’t just want to update rows that have a match; you want to update rows that don’t have a match. For that, you need a left join. The syntax, frankly, gets a bit weird. Let’s say you want to clear the price of any product that doesn’t have a valid category anymore (a classic cleanup task for orphaned records).
UPDATE products
SET price = NULL
FROM categories
WHERE products.category_id = categories.id (+);
Wait, no, that’s Oracle syntax. My bad. In PostgreSQL, you do this by moving the join logic into the FROM clause itself.
UPDATE products
SET price = NULL
FROM (SELECT id FROM categories) AS cat
WHERE products.category_id = cat.id (+);
Just kidding again. PostgreSQL doesn’t use the (+) operator either. The real way is to use a left join in a subquery, but honestly, for an UPDATE based on the absence of a relationship, I almost always find it clearer and safer to use a WHERE NOT EXISTS subquery. It’s harder to mess up.
-- The safe and clear way to update orphans
UPDATE products
SET price = NULL
WHERE NOT EXISTS (
SELECT 1 FROM categories
WHERE categories.id = products.category_id
);
See? No weird join syntax to remember. The rule of thumb: use UPDATE...FROM for updating based on a positive match (an inner join), and use a subquery with WHERE EXISTS or WHERE NOT EXISTS for checking existence or absence.
Using RETURNING to See What You Wrought
You’re not a savage, so you’re testing your UPDATE statements in a transaction with BEGIN and ROLLBACK first, right? Good. But even then, how do you know what actually changed? You use RETURNING, that’s how. It’s your best friend.
UPDATE products
SET price = price * 1.10
FROM categories
WHERE products.category_id = categories.id
AND categories.name = 'Electronics'
RETURNING products.id, products.name, products.price;
This will output a result set of every row that was updated, showing you the new price. It’s the perfect way to do a quick visual confirmation that your query didn’t do anything horrifying. It turns a destructive operation into a informative one. Use it relentlessly.