17.6 Derived Tables: Subqueries in the FROM Clause
Alright, let’s talk about putting a subquery in the one place you probably didn’t expect it: the FROM clause. It feels a bit wrong, like putting a sofa in the kitchen. But sometimes, the kitchen needs a sofa. This is what we call a derived table, and it’s one of the most powerful tools for wrangling complex logic without losing your mind.
Think of it this way: a regular SELECT works on a table. A derived table lets your SELECT work on the result of another SELECT. You’re effectively creating a temporary, in-memory table on the fly, just for the duration of your main query. It’s a fantastic way to break a monstrous, multi-part problem into a sane, two-step process: 1) build a smaller, focused result set, and 2) query against that.
The Basic Anatomy of a Derived Table
The syntax is straightforward, but the first time you see it, it might cause a double-take. You just wrap your subquery in parentheses and give it an alias. The database engine executes that inner query first, materializes the result, and then your outer query uses it like any other table.
SELECT
dept_name,
avg_salary
FROM (
SELECT
department_id,
AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
) AS department_averages -- This alias is MANDATORY. Don't forget it.
WHERE avg_salary > 75000;
In this example, the inner query does the heavy lifting of grouping and averaging. The outer query then simply filters on that result. This is much cleaner than trying to shove a HAVING AVG(salary) > 75000 clause into the inner query, especially if your filtering logic gets more complex.
Why Bother? The Superpowers of Derived Tables
You might be wondering why you wouldn’t just use a VIEW. The beauty of a derived table is that it’s ephemeral. It exists for this one query and then vanishes. There’s no database clutter. You’re essentially creating a custom, purpose-built dataset for a single operation.
Their real power emerges in a few key scenarios:
- Pre-aggregation: As shown above, you can perform a complex aggregation first and then simply filter or join the results. This is often more performant and readable than repeating the aggregate function in a
HAVINGclause or aWHEREon a joined table. - Simplifying Complex Joins: Need to join to a result set that itself requires a join or complex filtering? A derived table lets you hide that complexity.
- Bypassing Limitations: This is a big one. You can’t just reference a column alias from the
SELECTlist in aWHEREclause because of the logical order of operations. TheWHEREclause is evaluated before theSELECT. A derived table effortlessly solves this.
-- This will FAIL because the WHERE doesn't know about 'name_length'
SELECT
first_name,
LEN(first_name) AS name_length
FROM employees
WHERE name_length > 5; -- Error! Invalid column name 'name_length'
-- This works PERFECTLY using a derived table
SELECT *
FROM (
SELECT
first_name,
LEN(first_name) AS name_length
FROM employees
) AS calculated_lengths
WHERE name_length > 5;
The derived table forces the calculation of name_length to happen first, creating a new “table” that includes that column. The outer query can then filter on it freely.
The Gotchas: Where the Shine Wears Off
They aren’t magic. The database has to materialize that result set, which can be a performance sink if the inner query returns a million rows. It’s a classic trade-off: readability and logical simplicity for potential cost in execution time. Always check the query plan on large datasets.
Also, note that all columns in the derived table must have names. If your inner query has a calculation without an alias, you’re in for a world of hurt. The outer query needs to be able to reference something.
-- BAD: The outer query has no name for the column!
SELECT id, calculated_value
FROM (
SELECT id, price * quantity -- This column has no name!
FROM orders
) AS derived_orders;
-- GOOD: Always alias your calculated columns.
SELECT id, total
FROM (
SELECT id, price * quantity AS total
FROM orders
) AS derived_orders;
Taking it Further: Joining Derived Tables
This is where you level up. You can join multiple derived tables together, creating a pipeline of data transformation. It’s like building a complex analysis out of simple, logical building blocks.
SELECT
e.first_name,
e.last_name,
da.avg_salary AS department_average,
e.salary - da.avg_salary AS difference_from_avg
FROM employees e
INNER JOIN (
SELECT
department_id,
AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
) AS da ON e.department_id = da.department_id
ORDER BY difference_from_avg DESC;
Here, we create a derived table of department averages and then join it back to the main employees table to see how each individual compares to their department’s average. It’s clear, it’s modular, and it saves you from a far more convoluted single-level query.
So, the next time you find yourself building a Rube Goldberg machine of a query, stop. See if you can break it apart. Create a temporary table right in the FROM clause. Your future self, trying to read that code at 2 AM during an outage, will thank you for it.