19.3 PARTITION BY: Scoping the Window
Alright, let’s talk about PARTITION BY, the secret sauce that elevates window functions from a neat party trick to an indispensable tool in your SQL arsenal. Without it, you’re just doing a calculation over your entire table, which is like using a sledgehammer to crack a nut. PARTITION BY is the scalpel that lets you perform these calculations with surgical precision.
Think of it this way: the OVER() clause defines your window of data. PARTITION BY carves that window into smaller, logical groups, much like GROUP BY does for aggregate functions. But here’s the crucial difference: GROUP BY collapses your rows. It gives you one row per group. PARTITION BY does no such thing. It groups the rows for the calculation but then preserves every single original row in the result set. You get to see the individual tree and the forest it belongs to. This is the superpower.
The Anatomy of a Partition
The syntax is straightforward; you just slot it into the OVER() clause.
SELECT
column1,
column2,
<window_function>(column3) OVER (
PARTITION BY partition_column1, partition_column2
ORDER BY order_column
) AS calculated_value
FROM your_table;
You can partition by one column, or by multiple columns to create increasingly specific subgroups. The order of columns in the PARTITION BY list matters about as much as the order of your socks in a drawer—it doesn’t affect the final result, just how the database might internally sort to achieve it.
A Concrete Example: Departmental Bragging Rights
Let’s say we have a table of employees. We want to rank employees within each department by their salary. Without PARTITION BY, RANK() would just rank everyone in the company against each other, which is useless for seeing who the big earners are in the Marketing vs. Engineering factions.
SELECT
department,
employee_name,
salary,
RANK() OVER (
PARTITION BY department
ORDER BY salary DESC
) AS dept_salary_rank
FROM employees;
Sample Output:
| department | employee_name | salary | dept_salary_rank |
|---|---|---|---|
| Engineering | Alice | 150000 | 1 |
| Engineering | Bob | 145000 | 2 |
| Engineering | Charlie | 145000 | 2 |
| Engineering | David | 130000 | 4 |
| Marketing | Eve | 120000 | 1 |
| Marketing | Frank | 95000 | 2 |
See what happened? The window function (RANK()) reset itself for each department. It calculated the ranks for the Engineering group, finished, and then started completely fresh when it moved on to the Marketing group. This is PARTITION BY in action.
What Happens When You Leave It Out?
This is a classic “I meant to do that” moment for many beginners. Omitting PARTITION BY is perfectly valid syntax. It just means “create a window over the entire result set.” This is useful for things like running totals or overall rankings.
-- How does each employee's salary compare to the company average?
SELECT
department,
employee_name,
salary,
AVG(salary) OVER () AS company_avg_salary -- No PARTITION BY = one big partition
FROM employees;
The AVG(salary) here is calculated across every row fetched by the FROM and WHERE clauses. Every single row in the result will have the same value in the company_avg_salary column. It’s a global average.
The Subtle Pitfall: NULLs Group Together
Here’s a fun “gotcha” that feels obvious only after you’ve been burned by it. PARTITION BY treats all NULL values as equals. They all get grouped into the same partition.
Imagine partitioning by a manager_id column to analyze teams. If 10 employees have no manager (manager_id IS NULL), they will all be lumped together into one, sad, manager-less partition. This is almost always the behavior you want (they are a logical group: “the unmanaged”), but it’s vital to be aware of it so you don’t misinterpret your results. Don’t let a bunch of NULLs silently create a massive, meaningless group in your analysis.
Best Practice: Partitioning and Performance
Let’s be direct: window functions can be expensive, and PARTITION BY is no exception. The database has to sort the data within each partition to perform its calculations. Here’s how to not make your DBA hate you:
- Index Smartly: An index on
(partition_column, order_column)is like a gift to the query planner. It can often avoid a massive sort operation by reading the data in the exact order it needs for the window function. If you’re constantly partitioning sales data byregionand ordering bysale_date, an index on(region, sale_date)will make that query sing. - Filter Before the Window: This is critical. Your
WHEREclause filters rows before the window function is applied. YourPARTITION BYoperates on the result of theWHEREclause. If you want to filter after the window calculation (e.g., “only show the top 3 ranked employees per department”), you need to wrap the query in a CTE or a subquery and then filter on the calculated column. You can’t reference window functions in aWHEREclause—it’s a logical contradiction the database will refuse to execute.
So, use PARTITION BY with gusto. It’s the key to asking complex, layered questions of your data without breaking a sweat. Just remember you’re the one defining the groups; the database is just the obedient, if sometimes literal-minded, genius doing the math for you.