19.3 PARTITION BY: Scoping the Window

Alright, let’s talk about PARTITION BY, the secret sauce that elevates window functions from a neat party trick to an indispensable tool in your SQL arsenal. Without it, you’re just doing a calculation over your entire table, which is like using a sledgehammer to crack a nut. PARTITION BY is the scalpel that lets you perform these calculations with surgical precision.

Think of it this way: the OVER() clause defines your window of data. PARTITION BY carves that window into smaller, logical groups, much like GROUP BY does for aggregate functions. But here’s the crucial difference: GROUP BY collapses your rows. It gives you one row per group. PARTITION BY does no such thing. It groups the rows for the calculation but then preserves every single original row in the result set. You get to see the individual tree and the forest it belongs to. This is the superpower.

The Anatomy of a Partition

The syntax is straightforward; you just slot it into the OVER() clause.

SELECT
  column1,
  column2,
  <window_function>(column3) OVER (
    PARTITION BY partition_column1, partition_column2
    ORDER BY order_column
  ) AS calculated_value
FROM your_table;

You can partition by one column, or by multiple columns to create increasingly specific subgroups. The order of columns in the PARTITION BY list matters about as much as the order of your socks in a drawer—it doesn’t affect the final result, just how the database might internally sort to achieve it.

A Concrete Example: Departmental Bragging Rights

Let’s say we have a table of employees. We want to rank employees within each department by their salary. Without PARTITION BY, RANK() would just rank everyone in the company against each other, which is useless for seeing who the big earners are in the Marketing vs. Engineering factions.

SELECT
  department,
  employee_name,
  salary,
  RANK() OVER (
    PARTITION BY department
    ORDER BY salary DESC
  ) AS dept_salary_rank
FROM employees;

Sample Output:

department	employee_name	salary	dept_salary_rank
Engineering	Alice	150000	1
Engineering	Bob	145000	2
Engineering	Charlie	145000	2
Engineering	David	130000	4
Marketing	Eve	120000	1
Marketing	Frank	95000	2

See what happened? The window function (RANK()) reset itself for each department. It calculated the ranks for the Engineering group, finished, and then started completely fresh when it moved on to the Marketing group. This is PARTITION BY in action.

What Happens When You Leave It Out?

This is a classic “I meant to do that” moment for many beginners. Omitting PARTITION BY is perfectly valid syntax. It just means “create a window over the entire result set.” This is useful for things like running totals or overall rankings.

-- How does each employee's salary compare to the company average?
SELECT
  department,
  employee_name,
  salary,
  AVG(salary) OVER () AS company_avg_salary -- No PARTITION BY = one big partition
FROM employees;

The AVG(salary) here is calculated across every row fetched by the FROM and WHERE clauses. Every single row in the result will have the same value in the company_avg_salary column. It’s a global average.

The Subtle Pitfall: NULLs Group Together

Here’s a fun “gotcha” that feels obvious only after you’ve been burned by it. PARTITION BY treats all NULL values as equals. They all get grouped into the same partition.

Imagine partitioning by a manager_id column to analyze teams. If 10 employees have no manager (manager_id IS NULL), they will all be lumped together into one, sad, manager-less partition. This is almost always the behavior you want (they are a logical group: “the unmanaged”), but it’s vital to be aware of it so you don’t misinterpret your results. Don’t let a bunch of NULLs silently create a massive, meaningless group in your analysis.

Best Practice: Partitioning and Performance

Let’s be direct: window functions can be expensive, and PARTITION BY is no exception. The database has to sort the data within each partition to perform its calculations. Here’s how to not make your DBA hate you:

Index Smartly: An index on (partition_column, order_column) is like a gift to the query planner. It can often avoid a massive sort operation by reading the data in the exact order it needs for the window function. If you’re constantly partitioning sales data by region and ordering by sale_date, an index on (region, sale_date) will make that query sing.
Filter Before the Window: This is critical. Your WHERE clause filters rows before the window function is applied. Your PARTITION BY operates on the result of the WHERE clause. If you want to filter after the window calculation (e.g., “only show the top 3 ranked employees per department”), you need to wrap the query in a CTE or a subquery and then filter on the calculated column. You can’t reference window functions in a WHERE clause—it’s a logical contradiction the database will refuse to execute.

So, use PARTITION BY with gusto. It’s the key to asking complex, layered questions of your data without breaking a sweat. Just remember you’re the one defining the groups; the database is just the obedient, if sometimes literal-minded, genius doing the math for you.