19.1 Window Functions vs GROUP BY: What Makes Them Different
Alright, let’s cut through the noise. You already know GROUP BY. It’s the SQL hammer: it smashes your rows into neat little summary cubes. One row per group. You get your MAX, your SUM, your COUNT—all very useful, but also very final. It’s the end of the line for your individual data points.
Window functions are a different beast entirely. They’re not about crushing rows; they’re about giving every single row a new perspective, a new set of glasses to see the rest of the table. They perform calculations across rows while letting each row keep its identity. It’s the difference between asking “What’s the average salary for each department?” (GROUP BY) and “For each employee, how does their salary compare to their department’s average?” (a window function).
The key distinction is this: GROUP BY aggregates and collapses. Window functions calculate and preserve.
The PARTITION BY: Your New GROUP BY (That Doesn’t Collapse Anything)
Think of PARTITION BY as a GROUP BY that’s allergic to commitment. It defines the window—the group of rows—your function will look at, but it refuses to collapse the result into a single row. Every row in the original partition gets the same aggregated value, which is just mind-bendingly useful.
Let’s say we have a table of sales. GROUP BY tells us the total sales per rep.
-- The old way: collapse and lose detail
SELECT sales_rep_id, SUM(amount) as total_sales
FROM sales
GROUP BY sales_rep_id;
Result: One row per sales rep. You see the total, but you lose all information about individual transactions.
Now, watch the window function magic.
-- The new way: calculate and keep everything
SELECT
sales_rep_id,
sale_date,
amount,
SUM(amount) OVER (PARTITION BY sales_rep_id) as total_sales_for_rep
FROM sales;
Result: Every single sale row is preserved. But on each row, the total_sales_for_rep column contains the sum of all sales for that row’s rep. It’s like each row gets a personal billboard showing the team’s total score.
Ranking and Order: Where GROUP BY Completely Fails
This is where GROUP BY just throws its hands up and walks away. How do you find the top 3 sales per rep? Or rank sales reps by their best single sale? You simply cannot do this with GROUP BY alone without some truly horrific subquery nonsense.
Window functions laugh in the face of this problem.
SELECT
sales_rep_id,
sale_date,
amount,
RANK() OVER (
PARTITION BY sales_rep_id
ORDER BY amount DESC
) as rank_within_rep
FROM sales;
Here, PARTITION BY sales_rep_id says “make a separate list for each rep.” ORDER BY amount DESC says “now sort that list from highest to lowest sale.” RANK() then assigns a rank number to each row within its own little partitioned list. The highest sale for a rep gets rank 1. This is incredibly powerful for finding top-N records per group, something that is otherwise a SQL nightmare.
The LAG and LEAD Party Trick
GROUP BY can only look at the group as a whole, a single blob of data. Window functions, specifically LAG and LEAD, can peer directly at a row’s neighbors. This is how you calculate running totals, day-over-day growth, or find the previous event in a sequence.
Trying to get the previous sale amount for each sales rep? Don’t even think about a self-join. It’s ugly.
SELECT
sales_rep_id,
sale_date,
amount,
LAG(amount) OVER (
PARTITION BY sales_rep_id
ORDER BY sale_date
) as previous_sale_amount
FROM sales
ORDER BY sales_rep_id, sale_date;
For each row, LAG(amount) looks back one row in the window (which, thanks to PARTITION and ORDER BY, is the rep’s sales in chronological order) and grabs the value. The first sale for a rep will have NULL for previous_sale_amount, which is exactly what you’d expect. It’s elegant, readable, and performant.
The Biggest Pitfall: Forgetting the ORDER BY
This is a classic “oh, you sweet summer child” error. The OVER() clause is flexible, but that flexibility is a trap. Look at this:
SUM(amount) OVER (PARTITION BY sales_rep_id) -- Correct, sums the entire partition
SUM(amount) OVER (PARTITION BY sales_rep_id ORDER BY sale_date) -- Whoa, different thing!
That second one? Because you added an ORDER BY, you’ve changed the window. It’s no longer the entire partition by default. It’s now a running total from the start of the partition up to the current row. This is often what people want, but sometimes they add an ORDER BY for ranking and accidentally turn their sum into a running total, getting very confused by the results. Always be intentional with ORDER BY—it changes the meaning of the window frame.
So, to wrap it up: Use GROUP BY when you need a summary report. Use window functions when you need to add context, rankings, or comparisons to every individual row without losing any of the glorious detail. They don’t replace GROUP BY; they complement it, and together they make you terrifyingly powerful.