16.3 HAVING: Filtering After Aggregation
Right, so you’ve grouped your data. You’ve got your neat little summary rows, your averages, your counts. It’s a beautiful, aggregated masterpiece. But what if you don’t want all of those groups? What if you only care about the groups where the average is above a certain threshold, or the count is suspiciously high? You can’t slap a WHERE clause on this; WHERE is for filtering raw rows before they get fed into the aggregation machine. For filtering the results of that machine, you need its bouncer: the HAVING clause.
Think of it this way: WHERE is the guest list for the party. GROUP BY is me putting all the guests into cliques (the “jocks,” the “nerds,” the “people who unironically use the word ‘synergy’”). HAVING is me then saying, “Okay, but I only want to see the cliques that have more than five people in them.” You’re filtering the groups themselves.
The Syntax is Your Friend (Mostly)
The HAVING clause comes after GROUP BY and before ORDER BY. It feels just like a WHERE clause, but it gets to use the aggregated values you just created.
SELECT department_id, AVG(salary) as avg_salary, COUNT(*) as headcount
FROM employees
GROUP BY department_id
HAVING AVG(salary) > 75000;
This query answers a very important question: “Which departments have an average salary greater than $75,000?” Notice how we’re using the aggregate function AVG(salary) directly in the HAVING clause. You can also use the column alias you defined in the SELECT list, but here’s a quirky bit of SQL lore: it’s not universally supported. While most modern databases (like PostgreSQL) will happily let you use HAVING avg_salary > 75000, some older ones (looking at you, MySQL in certain modes) will throw a fit. The safest, most portable bet is to just repeat the aggregate expression. It looks uglier, but it never lies.
Why You Can’t Use WHERE for This
Let’s be absolutely clear why this can’t be done with WHERE. If you try this, you’ll see the error of your ways:
SELECT department_id, AVG(salary) as avg_salary
FROM employees
WHERE AVG(salary) > 75000 -- NOPE! This will cause an error.
GROUP BY department_id;
The reason is the order of operations. SQL logically processes the FROM and WHERE clauses first, pulling individual rows from the table. The AVG() function doesn’t exist yet at this stage because there’s nothing to average—it needs the groups to be formed first. The database will stop you and say, “Hey, idiot, I haven’t grouped anything yet, how can I give you an average?” It’s a bit more polite than that, but the sentiment is the same.
HAVING with Multiple Conditions
You can get as fancy as you want with HAVING, using AND, OR, and NOT just like in a WHERE clause. Let’s say you’re looking for departments that are both large and well-paid, but you want to exclude a specific department ID.
SELECT department_id,
COUNT(*) AS employee_count,
AVG(salary) AS average_salary
FROM employees
GROUP BY department_id
HAVING COUNT(*) > 10
AND AVG(salary) > 70000
AND department_id != 5; -- Exclude department 5 (probably HR)
The FILTER Clause: A Sharper Knife
Now for the really good stuff. Sometimes you don’t want to filter entire groups; you want to filter which rows are included in a specific aggregate calculation within a group. This is where the FILTER clause comes in, and it’s a game-changer for clarity.
Imagine you want a list of departments, along with the total number of employees and, separately, the number of employees who make over $100,000. The old way involved messy CASE statements inside COUNT:
-- The old, clunky way
SELECT department_id,
COUNT(*) AS total_employees,
COUNT(CASE WHEN salary > 100000 THEN 1 END) AS highly_paid_employees
FROM employees
GROUP BY department_id;
The FILTER clause makes this intention crystal clear. It’s like applying a WHERE clause to just one aggregate function.
-- The new, elegant, "why wasn't this always a thing?" way
SELECT department_id,
COUNT(*) AS total_employees,
COUNT(*) FILTER (WHERE salary > 100000) AS highly_paid_employees
FROM employees
GROUP BY department_id;
Read that aloud: “Count all rows, and count all rows filtered where salary is greater than 100,000.” It’s beautiful. It’s semantic. It’s what FILTER was born to do. You can use it with SUM, AVG, pretty much any aggregate. The biggest pitfall here is that support isn’t universal yet. It’s part of the SQL standard and is supported by PostgreSQL and SQLite, but as of my last update, MySQL and some others are still behind the times. Always check your database’s documentation, but use it wherever you can—it makes your code so much more readable.
So remember: WHERE filters rows pre-aggregation. HAVING filters groups post-aggregation. And FILTER surgically targets which rows go into a single aggregate function. Master these three, and you truly own the data shaping process.