16.4 FILTER Clause: Conditional Aggregation
Alright, let’s talk about the FILTER clause. You’re going to love this. It’s one of those SQL features that, once you start using it, you’ll wonder how you ever lived without it. It’s the antidote to a common SQL headache: writing a bunch of messy CASE statements inside your aggregate functions.
The core problem it solves is conditional aggregation. You know the scenario. You want to count things, or sum things, but only for specific rows within your groups. The old-school way, which you’ve probably written a thousand times, involves a CASE expression nested inside the aggregate. It gets the job done, but it’s like reading a sentence with a bunch of parentheses in the middle—it breaks your flow.
The FILTER clause is SQL’s way of saying, “Let’s tidy this up.” It moves the condition outside the aggregate function, but keeps it logically attached. It’s cleaner, it’s more readable, and it honestly just makes more sense.
How FILTER Works Its Magic
The syntax is beautifully straightforward. You write your aggregate function as usual, and then you tag on a WHERE clause just for it.
SELECT
department_id,
COUNT(*) AS total_employees,
COUNT(*) FILTER (WHERE salary > 100000) AS well_paid_employees,
AVG(salary) FILTER (WHERE salary > 100000) AS avg_well_paid_salary
FROM employees
GROUP BY department_id;
See how elegant that is? We’re counting all employees, then we’re counting only the well-paid ones, and then we’re averaging the salaries of only that same well-paid group. Each aggregate can have its own, independent filter. It’s like giving each function its own personal WHERE clause. The database engine processes this exactly as you’d hope: for each group, it applies the filter before performing the aggregation.
Contrast this with the CASE method, which is functionally equivalent but a visual mess:
-- The old, clunky way
SELECT
department_id,
COUNT(*) AS total_employees,
COUNT(CASE WHEN salary > 100000 THEN 1 END) AS well_paid_employees,
AVG(CASE WHEN salary > 100000 THEN salary END) AS avg_well_paid_salary
FROM employees
GROUP BY department_id;
The FILTER clause declares its intent clearly. The CASE version obfuscates it. Which one would you rather debug at 2 AM?
FILTER vs. WHERE vs. HAVING: Knowing the Tools in Your Shed
This is crucial, so pay attention. It’s all about when the filtering happens.
- WHERE: This clause filters rows BEFORE any grouping occurs. It determines which rows are even allowed to be considered for forming groups and aggregates. If a row is filtered out by
WHERE, it’s gone for good—it can’t contribute to any aggregate. - FILTER: This clause is applied DURING aggregation, for a specific function. It only removes rows from consideration for that one aggregate calculation within the already-formed group.
- HAVING: This clause filters entire groups AFTER all aggregation is complete. It’s based on the results of the aggregate functions (e.g.,
HAVING COUNT(*) > 5).
You can, and often should, use them together. Let’s say you want to analyze departments in Seattle that have more than two highly-paid employees.
SELECT
department_id,
COUNT(*) FILTER (WHERE salary > 100000) AS well_paid_count
FROM employees
WHERE office_location = 'Seattle' -- Filter rows first
GROUP BY department_id
HAVING COUNT(*) FILTER (WHERE salary > 100000) > 2; -- Filter groups last
The WHERE ensures we only look at Seattle employees. Then we group them. For each group, we count the well-paid ones. Finally, the HAVING clause throws out any department group that doesn’t have at least three of these high-rollers.
The One Quirk You Absolutely Must Know (The NULL Gotcha)
Here’s the part where I have to be honest with you. The SQL standard designers, in their infinite wisdom, decided that FILTER (WHERE ...) should only exclude rows where the condition is TRUE. Rows where the condition is FALSE or NULL are excluded.
This is almost always what you want, but it can bite you if you’re filtering on a column that can contain NULLs. Consider this:
SELECT
COUNT(*) FILTER (WHERE column_name > 10) AS count
FROM my_table;
A row where column_name is NULL will not be counted. Why? Because NULL > 10 evaluates not to FALSE, but to NULL. And FILTER only keeps TRUE values. This is consistent with how the WHERE clause works, so it’s logically sound, but it’s a common source of confusion for folks who forget about three-valued logic. Always be mindful of your NULLs.
When to Embrace FILTER (And When to Avoid It)
Use FILTER when:
- You need multiple aggregates with different conditions in the same query (the classic use case). It’s a massive win for readability.
- You’re building a pivot table-style query, as it keeps the
SELECTlist organized and sane.
Stick with CASE when:
- You’re working with a database that doesn’t support
FILTER(looking at you, pre-2016 MySQL and some older versions of other DBs). It has excellent standard support in PostgreSQL, SQLite, and newer versions of others, but always check. - Your condition is incredibly complex and involves multiple columns in a way that would make the
FILTERclause span half the screen. Sometimes aCASEis actually more readable for very intricate logic. - You need to return a specific value other than
NULLfor theELSEcase in an aggregate.FILTERonly includes or excludes; it can’t transform values likeCASEcan. For example,SUM(CASE WHEN flag THEN 1 ELSE -1 END)can’t be directly replicated withFILTER.
In a nutshell, FILTER is a tool for writing clearer, more intentional SQL. It doesn’t add new functionality, but it significantly improves the expressiveness of your code. It’s the difference between mumbling and speaking clearly. Use it. Your future self, who has to read this query again, will thank you.