16.1 COUNT, SUM, AVG, MIN, MAX: The Core Aggregates
Right, let’s get our hands dirty with the core aggregate functions. These are the workhorses, the ones you’ll use in 90% of your analytical queries. They’re deceptively simple. COUNT, SUM, AVG, MIN, and MAX seem straightforward, but the devil—and the real power—is in the details.
Think of them as a tiny, efficient factory you attach to a group of rows. You feed a bunch of values in, and the factory spits out a single, summarized value. SUM adds everything up, AVG calculates the mean, and so on. The magic happens when you pair them with GROUP BY, which we’ll get to shortly. For now, let’s meet the team.
The Core Five: A Quick Roll Call
We’ll use a simple sales table for our examples. Imagine it’s from a company that sells…artisanal, hand-forged database indexes. It’s a niche market.
CREATE TABLE sales (
sale_id SERIAL PRIMARY KEY,
product_name VARCHAR(100),
region VARCHAR(50),
sale_amount DECIMAL(10, 2),
sale_date DATE
);
-- Sample Data
INSERT INTO sales (product_name, region, sale_amount, sale_date) VALUES
('B-Tree Index', 'North America', 1000.00, '2023-10-01'),
('Hash Index', 'Europe', 750.50, '2023-10-01'),
('B-Tree Index', 'Europe', 1200.00, '2023-10-02'),
('GiST Index', 'North America', 2500.00, '2023-10-02'),
('B-Tree Index', 'Asia', 800.00, '2023-10-03'),
('Hash Index', 'Asia', 500.00, '2023-10-03'),
('GiST Index', 'Europe', 2200.00, '2023-10-04');
Now, let’s see the aggregates in their simplest form: over the entire table.
SELECT
COUNT(*) AS total_transactions,
SUM(sale_amount) AS total_revenue,
AVG(sale_amount) AS average_order_value,
MIN(sale_date) AS first_sale,
MAX(sale_date) AS last_sale
FROM sales;
| total_transactions | total_revenue | average_order_value | first_sale | last_sale |
|---|---|---|---|---|
| 7 | 8950.50 | 1278.64 | 2023-10-01 | 2023-10-04 |
Simple, right? But already, we’ve hit our first “gotcha.” Look at that AVG(). It’s a classic.
COUNT(*) vs. COUNT(column_name): The Null Kerfuffle
This is the single most important distinction to burn into your brain. COUNT(*) counts rows. It doesn’t care about values; it just counts every single row in the group. COUNT(column_name) counts non-null values in that specific column.
This is where most beginners faceplant. Let’s add a row with a NULL sale_amount to our table.
INSERT INTO sales (product_name, region, sale_amount, sale_date)
VALUES ('Free GIN Index', 'North America', NULL, '2023-10-05');
Now, watch the difference:
SELECT
COUNT(*) AS count_star,
COUNT(sale_amount) AS count_sale_amount,
SUM(sale_amount) AS total_revenue,
AVG(sale_amount) AS avg_naive,
AVG(COALESCE(sale_amount, 0)) AS avg_with_coalesce
FROM sales;
| count_star | count_sale_amount | total_revenue | avg_naive | avg_with_coalesce |
|---|---|---|---|---|
| 8 | 7 | 8950.50 | 1278.64 | 1118.81 |
See that? COUNT(*) is 8 (all rows), but COUNT(sale_amount) is only 7 (non-null values). This directly affects AVG(). By definition, AVG(column) is SUM(column) / COUNT(column). It completely ignores the NULL row, so the average is still 8950.50 / 7. If you want to treat NULL as a zero, you have to be explicit with AVG(COALESCE(sale_amount, 0)), which would be 8950.50 / 8. This isn’t a design flaw; it’s mathematical correctness. NULL means “unknown,” not zero. The database is right, and you were probably wrong. I know I was the first time.
SUM and AVG: They Only Speak Numbers
This one seems obvious, but you’d be surprised. You can’t SUM(VARCHAR) or AVG(DATE). Well, you can try, but the database will give you an error that’s the equivalent of a disgusted look. These functions operate exclusively on numeric data types, integers, decimals, floats, etc.
MIN and MAX, however, are the polyglots of the group. They work on almost any data type that can be ordered: numbers, strings, dates, even enums. What’s the smallest number? The earliest date? The first product name alphabetically? They’ve got you covered.
SELECT
MIN(product_name) AS first_product_alpha,
MAX(product_name) AS last_product_alpha,
MIN(sale_date) AS earliest_date
FROM sales;
| first_product_alpha | last_product_alpha | earliest_date |
|---|---|---|
| B-Tree Index | Hash Index | 2023-10-01 |
“B-Tree Index” comes before “GiST Index” alphabetically? Actually, no, ‘G’ comes after ‘B’. Wait, what? Ah, the sample data I provided… Hash Index starts with ‘H’, which does come after ‘G’. My bad. See? Even my examples have edge cases. Always check your assumptions. The database always knows.
The DISTINCT Keyword Inside Aggregates: A Power Move
You can slam the DISTINCT keyword right inside your aggregate functions. This is incredibly useful for finding the number of unique values.
SELECT
COUNT(region) AS total_region_entries, -- counts every row's region
COUNT(DISTINCT region) AS unique_regions -- counts each region name only once
FROM sales;
| total_region_entries | unique_regions |
|---|---|
| 8 | 3 |
You can also do this with SUM(DISTINCT sale_amount), but be warned: that sums each unique value only once. If you have two sales of $1000, SUM(DISTINCT sale_amount) would only add $1000 once, which is almost never what you want for financial reporting. It’s a tool for de-duplication within the aggregation, not a general-purpose sum. Use it with purpose.