16.1 COUNT, SUM, AVG, MIN, MAX: The Core Aggregates

Right, let’s get our hands dirty with the core aggregate functions. These are the workhorses, the ones you’ll use in 90% of your analytical queries. They’re deceptively simple. COUNT, SUM, AVG, MIN, and MAX seem straightforward, but the devil—and the real power—is in the details.

Think of them as a tiny, efficient factory you attach to a group of rows. You feed a bunch of values in, and the factory spits out a single, summarized value. SUM adds everything up, AVG calculates the mean, and so on. The magic happens when you pair them with GROUP BY, which we’ll get to shortly. For now, let’s meet the team.

The Core Five: A Quick Roll Call

We’ll use a simple sales table for our examples. Imagine it’s from a company that sells…artisanal, hand-forged database indexes. It’s a niche market.

CREATE TABLE sales (
    sale_id SERIAL PRIMARY KEY,
    product_name VARCHAR(100),
    region VARCHAR(50),
    sale_amount DECIMAL(10, 2),
    sale_date DATE
);

-- Sample Data
INSERT INTO sales (product_name, region, sale_amount, sale_date) VALUES
('B-Tree Index', 'North America', 1000.00, '2023-10-01'),
('Hash Index', 'Europe', 750.50, '2023-10-01'),
('B-Tree Index', 'Europe', 1200.00, '2023-10-02'),
('GiST Index', 'North America', 2500.00, '2023-10-02'),
('B-Tree Index', 'Asia', 800.00, '2023-10-03'),
('Hash Index', 'Asia', 500.00, '2023-10-03'),
('GiST Index', 'Europe', 2200.00, '2023-10-04');

Now, let’s see the aggregates in their simplest form: over the entire table.

SELECT
    COUNT(*) AS total_transactions,
    SUM(sale_amount) AS total_revenue,
    AVG(sale_amount) AS average_order_value,
    MIN(sale_date) AS first_sale,
    MAX(sale_date) AS last_sale
FROM sales;

total_transactions	total_revenue	average_order_value	first_sale	last_sale
7	8950.50	1278.64	2023-10-01	2023-10-04

Simple, right? But already, we’ve hit our first “gotcha.” Look at that AVG(). It’s a classic.

COUNT(*) vs. COUNT(column_name): The Null Kerfuffle

This is the single most important distinction to burn into your brain. COUNT(*) counts rows. It doesn’t care about values; it just counts every single row in the group. COUNT(column_name) counts non-null values in that specific column.

This is where most beginners faceplant. Let’s add a row with a NULL sale_amount to our table.

INSERT INTO sales (product_name, region, sale_amount, sale_date)
VALUES ('Free GIN Index', 'North America', NULL, '2023-10-05');

Now, watch the difference:

SELECT
    COUNT(*) AS count_star,
    COUNT(sale_amount) AS count_sale_amount,
    SUM(sale_amount) AS total_revenue,
    AVG(sale_amount) AS avg_naive,
    AVG(COALESCE(sale_amount, 0)) AS avg_with_coalesce
FROM sales;

count_star	count_sale_amount	total_revenue	avg_naive	avg_with_coalesce
8	7	8950.50	1278.64	1118.81

See that? COUNT(*) is 8 (all rows), but COUNT(sale_amount) is only 7 (non-null values). This directly affects AVG(). By definition, AVG(column) is SUM(column) / COUNT(column). It completely ignores the NULL row, so the average is still 8950.50 / 7. If you want to treat NULL as a zero, you have to be explicit with AVG(COALESCE(sale_amount, 0)), which would be 8950.50 / 8. This isn’t a design flaw; it’s mathematical correctness. NULL means “unknown,” not zero. The database is right, and you were probably wrong. I know I was the first time.

SUM and AVG: They Only Speak Numbers

This one seems obvious, but you’d be surprised. You can’t SUM(VARCHAR) or AVG(DATE). Well, you can try, but the database will give you an error that’s the equivalent of a disgusted look. These functions operate exclusively on numeric data types, integers, decimals, floats, etc.

MIN and MAX, however, are the polyglots of the group. They work on almost any data type that can be ordered: numbers, strings, dates, even enums. What’s the smallest number? The earliest date? The first product name alphabetically? They’ve got you covered.

SELECT
    MIN(product_name) AS first_product_alpha,
    MAX(product_name) AS last_product_alpha,
    MIN(sale_date) AS earliest_date
FROM sales;

first_product_alpha	last_product_alpha	earliest_date
B-Tree Index	Hash Index	2023-10-01

“B-Tree Index” comes before “GiST Index” alphabetically? Actually, no, ‘G’ comes after ‘B’. Wait, what? Ah, the sample data I provided… Hash Index starts with ‘H’, which does come after ‘G’. My bad. See? Even my examples have edge cases. Always check your assumptions. The database always knows.

The DISTINCT Keyword Inside Aggregates: A Power Move

You can slam the DISTINCT keyword right inside your aggregate functions. This is incredibly useful for finding the number of unique values.

SELECT
    COUNT(region) AS total_region_entries, -- counts every row's region
    COUNT(DISTINCT region) AS unique_regions -- counts each region name only once
FROM sales;

total_region_entries	unique_regions
8	3

You can also do this with SUM(DISTINCT sale_amount), but be warned: that sums each unique value only once. If you have two sales of $1000, SUM(DISTINCT sale_amount) would only add $1000 once, which is almost never what you want for financial reporting. It’s a tool for de-duplication within the aggregation, not a general-purpose sum. Use it with purpose.