19.7 Running Totals and Moving Averages with Frame Clauses
Alright, let’s get our hands dirty with the real sorcery of window functions: frame clauses. You’ve seen PARTITION BY and ORDER BY; they’re the opening act. The frame clause is the headliner. It’s how you define a sliding window within your partition, and it’s the secret sauce behind running totals and moving averages.
Think of it this way: PARTITION BY carves up your data into islands. ORDER BY lines up the houses on each island in a specific order. The frame clause then lets you walk down the street on that island and say, “For this house, I want to look at the house itself and the two houses before it.” Or, “For this house, I want to look at every house from the start of the island up to this one.” That’s the magic.
The Anatomy of a Frame: ROWS vs. RANGE
The frame clause always comes after the ORDER BY (you must have an ORDER BY to use a frame clause). Its syntax looks like this:
ROWS BETWEEN start AND end
Or, less commonly but still importantly:
RANGE BETWEEN start AND end
The key difference between ROWS and RANGE is one of physicality versus logic.
ROWSis about physical rows. It counts rows based on their position in the ordered partition. It’s precise and fast. “Give me the previous row.” Simple.RANGEis about logical values. It groups together all rows that have the sameORDER BYvalue as the current row or as the boundary. This can be a bit weird, and we’ll get to its quirks.
For most things you’ll do—running totals, moving averages—ROWS is your workhorse. RANGE is for more niche scenarios, like creating a rolling window based on a date range instead of a fixed number of rows.
Your Go-To Frame: UNBOUNDED PRECEDING to CURRENT ROW
This is the frame clause you’ll use 80% of the time. It’s the classic “running total” pattern.
Let’s say we have a table of daily coffee sales, and we want a running total of revenue.
SELECT
sale_date,
revenue,
SUM(revenue) OVER (
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM
daily_sales;
What’s happening here?
UNBOUNDED PRECEDING: This means “start from the very first row in the partition.” (Since we didn’t usePARTITION BY, our partition is the whole result set).CURRENT ROW: This is, well, the current row. So for each row, the window function calculates the sum of all rows from the dawn of time up to and including this one. Boom. Running total.
Here’s the pro tip: This frame is the default when you use ORDER BY with a SUM, COUNT, or AVG. The SQL standard designers decided to be “helpful.” So this:
SUM(revenue) OVER (ORDER BY sale_date)
…is exactly the same as the more verbose version above. I hate this. It’s a classic example of a “questionable choice.” It makes the code shorter but far less explicit. A reader has to know this implicit behavior. I always write out the full frame clause. It’s self-documenting and saves future-you (or your colleague) from a headache.
The N-Row Moving Window
This is your “moving average” frame. You don’t want all of history; you just want the last N rows.
Let’s calculate a 3-day moving average of revenue.
SELECT
sale_date,
revenue,
AVG(revenue) OVER (
ORDER BY sale_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS moving_avg_3day
FROM
daily_sales;
Breakdown:
2 PRECEDING: Start the window 2 rows before the current one.CURRENT ROW: End at the current row.
This creates a window that always contains the current row and the two previous rows. The average is calculated over those three values. As you move down the rows, the window slides with you.
What about looking ahead? You can do that too! AND 1 FOLLOWING is perfectly valid. This is useful for things like centering a moving average or comparing a value to the next one in the sequence.
The Quirky Cousin: RANGE
Remember how RANGE is logical? Let’s see it in action. Suppose you have sales that sometimes happen on the same day.
SELECT
sale_date,
revenue,
SUM(revenue) OVER (
ORDER BY sale_date
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total_range
FROM
daily_sales;
If you have two rows with the same sale_date, RANGE will treat them as a single logical point. The frame BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW for both of those rows will be identical: it will include all rows from the start up to and including all rows with that current date. The sum will be the same for both rows. This is often not what you want for a precise running total, which is why ROWS is usually the safer bet.
RANGE truly shines when you want a window based on an interval, like “the last 7 days” including any days that sold nothing (which wouldn’t have a row!). This, however, often requires database-specific syntax for intervals (e.g., INTERVAL '7' DAY PRECEDING in PostgreSQL), which is beyond the standard RANGE clause. It’s powerful, but check your database’s documentation.
Pitfalls and Best Practices
- The
ORDER BYDefault Trap: I said it before, I’ll say it again. Never rely on the default frame. Always write it out explicitly. Your code will be more robust and understandable. - Performance: Frame clauses are brilliant, but they aren’t free. A window that says
ROWS BETWEEN 500 PRECEDING AND CURRENT ROWhas to keep 500 rows in memory for each partition. For huge datasets, this can add up. Be mindful of your window sizes. - The Edge Case at the Beginning: What happens on the first row when your frame is
2 PRECEDING? There are no previous rows! The window is smaller than you defined. The function still works perfectly—it just uses the rows it has. The average for the first row will be itself. For the second row, it will be the average of row 1 and 2. This is correct behavior, but you need to be aware of it when interpreting your results. PARTITION BYis Your Friend: Almost always, you’ll combine this withPARTITION BY. A running total per department? A moving average per stock ticker? No problem. The frame clause works within each partition.
SELECT
department,
sale_date,
revenue,
SUM(revenue) OVER (
PARTITION BY department
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total_by_dept
FROM
department_sales;
This resets the running total every time the department changes. This is the full power of window functions: divide, order, and then slide a frame across the result. Master this, and you’ve just leveled up your SQL game from scribe to wizard.