19.5 LAG and LEAD: Accessing Previous and Next Rows
Alright, let’s get our hands dirty with two of the most useful tools in the SQL toolbox: LAG and LEAD. If you’ve ever found yourself staring at a row of data and desperately wanting to peek at the row above or below it to calculate a difference, a trend, or just to see what that idiot user did next, these functions are about to become your best friends. They are the quintessential “time-travel” functions of SQL, allowing you to reference other rows in your result set relative to your current row without the soul-crushing misery of a self-join.
The concept is brilliantly simple. LAG lets you look backwards at a previous row. LEAD lets you look forwards at a next row. It’s like having a rearview mirror and a windshield for your data. You provide the column you want to look at, and optionally, how many rows to skip back or forward.
The Basic Syntax: It’s Easier Than It Looks
Here’s the blueprint. Don’t worry, it’s less intimidating than it seems.
LAG(column_name [, offset [, default_value]]) OVER (
[PARTITION BY partition_expression]
ORDER BY order_expression
)
LEAD(column_name [, offset [, default_value]]) OVER (
[PARTITION BY partition_expression]
ORDER BY order_expression
)
column_name: The value you want to retrieve from another row.offset(optional): The number of rows to go backwards (LAG) or forwards (LEAD). Defaults to 1 if you don’t specify it. Want to look two rows back?LAG(column, 2).default_value(optional): What to return if theLAGorLEADfunction tries to look beyond the scope of the partition. If you don’t specify this, it will returnNULL, which is usually fine but can break your math if you’re not careful (more on that later).PARTITION BY: This is your “reset” button. The function will only look within the group of rows that share the same value(s) in the partition. No partitioning? It treats the whole result set as one big group.ORDER BY: This is non-negotiable and absolutely critical. This defines what “previous” and “next” actually mean. You must tell the database the order of your rows. Without it, “previous” is a meaningless concept, and the database will rightly refuse to humor you.
A Concrete Example: Calculating Day-over-Day Growth
Let’s say we have a table daily_sales. Looking at raw daily numbers is okay, but what we really care about is growth. How much more (or less) did we sell today compared to yesterday? This is a classic LAG use case.
SELECT
sale_date,
amount,
LAG(amount) OVER (ORDER BY sale_date) AS previous_day_amount,
amount - LAG(amount) OVER (ORDER BY sale_date) AS daily_growth
FROM daily_sales
ORDER BY sale_date;
| sale_date | amount | previous_day_amount | daily_growth |
|---|---|---|---|
| 2023-10-01 | 100 | NULL | NULL |
| 2023-10-02 | 150 | 100 | 50 |
| 2023-10-03 | 125 | 150 | -25 |
| 2023-10-04 | 200 | 125 | 75 |
See what happened? For the first row (2023-10-01), there is no previous row to look at, so LAG returns NULL. This is why the daily_growth is also NULL. This isn’t an error; it’s correct behavior. You can’t calculate a difference from nothing. We’ll talk about how to handle this gracefully in a second.
Partitioning: The “Reset” Button
Let’s make it more interesting. What if we have sales for multiple products? We don’t want to compare the sales of Product B yesterday to Product A today. That’s nonsense. We need to calculate the growth per product. This is where PARTITION BY saves the day.
SELECT
product_id,
sale_date,
amount,
LAG(amount) OVER (
PARTITION BY product_id
ORDER BY sale_date
) AS prev_amount
FROM sales
ORDER BY product_id, sale_date;
Now, LAG will only look at rows with the same product_id. When it hits the first sale for a new product, prev_amount will be NULL again, neatly resetting the calculation for the new product group. It’s like a little GROUP BY for your window function.
The Pitfalls and The “Gotchas”
The Dreaded NULL: As we saw, the first row in a partition has no previous row, so
LAGreturnsNULL. If you then try to do math with thatNULL(e.g.,amount - NULL), the result is alwaysNULL. This can turn your beautiful trend line into a messy chart with a missing first data point. The solution? Use thedefault_valueparameter or theCOALESCEfunction.LAG(amount, 1, 0)would return0instead ofNULLfor the first row, so your calculation becomesamount - 0. Whether this makes sense for your use case is up to you. SometimesNULLis the correct answer.The Absolutely Critical ORDER BY: I cannot stress this enough. The
ORDER BYin yourOVER()clause is what defines the sequence. Get this wrong, and your results are garbage. Ordering by an ambiguous column, or worse, forgetting theORDER BYentirely, is a one-way ticket to incorrect-data ville. The database will sometimes let you do this, which is a design choice I find questionable. Always double-check your order.
LEAD: The Exact Same Thing, But Forward
Everything we’ve said about LAG applies to LEAD. It’s just the opposite direction. Instead of looking at the previous row, it looks at the next one. It’s perfect for calculating “next value” or looking ahead.
SELECT
user_id,
event_time,
event_name,
LEAD(event_name) OVER (
PARTITION BY user_id
ORDER BY event_time
) AS next_event
FROM user_events;
This query would show you what a user did immediately after their current action. Powerful stuff for analyzing user journeys. And yes, for the last row in the partition, LEAD will return NULL.
Why This Beats a Self-Join
You could theoretically do this with a self-join, aliasing the table to itself and joining on a.id = b.id - 1. It’s a mess. It’s inefficient. It falls apart completely once you introduce PARTITION BY. Window functions are declarative—you tell the database what you want (“the previous value”), not how to do it. The optimizer can then choose the most efficient way to execute it, which is almost always faster and cleaner than the procedural self-join nightmare. Use the right tool for the job. This is it.