10.3 Stationarity Tests: ADF, KPSS

Right, let’s talk about stationarity tests. This is one of those topics that sounds intimidatingly academic but is actually a brutally practical tool. You can’t just throw a time series at a model and hope for the best. Most classical forecasting models (think ARIMA) come with a non-negotiable requirement: the series they are fitted to must be stationary, or be made stationary first (that is what the differencing step in ARIMA is for).

In plain English, stationarity means your data’s statistical properties (its mean, its variance, and how strongly each value correlates with earlier ones) stay the same over time. The series wobbles around a fixed mean with consistent volatility. A non-stationary series, on the other hand, is a troublemaker. It might be on a clear upward climb (like a company’s revenue growth) or have a variance that explodes over time. Fitting a model to non-stationary data is like building a house on a slope without a foundation; your results will just slide into nonsense. These tests are your ground-penetrating radar.
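
One rough way to see the difference before running any formal test is to cut a series into consecutive blocks and compare the mean and variance of each block. The sketch below does that for simulated white noise and a simulated random walk. The helper name `block_stats`, the seed, and the block count are illustrative choices, not part of any library.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
n = 1000

white_noise = rng.standard_normal(n)             # stationary by construction
random_walk = np.cumsum(rng.standard_normal(n))  # non-stationary by construction


def block_stats(series, n_blocks=4):
    """Split a series into consecutive blocks and return each block's mean and variance."""
    blocks = np.array_split(series, n_blocks)
    means = np.array([block.mean() for block in blocks])
    variances = np.array([block.var() for block in blocks])
    return means, variances


noise_means, noise_vars = block_stats(white_noise)
walk_means, walk_vars = block_stats(random_walk)

# For a stationary series the block means should sit close together;
# for a random walk they usually drift far apart.
print("White noise block means:", np.round(noise_means, 2))
print("Random walk block means:", np.round(walk_means, 2))
print("White noise block variances:", np.round(noise_vars, 2))
print("Random walk block variances:", np.round(walk_vars, 2))
```

This is only an eyeball check. It says nothing about statistical significance, which is what the tests below are for.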

We primarily use two tests to check for this, and they work as a brilliant, slightly bickering duo: the Augmented Dickey-Fuller (ADF) test and the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test. The key to using them correctly is understanding their differing personalities.

The Augmented Dickey-Fuller (ADF) Test: The Skeptic

The ADF test is the ultimate skeptic. Its default position (its null hypothesis) is that the series is NOT stationary: that it has a unit root. This is a fancy way of saying the series has a memory that never fades; every past shock is carried forward at full strength into today’s value, which produces that meandering, trend-like behavior.

You run an ADF test hoping to reject this null hypothesis. You want a low p-value (typically < 0.05) so you can say, with reasonable confidence, “The data does not appear to have a unit root; treat it as stationary.”
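
The “memory that never fades” idea can be made concrete with a small sketch. It assumes the simplest textbook setting, an AR(1) process y_t = phi * y_{t-1} + e_t, which is an illustrative model chosen here rather than something the ADF test requires. The hypothetical helper `shock_response` tracks what is left of a single unit shock when no further shocks arrive.

```python
import numpy as np


def shock_response(phi, n_steps=20):
    """Trace a single unit shock through y_t = phi * y_{t-1} with no further shocks."""
    y = np.zeros(n_steps + 1)
    y[0] = 1.0  # the shock arrives at t = 0
    for t in range(1, n_steps + 1):
        y[t] = phi * y[t - 1]
    return y


fading = shock_response(phi=0.5)     # stationary case: |phi| < 1
permanent = shock_response(phi=1.0)  # unit root case: phi == 1

print("Shock remaining after 20 steps, phi=0.5:", fading[-1])
print("Shock remaining after 20 steps, phi=1.0:", permanent[-1])
```

With phi below 1 in absolute value the shock decays geometrically; with phi equal to 1 it never decays at all. That second case is the unit root the ADF null hypothesis describes.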

Let’s see it in action. We’ll use a classic non-stationary series: the “random walk.”

import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

# Generate a simple random walk - the poster child for non-stationarity
np.random.seed(42)  # For reproducibility
n_samples = 100
random_walk = np.cumsum(np.random.randn(n_samples))  # Cumulative sum of random noise

# Run the ADF test
adf_result = adfuller(random_walk)
print(f'ADF Statistic: {adf_result[0]}')
print(f'p-value: {adf_result[1]}')
print('Critical Values:')
for key, value in adf_result[4].items():
    print(f'   {key}: {value}')

# Exact numbers depend on the seed and library version, but expect:
# ADF Statistic: only mildly negative, above the printed critical values
# p-value: well above 0.05 -> fail to reject the null hypothesis
# Conclusion: no evidence against a unit root, so treat the series as non-stationary. Correct!

The ADF statistic is usually a negative number. The more negative it is, the stronger the evidence against the null hypothesis (i.e., for stationarity). In this case, our statistic isn’t very negative, and the p-value is huge, so we correctly fail to reject the null and treat our random walk as non-stationary. Now let’s test its opposite: pure random noise.

# Generate a stationary series: white noise
white_noise = np.random.randn(n_samples)

adf_result_noise = adfuller(white_noise)
print(f'\nADF Statistic for White Noise: {adf_result_noise[0]}')
print(f'p-value for White Noise: {adf_result_noise[1]}')

# Exact numbers depend on the seed and library version, but expect:
# ADF Statistic for White Noise: strongly negative, well below the 1% critical value
# p-value for White Noise: close to zero (< 0.05 -> reject null hypothesis)
# Conclusion: Series is stationary. Correct!

Perfect. The ADF test correctly identified the stationary white noise.

The KPSS Test: The Optimist

Now meet the KPSS test. It has the opposite disposition. Its null hypothesis is that the series IS stationary (specifically, level or trend stationary). You run a KPSS test and hope you fail to reject the null, meaning you don’t have enough evidence to say it’s not stationary.

A high p-value (> 0.05) here is good news—it suggests the series is stationary. A low p-value means you reject the null and must conclude the series is non-stationary. This yin-and-yang relationship with ADF is why we use them together.

from statsmodels.tsa.stattools import kpss

# Test our random walk with KPSS
kpss_result_walk = kpss(random_walk, regression='c') # 'c' for constant (level) stationarity
print(f'KPSS Statistic (Random Walk): {kpss_result_walk[0]}')
print(f'p-value (Random Walk): {kpss_result_walk[1]}')

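# Output will likely show a high test statistic and a p-value below 0.05.
# statsmodels only tabulates KPSS p-values between 0.01 and 0.1, so a strong
# rejection is reported as 0.01 along with an InterpolationWarning.
# Conclusion: Reject the null -> series is non-stationary. Correct!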

# Test our white noise with KPSS
kpss_result_noise = kpss(white_noise, regression='c')
print(f'\nKPSS Statistic (White Noise): {kpss_result_noise[0]}')
print(f'p-value (White Noise): {kpss_result_noise[1]}')

# Output will likely show a low test statistic and a high p-value (> 0.05)
# Conclusion: Fail to reject null -> series is stationary. Correct!

The Grand Unified Theory: Interpreting ADF and KPSS Together

This is where it gets truly useful. You run both tests and compare the results like a detective reconciling two witness statements. The combined verdict is your answer.

ADF (p-value)	KPSS (p-value)	Conclusion
< 0.05 (Reject Null)	> 0.05 (Fail to Reject Null)	Stationary. Both tests agree. Go forth and model.
> 0.05 (Fail to Reject Null)	< 0.05 (Reject Null)	Non-Stationary. Both tests agree. You need to difference the data.
< 0.05 (Reject Null)	< 0.05 (Reject Null)	Difference Stationary. ADF rejects a unit root, but KPSS still rejects stationarity around the level (or trend) you specified. Differencing the series will likely make it stationary.
> 0.05 (Fail to Reject Null)	> 0.05 (Fail to Reject Null)	Trend Stationary (probably). The usual reading is a deterministic trend rather than a stochastic unit root, so de-trend the series rather than difference it. On a short series, this pattern can also just mean neither test had enough power to reject.

The third case is a common “disagreement” and is actually incredibly helpful. It tells you the series doesn’t look like a pure random walk, but it still has some unwanted structure that differencing will likely clean up.

Best Practices and Pitfalls

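Always Specify the Regression: For KPSS, regression='c' (the default) tests for stationarity around a constant level, and regression='ct' tests for stationarity around a linear trend. If your data visually has a clear trend, use 'ct'. If you’re not sure, try both. Using 'c' on a series that is stationary around a trend will make KPSS reject, which tells you the level is not constant, not that the series has a unit root. adfuller takes a regression argument too ('c' by default, 'ct' to allow for a trend).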
Check Those Critical Values: The ADF statistic must be more negative than the critical value at your chosen significance level (usually 5%) to reject the null. Don’t just rely on the p-value; look at the full output.
It’s Not Magic: These tests have limited power, especially on short series. They can struggle with complex seasonality or multiple structural breaks. Your eyes on a plot are still a vital tool.
The Goal is Modeling, Not Testing: Don’t get obsessed with achieving perfect stationarity according to the tests. The real question is: does making the series stationary improve your model’s forecasting performance? Sometimes slightly non-stationary residuals are acceptable if the predictions are accurate and well-behaved. The test is a means to an end, not the end itself.