10.3 Stationarity Tests: ADF, KPSS
Right, let’s talk about stationarity tests. This is one of those topics that sounds intimidatingly academic but is actually a brutally practical tool. You can’t just throw a time series at a model and hope for the best. Most classical forecasting models (think ARMA, the engine inside ARIMA) have a non-negotiable requirement: the series they are fitted to needs to be stationary. ARIMA’s built-in differencing step exists precisely to get you there.
In plain English, stationarity means your data’s statistical properties (its mean, its variance, and how strongly it correlates with its own past) stay the same over time. It wobbles around a fixed mean with consistent volatility. A non-stationary series, on the other hand, is a troublemaker. It might be on a clear upward climb (like a company’s revenue growth) or have a variance that explodes over time. Fitting a model to non-stationary data is like building a house on a slope without a foundation; your results will just slide into nonsense. These tests are your ground-penetrating radar.
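Before reaching for a formal test, a quick and informal check is to compare rolling statistics. The sketch below is an illustration only: the synthetic series, the window of 50, and the names `rolling_summary`, `noise`, and `walk` are assumptions picked for the demo, not part of any library.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the demo is repeatable
n = 500
noise = pd.Series(rng.standard_normal(n))            # stationary: white noise
walk = pd.Series(np.cumsum(rng.standard_normal(n)))  # non-stationary: random walk


def rolling_summary(series, window=50):
    """Return the rolling mean and rolling standard deviation of a series."""
    rolled = series.rolling(window)
    return pd.DataFrame({"mean": rolled.mean(), "std": rolled.std()}).dropna()


noise_stats = rolling_summary(noise)
walk_stats = rolling_summary(walk)

# How far does the local mean drift? A stationary series should drift very little.
print("spread of rolling mean, white noise:", noise_stats["mean"].std())
print("spread of rolling mean, random walk:", walk_stats["mean"].std())
```

The rolling mean of the white noise should stay close to zero, while the rolling mean of the random walk wanders. This is an eyeball check, not a test; it simply makes the definition concrete.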
We primarily use two tests to check for this, and they work as a brilliant, slightly bickering duo: the Augmented Dickey-Fuller (ADF) test and the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test. The key to using them correctly is understanding their differing personalities.
The Augmented Dickey-Fuller (ADF) Test: The Skeptic
The ADF test is the ultimate skeptic. Its default position (its null hypothesis) is that the series is NOT stationary: that it has a unit root. This is a fancy way of saying the series has a memory that never fades; every past shock is carried forward at full strength instead of dying out, which produces that meandering, trend-like behavior with no fixed mean to return to.
You run an ADF test hoping to reject this null hypothesis. You want a low p-value (typically < 0.05) so you can confidently say, “The data does not have a unit root; it is stationary.”
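For readers who want the algebra behind the "unit root" label, here is the standard textbook form. The symbols are the conventional ones and are not defined elsewhere in this section. Start from a first-order autoregression, subtract y_{t-1} from both sides, then add an optional constant, an optional trend, and p lagged differences to soak up leftover autocorrelation:

```latex
y_t = \phi\, y_{t-1} + \varepsilon_t
\quad\Longrightarrow\quad
\Delta y_t = \gamma\, y_{t-1} + \varepsilon_t, \qquad \gamma = \phi - 1

\Delta y_t = \alpha + \beta t + \gamma\, y_{t-1}
           + \sum_{i=1}^{p} \delta_i\, \Delta y_{t-i} + \varepsilon_t

H_0:\ \gamma = 0 \ \text{(unit root)} \qquad
H_1:\ \gamma < 0 \ \text{(no unit root)}
```

The ADF statistic is the t-ratio on \gamma. Under the null it does not follow the usual t distribution, which is why the test comes with its own table of critical values.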
Let’s see it in action. We’ll use a classic non-stationary series: the “random walk.”
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller
# Generate a simple random walk - the poster child for non-stationarity
np.random.seed(42) # For reproducibility
n_samples = 100
random_walk = np.cumsum(np.random.randn(n_samples)) # Cumulative sum of white-noise shocks
# Run the ADF test
adf_result = adfuller(random_walk)
print(f'ADF Statistic: {adf_result[0]}')
print(f'p-value: {adf_result[1]}')
print('Critical Values:')
for key, value in adf_result[4].items():
    print(f' {key}: {value}')
# Output will likely look something like (illustrative numbers, not an exact run):
# ADF Statistic: -1.234 (not a very large negative number)
# p-value: 0.655 (very high, > 0.05 -> fail to reject null hypothesis)
# Conclusion: no evidence against a unit root, so treat the series as non-stationary. Correct!
The ADF statistic is usually a negative number. The more negative it is, the stronger the evidence against the null hypothesis (i.e., for stationarity). In this case, our statistic isn’t very negative, and the p-value is huge, so we fail to reject the null and treat our random walk as non-stationary, which is the right call here. Now let’s test its opposite: pure random noise.
# Generate a stationary series: white noise
white_noise = np.random.randn(n_samples)
adf_result_noise = adfuller(white_noise)
print(f'\nADF Statistic for White Noise: {adf_result_noise[0]}')
print(f'p-value for White Noise: {adf_result_noise[1]}')
# Output will likely look something like:
# ADF Statistic for White Noise: -7.823 (a very large negative number)
# p-value for White Noise: 0.000 (very low, < 0.05 -> reject null hypothesis)
# Conclusion: Series is stationary. Correct!
Perfect. The ADF test correctly identified the stationary white noise.
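To demystify where that statistic comes from, here is a hand-rolled sketch of the simplest (non-augmented) Dickey-Fuller regression using only NumPy. It is not what `adfuller` runs internally: there are no lagged differences and no lag selection, and the name `dickey_fuller_tstat` is made up for this illustration. The point is only that the statistic is the t-ratio on the lagged level of the series.

```python
import numpy as np


def dickey_fuller_tstat(y):
    """t-ratio on gamma in: diff(y)[t] = alpha + gamma * y[t-1] + error.

    Simplified Dickey-Fuller regression (constant, no lagged differences).
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                                   # left-hand side: first differences
    X = np.column_stack([np.ones(len(dy)), y[:-1]])   # regressors: constant, lagged level
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)     # ordinary least squares fit
    resid = dy - X @ beta
    dof = len(dy) - X.shape[1]                        # residual degrees of freedom
    sigma2 = resid @ resid / dof                      # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)             # OLS covariance of the coefficients
    return beta[1] / np.sqrt(cov[1, 1])               # t-ratio on the lagged level


rng = np.random.default_rng(1)
n = 200
noise_t = dickey_fuller_tstat(rng.standard_normal(n))
walk_t = dickey_fuller_tstat(np.cumsum(rng.standard_normal(n)))

print("white noise t-ratio:", noise_t)
print("random walk t-ratio:", walk_t)
```

For white noise the t-ratio should land far below the asymptotic 5% Dickey-Fuller critical value of roughly -2.86 (constant-only case), while a random walk usually stays above it. The real test adds lagged differences and finite-sample critical values, so use `adfuller` for actual work.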
The KPSS Test: The Optimist
Now meet the KPSS test. It has the opposite disposition. Its null hypothesis is that the series IS stationary (specifically, level or trend stationary). You run a KPSS test and hope you fail to reject the null, meaning you don’t have enough evidence to say it’s not stationary.
A high p-value (> 0.05) here is good news—it suggests the series is stationary. A low p-value means you reject the null and must conclude the series is non-stationary. This yin-and-yang relationship with ADF is why we use them together.
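In symbols, the KPSS setup (standard textbook notation, not otherwise defined in this section) splits the series into a deterministic trend, a random walk, and a stationary error, and asks whether the random-walk part has zero variance:

```latex
y_t = \xi t + r_t + \varepsilon_t, \qquad
r_t = r_{t-1} + u_t, \qquad
u_t \sim (0, \sigma_u^2)

H_0:\ \sigma_u^2 = 0 \ \text{(stationary around a level or trend)} \qquad
H_1:\ \sigma_u^2 > 0

\eta = \frac{1}{T^2\, \hat{\sigma}_\varepsilon^2} \sum_{t=1}^{T} S_t^2, \qquad
S_t = \sum_{i=1}^{t} e_i
```

Here e_i are the residuals from regressing y_t on a constant (the level case, with \xi = 0) or on a constant and a trend, and \hat{\sigma}_\varepsilon^2 is a long-run variance estimate. A large \eta means the partial sums of the residuals wander too far, which is evidence against stationarity.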
from statsmodels.tsa.stattools import kpss
# Test our random walk with KPSS
kpss_result_walk = kpss(random_walk, regression='c') # 'c' for constant (level) stationarity
print(f'KPSS Statistic (Random Walk): {kpss_result_walk[0]}')
print(f'p-value (Random Walk): {kpss_result_walk[1]}')
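# Output will likely show a high test statistic and a small p-value (< 0.05).
# Note: statsmodels looks the KPSS p-value up in a table spanning only 0.01 to 0.10,
# so a strong rejection is reported as 0.01 (with an InterpolationWarning), never smaller.
# Conclusion: Reject the null -> series is non-stationary. Correct!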
# Test our white noise with KPSS
kpss_result_noise = kpss(white_noise, regression='c')
print(f'\nKPSS Statistic (White Noise): {kpss_result_noise[0]}')
print(f'p-value (White Noise): {kpss_result_noise[1]}')
# Output will likely show a low test statistic and a high p-value (> 0.05).
# Note: statsmodels caps the reported KPSS p-value at 0.10.
# Conclusion: Fail to reject null -> series is stationary. Correct!
The Grand Unified Theory: Interpreting ADF and KPSS Together
This is where it gets truly useful. You run both tests and compare the results like a detective reconciling two witness statements. The combined verdict is your answer.
| ADF (p-value) | KPSS (p-value) | Conclusion |
|---|---|---|
| < 0.05 (Reject Null) | > 0.05 (Fail to Reject Null) | Stationary. Both tests agree. Go forth and model. |
| > 0.05 (Fail to Reject Null) | < 0.05 (Reject Null) | Non-Stationary. Both tests agree. You need to difference the data. |
| < 0.05 (Reject Null) | < 0.05 (Reject Null) | Difference Stationary. The ADF says no unit root, but KPSS says the series is not stationary around a fixed level (or around a trend, if you used 'ct'). Differencing the series will likely make it fully stationary. |
| > 0.05 (Fail to Reject Null) | > 0.05 (Fail to Reject Null) | Trend Stationary (tentatively). Neither test rejects its null, so the evidence is weak, especially on short series. The usual reading is a deterministic trend rather than a stochastic unit root: de-trend first, rather than difference, then retest. |
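The table translates directly into a small lookup. The function below is a minimal sketch: the name `joint_verdict`, the wording of the returned labels, and the choice to treat a p-value exactly equal to the threshold as "fail to reject" are all assumptions made for this example.

```python
def joint_verdict(adf_pvalue, kpss_pvalue, alpha=0.05):
    """Map a pair of p-values onto the four rows of the ADF/KPSS table."""
    adf_rejects = adf_pvalue < alpha    # ADF rejects "has a unit root"
    kpss_rejects = kpss_pvalue < alpha  # KPSS rejects "is stationary"

    if adf_rejects and not kpss_rejects:
        return "stationary"
    if not adf_rejects and kpss_rejects:
        return "non-stationary: difference the data"
    if adf_rejects and kpss_rejects:
        return "difference stationary: difference, then retest"
    return "trend stationary: de-trend, then retest"


print(joint_verdict(0.001, 0.10))  # -> stationary
print(joint_verdict(0.655, 0.01))  # -> non-stationary: difference the data
```

With the results computed earlier in this section, the call would be `joint_verdict(adf_result[1], kpss_result_walk[1])`. Treat the returned label as a prompt for the next step, not a final ruling.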
The third case is the most common “disagreement” and is actually incredibly helpful. It tells you the series isn’t a random walk, but it still has some unwanted structure that differencing will clean up.
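The two remedies named in the table are easy to mix up, so here is a small NumPy-only sketch of each. The series are synthetic and the variable names are invented for the demo; the straight-line fit via `np.polyfit` is one simple way to de-trend, assumed here for illustration rather than prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
t = np.arange(n)
shocks = rng.standard_normal(n)

random_walk = np.cumsum(shocks)   # stochastic trend (unit root)
trend_series = 0.5 * t + shocks   # deterministic trend plus stationary noise

# Remedy 1: differencing removes a unit root and recovers the underlying shocks.
differenced = np.diff(random_walk)

# Remedy 2: de-trending subtracts a fitted line and keeps the stationary part.
slope, intercept = np.polyfit(t, trend_series, deg=1)
detrended = trend_series - (slope * t + intercept)

print("differencing recovers the shocks:", np.allclose(differenced, shocks[1:]))
print("fitted slope:", slope)
print("slope left after de-trending:", np.polyfit(t, detrended, deg=1)[0])
```

Differencing the trended series would also remove the trend, but it leaves 0.5 + e_t - e_{t-1}, which introduces a non-invertible moving-average term. That is the practical reason to de-trend rather than difference when the trend is deterministic.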
Best Practices and Pitfalls
- Always Specify the Regression: For KPSS, choose `regression='c'` (for constant/level) or `regression='ct'` (for constant and trend) deliberately rather than relying on the default, which is `'c'`. If your data visually has a clear trend, use `'ct'`. If you’re not sure, try both. Using `'c'` on a trended series will reject level stationarity even when the series is perfectly stationary around its trend.
- Check Those Critical Values: The ADF statistic must be more negative than the critical value at your chosen significance level (usually 5%) to reject the null. Don’t just rely on the p-value; look at the full output.
- It’s Not Magic: These tests have limited power, especially on short series. They can struggle with complex seasonality or multiple structural breaks. Your eyes on a plot are still a vital tool.
- The Goal is Modeling, Not Testing: Don’t get obsessed with achieving perfect stationarity according to the tests. The real question is: does making the series stationary improve your model’s forecasting performance? Sometimes slightly non-stationary residuals are acceptable if the predictions are accurate and well-behaved. The test is a means to an end, not the end itself.
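The first two points in the list above can be folded into one helper that passes the regression choice explicitly and compares each statistic with its own critical value. This is a sketch under assumptions: `stationarity_report` is a hypothetical name, using the same `regression` setting for both tests is a simplification, and silencing warnings is a convenience for the demo (statsmodels warns when the KPSS statistic falls outside its lookup table).

```python
import warnings

import numpy as np
from statsmodels.tsa.stattools import adfuller, kpss


def stationarity_report(series, regression="c", level="5%"):
    """Run ADF and KPSS with explicit deterministic terms and compare
    each statistic with its critical value at the given level."""
    adf_stat, adf_p, _, _, adf_crit, _ = adfuller(series, regression=regression)

    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # demo only: hide the lookup-table warning
        kpss_stat, kpss_p, _, kpss_crit = kpss(series, regression=regression, nlags="auto")

    return {
        "adf_stat": adf_stat,
        "adf_crit": adf_crit[level],
        # ADF rejects a unit root when the statistic is MORE negative than the critical value
        "adf_rejects_unit_root": adf_stat < adf_crit[level],
        "kpss_stat": kpss_stat,
        "kpss_crit": kpss_crit[level],
        # KPSS rejects stationarity when the statistic is LARGER than the critical value
        "kpss_rejects_stationarity": kpss_stat > kpss_crit[level],
    }


rng = np.random.default_rng(7)
noise = rng.standard_normal(300)
report = stationarity_report(noise)
for name, value in report.items():
    print(f"{name}: {value}")
```

Note the opposite directions of the two comparisons: ADF rejects in the left tail, KPSS in the right. Passing `regression="ct"` switches both tests to the constant-plus-trend variant.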