10.5 SARIMA: Seasonal ARIMA

Right, so you’ve wrestled ARIMA to the ground and you’re feeling pretty good about yourself. You can forecast the next few points of a nice, clean, stationary series. Good for you. Now let’s throw reality at you: most data that matters has seasons. Sales spike in December. Website traffic plummets on weekends. Ice cream consumption is, tragically, not a year-round constant. This is where your basic ARIMA model gives you a helpless shrug. Enter its more sophisticated, slightly more complicated cousin: SARIMA.

SARIMA is just ARIMA with a seasonal component bolted on. And when I say “bolted on,” I mean it in the best way possible—it’s a remarkably elegant extension. The core idea is simple: why should the correlations only be between adjacent points? In seasonal data, a point is also correlated with the point from this time last season. Today’s temperature is probably similar to yesterday’s (that’s your standard ARIMA ‘AR’ term), but it’s also probably similar to the temperature on this same day last year (that’s your new seasonal ‘AR’ term). SARIMA models both.

The acronym gets a bit gnarly: Seasonal AutoRegressive Integrated Moving Average. We represent it as SARIMA(p, d, q)(P, D, Q)[m].

(p, d, q): Your non-seasonal orders, same as our old friend ARIMA. p for AR, d for differencing, q for MA.
(P, D, Q): The exact same thing, but for the seasonal component. P for seasonal AR, D for seasonal differencing, Q for seasonal MA.
[m]: The single most important parameter: the number of time steps in a single seasonal period. Is your data monthly with a yearly pattern? m=12. Daily with a weekly pattern? m=7. Hourly with a daily pattern? m=24. Get this wrong and the whole thing becomes nonsense.

The Logic of Seasonal Differencing

Before we even think about AR or MA terms, we have to make the series stationary. You remember first-order differencing from ARIMA: Y'(t) = Y(t) - Y(t-1). It removes a trend. Seasonal differencing is the same concept but for seasons: Y'(t) = Y(t) - Y(t-m). If you have monthly data with a yearly seasonality, this calculates the difference between this December and last December. It ruthlessly annihilates the seasonal pattern, leaving you with (hopefully) a stationary series to which you can then apply your standard ARIMA model. Sometimes you need both—a first difference and a seasonal difference—to really beat the stationarity into a series. That’s what d and D are for.

Fitting a SARIMA Model: A Realistic Example

Let’s use some classic airline passenger data. It’s got a strong trend and a painfully obvious seasonal pattern. Theory is great, but let’s get our hands dirty.

First, let’s look at the raw data. See that? Upward trend. Massive yearly spikes. This data is screaming for SARIMA.

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Load the classic data
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv"
series = pd.read_csv(url, header=0, index_col=0, parse_dates=True)
series.index = series.index.to_period('M')  # Important for statsmodels

# Plot it
plt.figure(figsize=(10,4))
plt.plot(series)
plt.title("Airline Passengers - A SARIMA Playground")
plt.ylabel("Thousands of Passengers")
plt.tight_layout()
plt.show()

Now, let’s decompose it to see the trend, seasonality, and residuals. This isn’t necessary for modeling, but it’s fantastic for your intuition.

# Decompose the time series
decomposition = seasonal_decompose(series, model='multiplicative')
decomposition.plot()
plt.show()

The ACF plot is your best friend here. A slow decay in the ACF suggests trend (needs differencing, d=1). Peaks at lag 12, 24, 36 etc. scream seasonality (needs seasonal differencing, D=1, with m=12).

# Plot ACF to see the seasonal pattern
plot_acf(series, lags=40)
plt.show()

Based on the plots and some good old-fashioned guesswork, let’s try a model. A common starting point is (1,1,1)(1,1,1,12). We’re using a log transform to handle the multiplicative-looking trend and seasonality.

# For a multiplicative model, often a log transform helps
series_log = np.log(series)

# Fit the model. Note: SARIMAX is the function that handles SARIMA.
# The `enforce_stationarity=False` is a bit of a hack but often necessary for complex models.
model = SARIMAX(series_log,
                order=(1, 1, 1),          # non-seasonal (p,d,q)
                seasonal_order=(1, 1, 1, 12), # seasonal (P,D,Q,m)
                enforce_stationarity=False,
                enforce_invertibility=False)

model_fit = model.fit(disp=False)  # disp=False silences convergence messages

# Print the summary - this is your report card
print(model_fit.summary())

Interpreting the Output and Diagnostics

The summary table is intimidating. Ignore most of it. Look at the coef column for your AR, MA, Seasonal AR, and Seasonal MA terms. The P>|z| column is critical: anything less than 0.05 suggests the term is statistically significant. If your seasonal term has a p-value of 0.4, you probably don’t need it. The real test is the residual diagnostics.

# Plot diagnostics - This is non-negotiable.
model_fit.plot_diagnostics(figsize=(12, 8))
plt.show()

You’re looking for residuals that are basically white noise (no patterns in the top-left correlogram), normally distributed (top-right chart should look like a bell curve), and homoskedastic (the bottom-left plot should have no obvious structure). If you see patterns, your model has left something on the table; go back and adjust your orders.

Forecasting and the Big Caveat

Now for the fun part: predicting the future.

# Forecast the next 3 years (36 months)
forecast_object = model_fit.get_forecast(steps=36)
forecast_index = forecast_object.row_labels # Gets the future dates
forecast_mean = forecast_object.predicted_mean
confidence_intervals = forecast_object.conf_int()

# Remember we log-transformed! We need to reverse that.
forecast_mean_exp = np.exp(forecast_mean)
confidence_intervals_exp = np.exp(confidence_intervals)

# Plot the results
plt.figure(figsize=(10,4))
plt.plot(series, label='Observed')
plt.plot(forecast_mean_exp, label='Forecast') # Plot the forecast means
plt.fill_between(forecast_index,
                 confidence_intervals_exp.iloc[:, 0],
                 confidence_intervals_exp.iloc[:, 1],
                 color='k', alpha=0.1, label='95% CI')
plt.title("SARIMA Forecast for Airline Passengers")
plt.legend()
plt.show()

Here’s the brutal truth the manuals often skip: SARIMA is powerful but fragile. It assumes your seasonal pattern is fixed. If that December spike starts happening earlier or later, or gets stronger or weaker, the model will struggle. It’s extrapolating the exact same seasonal pattern it learned forever into the future. In the real world, seasons evolve. This is the model’s greatest weakness. Use it for short-to-medium-term forecasts where the pattern is stable, and always, always look at those prediction intervals—they’ll show you just how uncertain the future really is.