10.9 N-BEATS and N-HiTS: State-of-the-Art DL Forecasting

Right, so you’ve slogged through the ARIMAs and the Prophet models, and you’re ready for the big leagues: pure, unadulterated deep learning for forecasting. Forget the kitchen sink approach of throwing in exogenous variables and hoping for the best. We’re going to let the model do the heavy lifting. Enter N-BEATS and its sleeker successor, N-HiTS. These aren’t your overhyped, inscrutable black boxes; they’re elegant, frighteningly effective and, in the right configuration, interpretable. I’m talking about models that look at a time series and say, “I got this,” without needing you to hand-hold them through every holiday and calendar event.

The core genius of these models is their refusal to be boring. They don’t just learn a single massive function. Instead, they’re built as a stack of blocks, and each block expresses its output as a weighted sum of basis functions (hence the name: Neural Basis Expansion Analysis for Time Series). Each block is a small, self-contained learning machine that works on whatever part of the signal the blocks before it left unexplained. In the interpretable configuration, the early blocks are tied to trend and the later ones to seasonality. It’s a divide-and-conquer strategy on steroids, and it works because it mirrors how we might intuitively decompose a problem ourselves.

The Architectural Guts: How a Block Thinks

Let’s crack open a single N-BEATS block. It has two outputs: the backcast and the forecast. You feed it a window of historical data (the “lookback” period). Inside, it uses fully connected layers to produce two sets of coefficients: one for the backcast and one for the forecast. Each set weights a basis, a collection of functions the block uses as building blocks. In the interpretable configuration these bases are pre-defined (low-degree polynomials for a trend block, harmonics for a seasonality block); in the generic configuration the basis itself is learned.
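To make “basis” less abstract, here is a minimal NumPy sketch of the two pre-defined families mentioned above. The function names `trend_basis` and `seasonality_basis` are made up for illustration, and the time grid on [0, 1) is an assumption; real implementations differ in details such as normalization and how many harmonics they keep.

```python
import numpy as np

def trend_basis(degree, length):
    """Rows are t**0, t**1, ..., t**degree on a grid t in [0, 1)."""
    t = np.arange(length) / length
    return np.stack([t ** p for p in range(degree + 1)])  # (degree + 1, length)

def seasonality_basis(n_harmonics, length):
    """Rows are cos(2*pi*k*t), then sin(2*pi*k*t), for k = 1..n_harmonics."""
    t = np.arange(length) / length
    k = np.arange(1, n_harmonics + 1)[:, None]             # (n_harmonics, 1)
    angles = 2 * np.pi * k * t                             # (n_harmonics, length)
    return np.concatenate([np.cos(angles), np.sin(angles)])  # (2 * n_harmonics, length)

# A block's output is just "coefficients times basis rows".
theta_trend = np.array([1.0, 2.0, 0.0])  # level 1, slope 2, no curvature
trend = theta_trend @ trend_basis(degree=2, length=4)
print(trend)  # values 1.0, 1.5, 2.0, 2.5: level 1 plus slope 2 along t
```

The point of keeping the basis this small is that the network only has to get a handful of coefficients right, and each coefficient has a readable meaning (level, slope, amplitude of a given harmonic).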

The magic is in the double output. The block doesn’t just spit out a future forecast. It also produces a backcast: its best guess at the part of the input window it can explain. In the original design the backcast isn’t a separate training target; the network is trained end to end on forecast error, and the backcast’s job is to be subtracted from the input. That’s the real elegance: the next block in the stack doesn’t get the raw input. It gets the residual, i.e. what the previous block failed to explain. So each subsequent block is pushed toward a different component, and the final forecast is the sum of every block’s partial forecast. It’s like a team of experts, each one fixing the mistakes of the last.

import torch
import torch.nn as nn

class NBeatsBlock(nn.Module):
    """A single N-BEATS block. This is the fundamental building unit."""
    def __init__(self, input_size, theta_size, basis_function):
        super().__init__()
        self.input_size = input_size
        self.theta_size = theta_size
        # Callable returning (backcast_basis, forecast_basis), e.g. a trend or
        # seasonality basis, with shapes (theta_size, input_size) and
        # (theta_size, horizon). Both must live on the same device as the input.
        self.basis_function = basis_function

        # The "brain" of the block: a small stack of FC layers.
        # The last layer emits two coefficient vectors at once:
        # one for the backcast and one for the forecast.
        self.layers = nn.Sequential(
            nn.Linear(input_size, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 2 * theta_size),
        )

    def forward(self, x):
        # x shape: (batch_size, input_size)
        theta = self.layers(x)  # Learn the coefficients for the basis functions
        theta_b, theta_f = theta.split(self.theta_size, dim=1)
        backcast_basis, forecast_basis = self.basis_function()
        # Weighted sum of basis rows: (batch, p) x (p, time) -> (batch, time)
        backcast = torch.einsum('bp,pt->bt', theta_b, backcast_basis)
        forecast = torch.einsum('bp,pt->bt', theta_f, forecast_basis)
        return backcast, forecast

# Example usage (conceptual; my_trend_basis is a placeholder callable):
# stack = [NBeatsBlock(input_size=lookback, theta_size=4, basis_function=my_trend_basis)
#          for _ in range(3)]
# residual = input_data
# total_forecast = 0
# for block in stack:
#     backcast, forecast = block(residual)
#     residual = residual - backcast  # Pass the residual to the next block
#     total_forecast = total_forecast + forecast
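If you want to watch the residual hand-off do its job without training anything, here is a toy NumPy version. It is a sketch under loud assumptions: each “block” is a least-squares fit standing in for the trained FC layers, the backcast and forecast share one set of coefficients, the blocks alternate between a trend basis and a seasonality basis, and the series is synthetic. None of that is how N-BEATS is actually trained; only the stacking logic (subtract the backcast, add up the forecasts) carries over.

```python
import numpy as np

def least_squares_block(residual, basis_back, basis_fore):
    """Toy block: fit basis coefficients to the residual by least squares."""
    theta, *_ = np.linalg.lstsq(basis_back.T, residual, rcond=None)
    return theta @ basis_back, theta @ basis_fore  # backcast, forecast

lookback, horizon, period = 24, 12, 12
t = np.arange(lookback + horizon, dtype=float)
series = 0.5 * t + 3.0 * np.sin(2 * np.pi * t / period)  # synthetic trend + seasonality

# Each basis spans the full time axis; its columns are split into lookback and horizon.
trend = np.stack([np.ones_like(t), t])
season = np.stack([np.sin(2 * np.pi * t / period), np.cos(2 * np.pi * t / period)])

residual = series[:lookback].copy()
forecast = np.zeros(horizon)
norms = [np.linalg.norm(residual)]
for basis in [trend, season] * 4:  # eight toy blocks, alternating (an assumption)
    backcast, block_forecast = least_squares_block(
        residual, basis[:, :lookback], basis[:, lookback:]
    )
    residual = residual - backcast        # what is still unexplained
    forecast = forecast + block_forecast  # partial forecasts add up
    norms.append(np.linalg.norm(residual))

max_error = np.max(np.abs(forecast - series[lookback:]))
print([round(float(n), 3) for n in norms])
print("max forecast error:", round(float(max_error), 4))
```

Notice that the first trend block soaks up a bit of the sine wave, because the two components are not orthogonal on a finite window. The later blocks clean that up, which is exactly the “fixing the mistakes of the last” behaviour described above.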

Where N-BEATS Gets Annoying and N-HiTS Saves the Day

Now, the elephant in the room: N-BEATS is a computational hog. Every block runs its fully connected layers over the full-resolution lookback window, and all those matrix multiplications add up, especially for long lookback windows and long horizons. A follow-up paper tackled exactly this with N-HiTS (Neural Hierarchical Interpolation for Time Series Forecasting), which is essentially N-BEATS after a serious round of optimization.

N-HiTS introduces two brilliant ideas. Each block is still a small stack of fully connected layers; what changes is what goes in and what comes out. First, multi-rate sampling: every block runs a max-pooling layer over its input before the dense layers see it, so a block with a large pooling kernel looks at a smoothed, downsampled version of the signal and needs far fewer input weights. Second, hierarchical interpolation: instead of predicting every point of the horizon, a block predicts only a handful of forecast coefficients and interpolates them up to the full horizon length. In plain English: the model processes the input at different scales. Early blocks in the stack look at a heavily downsampled version of the signal to capture broad trends, while later blocks look at higher-resolution data to nail the fine details. This is not just a neat trick; it cuts the parameter count and training time substantially, particularly for long horizons, while often improving accuracy. It’s the rare win-win that actually deserves the name.
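Here is a rough NumPy sketch of those two N-HiTS ideas, multi-rate sampling and hierarchical interpolation, with the neural network left out. It assumes max pooling for the downsampling (as in the N-HiTS paper) and plain linear interpolation for the upsampling. The helper names and the kernel/coefficient settings are hypothetical, and the zero vector `theta` is only a placeholder for what a block’s dense layers would output.

```python
import numpy as np

def max_pool_1d(x, kernel):
    """Non-overlapping max pooling; trailing points that do not fill a window are dropped."""
    n = (len(x) // kernel) * kernel
    return x[:n].reshape(-1, kernel).max(axis=1)

def interpolate_forecast(theta, horizon):
    """Stretch a few forecast coefficients to the full horizon by linear interpolation."""
    knots = np.linspace(0, horizon - 1, num=len(theta))
    return np.interp(np.arange(horizon), knots, theta)

lookback, horizon = 96, 24
x = np.sin(2 * np.pi * np.arange(lookback) / 24)  # synthetic input window

# Coarse-to-fine settings per block: (pooling kernel, number of forecast coefficients).
for kernel, n_coeffs in [(8, 3), (4, 6), (1, 24)]:
    pooled = max_pool_1d(x, kernel)              # what the dense layers would see
    theta = np.zeros(n_coeffs)                   # placeholder for the block's output
    y_hat = interpolate_forecast(theta, horizon)
    print(f"kernel={kernel}: input {len(pooled)} points, "
          f"{n_coeffs} coefficients -> {len(y_hat)}-step forecast")
```

The savings come from both ends: the first dense layer of the coarsest block sees 12 inputs instead of 96, and its output layer has to produce 3 numbers instead of 24.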

The Gotchas: Where This Stuff Actually Breaks

Don’t for a second think these models are invincible. They have their kryptonite.

Small Data: These are deep learning models. They have appetites. If you throw a single 100-point time series at them, they’ll overfit spectacularly and you’ll deserve the nonsense forecast you get. Use them where you have long, rich series (thousands of points), or many related series that one shared model can be trained across.
The Lookback Hyperparameter: Choosing your lookback length is critical. Too short, and the model is myopic; too long, and it gets sluggish and confused by ancient, irrelevant history. A good rule of thumb is to set it to at least 2-3 times your forecast horizon. Better yet, make it a multiple of your strongest seasonality period.
Interpretability is a Feature, Not a Guarantee: The trend and seasonality decomposition is learned. It’s not a dedicated statistical decomposition like STL. On messy, real-world data, the “trend” a block learns might include some seasonality, and vice versa. Trust the overall forecast more than your over-literal interpretation of each component.
The Installation Nightmare: Let’s be honest. pip install pytorch-lightning is easy, but getting the right CUDA drivers for GPU support can feel like negotiating with a vengeful god. My advice? Use a pre-configured environment like Colab to prototype before you battle your local machine.
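The lookback rule of thumb from the list above (a few multiples of the horizon, rounded to whole seasonal periods) is easy to turn into a helper. This is only a starting point for a hyperparameter search, not a recommendation from either paper; the function name and its defaults are made up for illustration.

```python
import math

def suggest_lookback(horizon, season_period=None, min_multiple=2):
    """Smallest lookback >= min_multiple * horizon, rounded up to whole seasonal periods."""
    target = min_multiple * horizon
    if season_period is None:
        return target
    return math.ceil(target / season_period) * season_period

print(suggest_lookback(horizon=24))                                   # 48
print(suggest_lookback(horizon=24, season_period=168))                # 168
print(suggest_lookback(horizon=30, season_period=7, min_multiple=3))  # 91
```

Treat the result as the low end of the range to try, then check on a validation split whether a longer window actually helps.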

The bottom line? For univariate time series without a ton of external regressors, N-BEATS and especially N-HiTS are strong contenders and often among the best tools in the box. They respect the data, they can explain themselves when you pick the interpretable configuration, and they deliver. Just make sure you have enough of that data to feed them.