12.3 Polynomial and Interaction Features

Right, let’s talk about making your data more… interesting. You’ve got your nice, neat, linear features. They’re fine. They’re polite. But the real world isn’t polite; it’s messy, curved, and full of relationships where two things together create a third, unexpected thing. That’s where polynomial and interaction features come in. They’re how we take our vanilla dataset and give it a shot of espresso, teaching our linear models to see the world in more than just straight lines.

Think of it this way: trying to fit a straight line to a curved pattern is like trying to push a rope. It’s futile and you look a bit silly. By adding polynomial features (like x², x³), we’re bending the feature space itself so that what was a curve in the original space becomes a straight line in the new, warped space. Our simple linear model can then draw a straight line in that warped space, which corresponds to a complex curve back in the real world. It’s a brilliant hack.
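To make the “warped space” idea concrete, here is a minimal sketch using plain NumPy least squares rather than any particular scikit-learn estimator. The toy data (y is exactly x squared) and the helper name fit_rmse are made up for illustration; the point is only that the same linear fit goes from hopeless to essentially perfect once x² is available as a column.

```python
import numpy as np

# Toy data (an assumption for illustration): y is exactly x squared.
x = np.linspace(-3, 3, 25)
y = x ** 2

def fit_rmse(features, target):
    """Fit ordinary least squares with an intercept; return the training RMSE."""
    design = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residuals = target - design @ coef
    return float(np.sqrt(np.mean(residuals ** 2)))

# A straight line in the original space: the only column is x.
rmse_linear = fit_rmse(x.reshape(-1, 1), y)

# Still a linear fit, but in the expanded space with columns x and x^2.
rmse_poly = fit_rmse(np.column_stack([x, x ** 2]), y)

print(f"RMSE with x only:    {rmse_linear:.3f}")
print(f"RMSE with x and x^2: {rmse_poly:.3f}")
```

The model never stopped being linear in its coefficients; only the columns it was handed changed.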

Interaction features are a different beast. They’re the “well, actually…” of machine learning. A plain linear model treats the effects of feature_a and feature_b as independent and additive. But what if the real impact is only felt when both are present? The classic (if overused) example: combining coffee (feature_a) and milk (feature_b) creates a delicious latte (interaction_feature), which is a much better experience than either alone. An interaction feature, usually created by multiplying two features (a * b), gives the model an explicit column that captures this synergy (or antagonism).
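A rough sketch of what that looks like in numbers, again with plain NumPy least squares. Everything here is an assumption for illustration: the synthetic target depends only on the product a * b, both features are centered around zero (which makes the effect especially stark), and r_squared is a made-up helper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): the target is driven
# purely by the product of the two features, plus a little noise.
a = rng.uniform(-1, 1, size=200)
b = rng.uniform(-1, 1, size=200)
y = 3.0 * a * b + rng.normal(0, 0.01, size=200)

def r_squared(features, target):
    """Fit least squares with an intercept; return the training R^2."""
    design = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residuals = target - design @ coef
    return float(1.0 - residuals.var() / target.var())

# Additive model: a and b each get a coefficient, nothing else.
r2_additive = r_squared(np.column_stack([a, b]), y)

# Same model plus one hand-made interaction column.
r2_interaction = r_squared(np.column_stack([a, b, a * b]), y)

print(f"R^2 with a and b only:    {r2_additive:.3f}")
print(f"R^2 with a, b, and a * b: {r2_interaction:.3f}")
```

With features that are not centered, the additive model would pick up part of the signal on its own, so the gap is usually less dramatic than in this toy setup.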

The Brutal Reality of the Combinatorial Explosion

Here’s where the designers of scikit-learn both solved a problem and created a monster. The PolynomialFeatures transformer is incredibly easy to use. Too easy. Let me show you the good part first.

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Some boring, one-dimensional data that clearly has a curve
X = np.array([[1], [2], [3], [4]])
print("Original Data:\n", X)

# Let's add polynomial features up to degree 2
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)
print("\nData with Polynomial Features (degree=2):\n", X_poly)
print("\nFeature names:", poly.get_feature_names_out())

This will output:

Original Data:
 [[1]
 [2]
 [3]
 [4]]

Data with Polynomial Features (degree=2):
 [[ 1.  1.]
 [ 2.  4.]
 [ 3.  9.]
 [ 4. 16.]]

Feature names: ['x0' 'x0^2']

Simple, right? We went from one feature (x0) to two: the original and its square (x0^2). Now, hold my coffee and watch this. Let’s use a slightly more realistic 2D dataset.

# Now with two features
X_2d = np.array([[1, 2], [3, 4], [5, 6]])
print("Original 2D Data:\n", X_2d)

# Now the same polynomial transformation
poly = PolynomialFeatures(degree=2, include_bias=False)
X_2d_poly = poly.fit_transform(X_2d)
print("\nFeature names for 2D data:", poly.get_feature_names_out())

Output (showing only the feature names line):

Feature names for 2D data: ['x0' 'x1' 'x0^2' 'x0 x1' 'x1^2']

See what happened? We started with 2 features (x0, x1). With degree=2, we get:

The original features: x0, x1
The pure polynomial terms: x0^2, x1^2
The interaction term: x0 x1 (which is x0 * x1)

This is the default interaction_only=False setting, which gives you everything. The number of features doesn’t just grow linearly; it explodes. The count follows n_features_output = (n_features_input + degree)! / (n_features_input! * degree!), which includes the constant bias column, so subtract 1 when include_bias=False. Check it against the 2D example: 4! / (2! * 2!) = 6, and dropping the bias leaves the 5 features we just saw. It’s a mouthful and a memory hog. If you have 100 features and ask for degree=2, you’ll get over 5,000 features. For degree=3, you’re looking at over 170,000. Good luck with that. Your model will train for a week and overfit so spectacularly it will think a random pixel in a cat photo is a key predictor of stock prices.
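If you would rather see the damage before you commit to it, the count is easy to compute with the standard library. The function n_output_features below is a hypothetical helper written for this section, not part of scikit-learn; it just evaluates the binomial coefficient behind the formula and optionally drops the bias column.

```python
from math import comb

def n_output_features(n_features, degree, include_bias=True):
    """Count the polynomial terms of total degree <= `degree` in `n_features` variables."""
    total = comb(n_features + degree, degree)  # this count includes the constant term
    return total if include_bias else total - 1

for n in (2, 10, 100):
    for d in (2, 3):
        count = n_output_features(n, d, include_bias=False)
        print(f"n_features={n:>3}, degree={d}: {count:>7} output features")
```

For 100 input features this gives 5,150 columns at degree 2 and 176,850 at degree 3, which is worth knowing before you call fit_transform on a wide dataset.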

How to Use This Power Without Blowing Your Foot Off

So, do we abandon the concept? Absolutely not. We just have to be smarter than the default settings.

Start Small: Never just blindly set degree=4. Start with degree=2 or degree=3 and see if it helps. Use cross-validation to check if the increased complexity is actually improving your model’s performance on unseen data, not just its ability to memorize the training set.
The Magic of interaction_only: This is your best friend. Often, the pure polynomial terms (x²) are less informative than the interactions between different features. You can skip the pure powers by setting interaction_only=True. This keeps the original features and adds only products of distinct features like x0 * x1; it won’t create x0^2 or x1^2.

# Just the interactions, please.
poly_interaction = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_interaction = poly_interaction.fit_transform(X_2d)
print("Feature names (interaction_only):", poly_interaction.get_feature_names_out())

Output:

Feature names (interaction_only): ['x0' 'x1' 'x0 x1']

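This is more manageable. We kept our original features and only added the one interaction term. Don’t expect miracles at scale, though: interaction_only drops only the pure powers, so with 100 features and degree=2 you still end up with the 100 originals plus 4,950 pairwise products.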

Domain Knowledge is Your Filter: The best approach isn’t algorithmic; it’s intellectual. Don’t create all possible interactions. Use your brain. Look at your features and think, “Which of these might actually have a meaningful interaction?” Then create those specific interaction terms manually using df['new_feature'] = df['feature_a'] * df['feature_b']. This gives you maximum insight and minimum garbage.
Scale Your Features! This is non-negotiable for any model that cares about feature scale, which includes regularized linear models and anything trained by gradient descent. When you multiply features together, especially ones on different scales, you create a new feature on a bonkers scale. If feature_a ranges from 0 to 1 and feature_b ranges from 100 to 1000, your interaction term will range from 0 to 1000. If you feed this into a model without scaling, you’re giving that one term an absurdly loud voice. Always, always use a StandardScaler or MinMaxScaler after you’ve created your polynomial and interaction features, and fit it on the training data only.
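Two of the tips above, “start small and cross-validate” and “scale after expanding”, fit naturally into one scikit-learn pipeline. The sketch below is one possible arrangement, not the only one: the synthetic data, the choice of Ridge as the model, and alpha=1.0 are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): two features on very
# different scales, and a target driven by their product.
feature_a = rng.uniform(0, 1, size=300)
feature_b = rng.uniform(100, 1000, size=300)
X = np.column_stack([feature_a, feature_b])
y = 0.01 * feature_a * feature_b + rng.normal(0, 0.1, size=300)

mean_scores = {}
for degree in (1, 2, 3):
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        StandardScaler(),  # scaling happens after the expansion
        Ridge(alpha=1.0),  # assumed model; any scale-sensitive estimator would do
    )
    # 5-fold cross-validated R^2; the whole pipeline is refit inside each
    # fold, so the scaler never sees the held-out rows.
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    mean_scores[degree] = scores.mean()
    print(f"degree={degree}: mean CV R^2 = {scores.mean():.3f}")
```

Because the expansion and the scaler live inside the pipeline, comparing degrees is a one-line change, and a higher degree only earns its keep if the cross-validated score actually improves.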

The key takeaway? Polynomial and interaction features are one of the most powerful tools in your feature engineering arsenal for boosting simple models. They are also the quickest way to create an overfit, unusable mess. Respect the power, understand the math behind the explosion, and use your own judgment to apply them surgically, not like a bomb.