11.5 Regression Metrics: MAE, MSE, RMSE, R², MAPE

Right, so you’ve built your model. It’s a thing of beauty. You’ve wrangled the data, you’ve tuned the hyperparameters, you’ve trained it on a respectable chunk of your dataset. Now comes the moment of truth: how good is it, actually? For regression problems—where you’re predicting a continuous number, like a house price or a quantity of widgets—you need a way to measure the distance between your model’s fancy predictions and the cold, hard reality of the actual values. That’s where these metrics come in. They’re your measuring tape, and like any good craftsman, you need to know which one to pull out of the toolbox and when.

Mean Absolute Error (MAE): The Honest Workhorse

Let’s start with the one that’s easiest to explain to your manager. The Mean Absolute Error (MAE) is exactly what it sounds like: you take all your prediction errors (actual value - predicted value), convert them to absolute values (so negative errors don’t cancel out positive ones), and then average them.

$$MAE = \frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|$$

Why do we love it? It’s interpretable and relatively robust to outliers. An MAE of 5 means your predictions are off by 5 units on average. That’s it. No squaring, no square roots, no funny business. The downside? Every unit of error counts the same, so large errors get no extra penalty. Being off by 10 is only twice as bad as being off by 5 in MAE’s eyes. Sometimes that’s fine; sometimes you really care about those big, catastrophic misses.

from sklearn.metrics import mean_absolute_error

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]

mae = mean_absolute_error(y_true, y_pred)
print(f"MAE: {mae:.2f}")  # Output: MAE: 0.50

Mean Squared Error (MSE) and Root Mean Squared Error (RMSE): The Punishers

If MAE is the easy-going friend, MSE is the drill sergeant. Mean Squared Error (MSE) takes those errors, squares them, and then averages them.

$$MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$

Squaring the errors does two things: 1) It makes all values positive (good), and 2) It heavily penalizes larger errors. An error of 10 is one hundred times worse than an error of 1 in the world of MSE. This is great when large errors are completely unacceptable, but it also makes MSE very sensitive to outliers. A few wacky data points can send your MSE through the roof.

The problem? The units are now “squared units.” If you’re predicting house prices in dollars, your MSE is in dollars², which is… unhelpful. Enter Root Mean Squared Error (RMSE). It’s just the square root of MSE.

$$RMSE = \sqrt{MSE}$$

This brings the units back to the original scale, so you can interpret it roughly like MAE, but it still carries that amplified punishment for larger errors (which is why RMSE is never smaller than MAE on the same predictions). RMSE is probably the most common metric you’ll see, but always remember its sensitivity to outliers.

from sklearn.metrics import mean_squared_error

mse = mean_squared_error(y_true, y_pred)
rmse = mse ** 0.5  # Or use np.sqrt(mse)
print(f"MSE: {mse:.2f}")   # Output: MSE: 0.38
print(f"RMSE: {rmse:.2f}") # Output: RMSE: 0.61
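To put a number on that outlier sensitivity, here is a minimal sketch using made-up data: ten predictions that are each off by exactly 1 unit, then the same predictions with a single 50-unit miss. The arrays and the `mae_and_rmse` helper are hypothetical names for illustration only.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical data: ten predictions, each off by exactly 1 unit.
y_true_demo = np.arange(10, dtype=float)
y_pred_clean = y_true_demo + 1.0

# Same predictions, except the last one is now off by 50 units.
y_pred_outlier = y_pred_clean.copy()
y_pred_outlier[-1] = y_true_demo[-1] + 50.0


def mae_and_rmse(y_true, y_pred):
    """Return (MAE, RMSE) for one set of predictions."""
    mae = mean_absolute_error(y_true, y_pred)
    rmse = mean_squared_error(y_true, y_pred) ** 0.5
    return mae, rmse


mae_clean, rmse_clean = mae_and_rmse(y_true_demo, y_pred_clean)
mae_out, rmse_out = mae_and_rmse(y_true_demo, y_pred_outlier)

print(f"Clean:        MAE={mae_clean:.2f}  RMSE={rmse_clean:.2f}")  # MAE=1.00  RMSE=1.00
print(f"With outlier: MAE={mae_out:.2f}  RMSE={rmse_out:.2f}")      # MAE=5.90  RMSE=15.84
```

One bad point multiplies MAE by about 6 and RMSE by about 16. Whether that is a bug or a feature depends on how much those big misses cost you.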

R-squared (R²): The “How Much Better Are You Than a Dumb Model?” Metric

This one is a bit more conceptual. R-squared doesn’t measure the absolute size of your errors; it measures the proportion of variance in your dependent variable that’s explained by your model. It answers the question: “How much better is my complex model than just predicting the mean of the target variable every single time?”

An R² of 0 means your model is no better than that simple mean prediction. An R² of 1 is a perfect fit. It can also be negative, most often on held-out data, which is the universe’s way of telling you your model is so spectacularly bad that you’d be better off just using the mean. It happens, don’t panic. Just go back to feature engineering.

The catch? You can get a deceptively “good” R² value on data with a huge variance, even if your absolute errors (MAE, RMSE) are still massive. Always look at it alongside your error metrics.

from sklearn.metrics import r2_score

r2 = r2_score(y_true, y_pred)
print(f"R²: {r2:.2f}")  # Output: R²: 0.95
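If the “better than the mean” framing feels abstract, it may help to compute R² by hand as one minus the ratio of the model’s squared error to the mean baseline’s squared error. The sketch below assumes that standard definition; `r2_by_hand`, `baseline_pred`, and `bad_pred` are illustrative names, and the constant prediction of 10 is just an arbitrary bad model.

```python
import numpy as np
from sklearn.metrics import r2_score

y_true_demo = np.array([3.0, -0.5, 2.0, 7.0])
y_pred_demo = np.array([2.5, 0.0, 2.0, 8.0])


def r2_by_hand(y_true, y_pred):
    """R² as 1 - (model squared error) / (mean-baseline squared error)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# The "dumb model": predict the mean of the targets every time.
baseline_pred = np.full_like(y_true_demo, y_true_demo.mean())
# An arbitrary bad model: always predict 10, far from most of the data.
bad_pred = np.full_like(y_true_demo, 10.0)

print(f"Model:    {r2_by_hand(y_true_demo, y_pred_demo):.2f}")    # 0.95
print(f"Baseline: {r2_by_hand(y_true_demo, baseline_pred):.2f}")  # 0.00
print(f"Bad:      {r2_by_hand(y_true_demo, bad_pred):.2f}")       # -6.96
print(f"sklearn:  {r2_score(y_true_demo, y_pred_demo):.2f}")      # 0.95
```

The hand-rolled value matches `r2_score`, the mean baseline lands at 0, and the constant-10 model goes negative because its squared error is larger than the baseline’s.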

Mean Absolute Percentage Error (MAPE): The One with an Ego Problem

MAPE seems like a genius idea on paper. Express the error as a percentage! Everyone understands percentages!

$$MAPE = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$

And it is useful in business contexts where a “10% error” is a more intuitive concept than “an error of 50 widgets.” But MAPE has a massive, glaring flaw that makes me use it only with extreme caution: every error is divided by the actual value, so the metric is undefined when an actual value is zero and blows up when actual values are close to zero. Dividing by zero? Not great. It is also blind to scale. If your actual value is 2 and you predict 1, that’s a 50% error. If your actual value is 1,000,000 and you’re off by 50,000, that’s only a 5% error, even though the second miss may matter far more. Percentages alone can be misleading. Use MAPE only if your data is strictly positive and well away from zero.

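# Let's see MAPE's dark side
from sklearn.metrics import mean_absolute_percentage_error

y_true_problematic = [3, 0.001, 2, 7]  # Look at that second value
y_pred_problematic = [2.5, 0.5, 2, 8]  # Off by about 0.5 there: tiny in absolute terms

# scikit-learn does not raise here; it quietly returns a huge number.
# It also returns a fraction rather than a percentage (1.0 means 100%).
mape = mean_absolute_percentage_error(y_true_problematic, y_pred_problematic)
print(f"MAPE: {mape:.2f}")  # Output: MAPE: 124.83, i.e. roughly 12,483%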
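To see where a runaway MAPE comes from, it can help to break it down per sample. The following is a small NumPy sketch with hypothetical values (one actual value near zero, predicted with a modest absolute error), computing each sample’s absolute percentage error as a fraction.

```python
import numpy as np

# Hypothetical data: the second actual value sits very close to zero.
y_true_demo = np.array([3.0, 0.001, 2.0, 7.0])
y_pred_demo = np.array([2.5, 0.5, 2.0, 8.0])

# Absolute percentage error per sample, as a fraction (1.0 == 100%).
ape = np.abs((y_true_demo - y_pred_demo) / y_true_demo)

for actual, pct in zip(y_true_demo, ape):
    print(f"actual={actual:>6}: {pct:.1%}")

# How much of the total percentage error comes from the single worst sample.
share = ape.max() / ape.sum()
print(f"MAPE: {ape.mean():.1%}")
print(f"Share of total from worst sample: {share:.1%}")
```

In this toy case nearly all of the metric comes from one near-zero target, so the headline number says almost nothing about the other three predictions.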

So, which one should you use? The boring-but-true answer is all of them. Each tells you a different part of the story. Report RMSE and MAE to understand the magnitude of your errors. Check R² to see the proportion of variance explained. And if you must use MAPE, first check a histogram of your target variable to make sure you won’t be dividing by zero or by values close to it. Your model’s resume deserves more than one reference.
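One way to make “all of them” a habit is a small helper that returns every metric at once and refuses to report MAPE when the targets get too close to zero. This is only a sketch: `regression_report` and `mape_floor` are hypothetical names, and the default threshold of 0.01 is an arbitrary assumption you would tune to your own target’s scale.

```python
import numpy as np
from sklearn.metrics import (
    mean_absolute_error,
    mean_absolute_percentage_error,
    mean_squared_error,
    r2_score,
)


def regression_report(y_true, y_pred, mape_floor=1e-2):
    """Return MAE, MSE, RMSE, R² and (when safe) MAPE in one dict.

    MAPE is left as None if any |actual value| is below `mape_floor`,
    an arbitrary threshold chosen for this sketch.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = mean_squared_error(y_true, y_pred)
    report = {
        "mae": mean_absolute_error(y_true, y_pred),
        "mse": mse,
        "rmse": mse ** 0.5,
        "r2": r2_score(y_true, y_pred),
        "mape": None,
    }
    # Only compute MAPE when every actual value is safely away from zero.
    if np.all(np.abs(y_true) >= mape_floor):
        # scikit-learn returns a fraction here, not a percentage.
        report["mape"] = mean_absolute_percentage_error(y_true, y_pred)
    return report


report = regression_report([3, -0.5, 2, 7], [2.5, 0.0, 2, 8])
for name, value in report.items():
    print(f"{name}: {value if value is None else round(float(value), 3)}")
```

Returning `None` instead of a giant number is a design choice: it forces whoever reads the report to notice that MAPE was skipped rather than silently trusting a meaningless percentage.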