13.1 Grid Search and Random Search: Baselines

Alright, let’s talk about the two most straightforward, no-nonsense ways to tune your model’s knobs: Grid Search and Random Search. Think of this as calibrating your high-tech espresso machine. You could methodically try every single combination of grind size, water temperature, and pressure (Grid Search), or you could set the dials to random positions within sensible ranges and keep the best shot (Random Search). Surprisingly, the latter is often the smarter move. Let’s break down why.

The Brute Force of Grid Search

Grid Search is the methodical, slightly obsessive-compulsive approach. You define a set of values you want to try for each hyperparameter, and the algorithm will train a model for every single combination in that grid.

Imagine you have two hyperparameters: learning_rate and max_depth. You define:

learning_rate: [0.001, 0.01, 0.1]
max_depth: [3, 5, 7]

Grid Search, with the unwavering dedication of a robot on a monotonous task, will train 3 * 3 = 9 models. It’s exhaustive, and it’s guaranteed to find the best-scoring point within your predefined grid. Note the qualifier: if the true optimum sits between your grid values, Grid Search can never find it. The code for this is dead simple, which is its main virtue.

from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

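# Generate a sample dataset. With 50 features, max_features='sqrt' (7 features
# per split) and 'log2' (5) differ; with only 4 features both would pick 2.
X, y = make_classification(n_samples=1000, n_features=50, random_state=42)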

# Define the model
model = RandomForestClassifier(random_state=42)

# Define the hyperparameter grid
param_grid = {
    'max_depth': [3, 5, 7, 10],       # 4 values
    'n_estimators': [50, 100, 200],   # 3 values
    'max_features': ['sqrt', 'log2']  # 2 values
}

# Instantiate the grid search
grid_search = GridSearchCV(
    estimator=model,
    param_grid=param_grid,
    cv=5,  # 5-fold cross-validation
    scoring='accuracy',
    n_jobs=-1  # use all available cores (because this will be slow)
)

# Fit the grid search to the data (this is where the magic, and the waiting, happens)
grid_search.fit(X, y)

# Print the best combination
print(f"Best parameters: {grid_search.best_params_}")
print(f"Best cross-validation score: {grid_search.best_score_:.4f}")

The problem here should be immediately obvious. That’s 4 * 3 * 2 = 24 combinations. But wait, we’re doing 5-fold cross-validation, so that’s 24 * 5 = 120 model fits. Now imagine you add a fourth parameter with 5 values: 4 * 3 * 2 * 5 = 120 combinations, or 120 * 5 = 600 fits. The count is the product of the number of values per hyperparameter, so it grows exponentially as you add hyperparameters. This combinatorial explosion is why Grid Search quickly becomes computationally intractable. And much of that compute goes to combinations that differ only in hyperparameters with little effect on the score, evaluated just to be “thorough.”
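To double-check that arithmetic, scikit-learn's ParameterGrid enumerates exactly the combinations GridSearchCV will try, and len() gives their count. The sketch below counts the 3 * 3 grid from the start of the section, the grid from the example above, and a version with a fourth hyperparameter; min_samples_leaf and its five values are a hypothetical choice for illustration only.

```python
from sklearn.model_selection import ParameterGrid

# The two-parameter example from the start of the section
small_grid = {
    'learning_rate': [0.001, 0.01, 0.1],
    'max_depth': [3, 5, 7],
}

# The grid used in the GridSearchCV example
param_grid = {
    'max_depth': [3, 5, 7, 10],
    'n_estimators': [50, 100, 200],
    'max_features': ['sqrt', 'log2'],
}

# Hypothetical fourth hyperparameter with 5 values, added to show the blow-up
bigger_grid = dict(param_grid, min_samples_leaf=[1, 2, 4, 8, 16])

n_folds = 5
for name, grid in [('small', small_grid), ('original', param_grid), ('bigger', bigger_grid)]:
    n_combos = len(ParameterGrid(grid))
    print(f"{name}: {n_combos} combinations -> {n_combos * n_folds} CV fits")
```

With the default refit=True, GridSearchCV also fits the winning combination once more on the full dataset, so the true total is one fit higher than the CV count.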

The Shocking Wisdom of Random Search

Enter Random Search. Instead of evaluating every single point on a rigid grid, you define a statistical distribution for each hyperparameter (e.g., uniform, log-uniform) and sample a fixed number of random combinations from that space.
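As a minimal sketch of that sampling step, scikit-learn's ParameterSampler (the sampler RandomizedSearchCV uses under the hood) draws a fixed number of configurations from a dictionary that mixes scipy.stats distributions with plain lists. The specific ranges below are illustrative assumptions.

```python
from scipy.stats import randint, loguniform
from sklearn.model_selection import ParameterSampler

# Each entry is either a scipy.stats distribution (sampled via .rvs)
# or a list (one element picked uniformly at random)
param_distributions = {
    'max_depth': randint(3, 11),              # integers 3..10 (upper bound exclusive)
    'max_features': ['sqrt', 'log2'],         # categorical
    'learning_rate': loguniform(1e-4, 1e-1),  # continuous, log-scaled
}

# Draw a fixed budget of 5 random configurations, reproducibly
samples = list(ParameterSampler(param_distributions, n_iter=5, random_state=0))
for s in samples:
    print(s)
```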

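The reason this works so well, a result formalized in a classic paper by Bergstra and Bengio (2012), is that for most real-world problems only a few hyperparameters actually matter, and performance is fairly insensitive to a wide range of values for the others. Grid Search wastes immense effort on the unimportant ones: a 3 * 3 grid spends 9 trials but tests only 3 distinct values of each hyperparameter, so if only one of them matters, you’ve effectively run 3 experiments. Nine random trials test 9 distinct values of every hyperparameter, including the one that matters.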
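A toy simulation makes the argument concrete. It assumes a hypothetical objective over two hyperparameters on [0, 1] where only x affects the score (its optimum at x = 0.73 is arbitrary) and y is a dud, then gives both methods the same budget of 9 trials. This illustrates the "only a few hyperparameters matter" idea; it does not reproduce Bergstra and Bengio's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical objective: only x matters, y has no effect at all
def score(x, y):
    return -(x - 0.73) ** 2  # best possible score is 0 at x = 0.73

budget = 9

# Grid search: a 3 x 3 grid spends 9 trials on only 3 distinct x values
grid_axis = np.linspace(0.0, 1.0, 3)
grid_x, grid_y = np.meshgrid(grid_axis, grid_axis)
grid_x, grid_y = grid_x.ravel(), grid_y.ravel()

# Random search: 9 independent draws give 9 distinct x values
rand_x = rng.uniform(0.0, 1.0, budget)
rand_y = rng.uniform(0.0, 1.0, budget)

print("distinct x values tried, grid:  ", len(np.unique(grid_x)))
print("distinct x values tried, random:", len(np.unique(rand_x)))
print("best score, grid:  ", score(grid_x, grid_y).max())
print("best score, random:", score(rand_x, rand_y).max())
```

The grid can get no closer than x = 0.5, because its only x values are 0, 0.5, and 1. The nine random draws spread across nine distinct x values, so with high probability at least one lands closer to 0.73.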

Random Search doesn’t care about your neatly defined grid. It will happily try a learning_rate of 0.00734 and a max_depth of 4. This stochastic nature gives it a much better chance of stumbling into a good region of the hyperparameter space with far fewer iterations. It’s not just cheaper: for the same number of trials, it often finds a better configuration.

from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import GradientBoostingClassifier
from scipy.stats import randint, loguniform

# RandomForestClassifier has no learning_rate parameter, so this example uses
# gradient boosting, which accepts all four hyperparameters below
gb_model = GradientBoostingClassifier(random_state=42)

# Define the parameter distributions instead of a grid
param_distributions = {
    'max_depth': randint(3, 11),              # Integers between 3 and 10
    'n_estimators': randint(50, 301),         # Integers between 50 and 300
    'max_features': ['sqrt', 'log2'],         # Categorical choices, sampled uniformly
    'learning_rate': loguniform(1e-4, 1e-1)   # Log-uniform between 0.0001 and 0.1
}

# Instantiate the random search. Let's try 20 random combinations.
random_search = RandomizedSearchCV(
    estimator=gb_model,
    param_distributions=param_distributions,
    n_iter=20,  # The key setting: how many random combinations to try
    cv=5,
    scoring='accuracy',
    random_state=42,
    n_jobs=-1
)

# Fit the random search
random_search.fit(X, y)

print(f"Best parameters: {random_search.best_params_}")
print(f"Best cross-validation score: {random_search.best_score_:.4f}")
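Why a log-uniform distribution for learning_rate? Its useful values span several orders of magnitude, and a plain uniform distribution spends almost all of its samples on the largest decade. The sketch below compares, over the same range [1e-4, 1e-1], the probability that a single draw falls below 0.01 under each distribution.

```python
from scipy.stats import loguniform, uniform

low, high = 1e-4, 1e-1

# scipy's uniform is parameterized by loc and scale, covering [loc, loc + scale]
lin_dist = uniform(loc=low, scale=high - low)
log_dist = loguniform(low, high)

# Probability that one draw lands in the two lower decades, [1e-4, 1e-2)
p_lin = lin_dist.cdf(1e-2)
p_log = log_dist.cdf(1e-2)
print(f"uniform:     P(lr < 0.01) = {p_lin:.3f}")
print(f"log-uniform: P(lr < 0.01) = {p_log:.3f}")
```

Under the uniform distribution only about 10% of draws land below 0.01, so values like 0.001 are barely explored; the log-uniform distribution gives each of the three decades an equal third of the probability.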

When to Use Which (Because It’s Not Dogmatic)

So, is Grid Search dead? Not quite. Use Grid Search when:

Your dataset is small and model training is cheap.
You already have a very good idea of the narrow range where the optimal values lie for 2-3 critical parameters. You can use Random Search first to narrow it down, then do a fine-grained Grid Search on that small area.
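You need every combination on the list evaluated, for example to report a complete table of results. (Reproducibility alone isn’t a reason to pick Grid Search: Random Search is fully reproducible once you fix its random_state and the model’s.)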
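The coarse-to-fine workflow from the second point can be sketched as two chained searches: a Random Search over wide ranges, then a small Grid Search around its winner. The dataset, the choice of max_depth and min_samples_leaf, and the neighborhood step sizes are all assumptions made for this example.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0)

# Stage 1 (coarse): random search over wide ranges
coarse = RandomizedSearchCV(
    model,
    param_distributions={
        'max_depth': randint(2, 21),
        'min_samples_leaf': randint(1, 21),
    },
    n_iter=10, cv=3, random_state=0,
)
coarse.fit(X, y)
best = coarse.best_params_

# Stage 2 (fine): exhaustive grid in a small neighborhood of the coarse winner.
# The +/-1 and +/-2 steps are arbitrary; set() drops duplicates after clipping at 1.
fine_grid = {
    'max_depth': sorted({max(1, best['max_depth'] + d) for d in (-1, 0, 1)}),
    'min_samples_leaf': sorted({max(1, best['min_samples_leaf'] + d) for d in (-2, 0, 2)}),
}
fine = GridSearchCV(model, fine_grid, cv=3)
fine.fit(X, y)

print("coarse best:", best, round(coarse.best_score_, 4))
print("fine best:  ", fine.best_params_, round(fine.best_score_, 4))
```

Both stages use the same deterministic stratified folds and the fine grid contains the coarse winner, so on these folds the fine stage can only match or beat the coarse score.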

Use Random Search as your default starting point for almost everything else. It’s your hyperparameter workhorse. The number of iterations (n_iter) is your lever between budget and thoroughness. Start with 20 or 50, track how the best score found so far improves as iterations accumulate, and increase n_iter if it’s still climbing.
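One way to read the score curve is as the best cross-validation score found after each iteration. RandomizedSearchCV stores candidates in cv_results_ in the order they were sampled, so a running maximum over mean_test_score gives that curve. The model, data, and ranges here are illustrative assumptions.

```python
import numpy as np
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

search = RandomizedSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_distributions={'max_depth': randint(2, 21), 'min_samples_leaf': randint(1, 21)},
    n_iter=20, cv=3, random_state=0,
)
search.fit(X, y)

# Running maximum = "best mean CV score found after k iterations"
scores = search.cv_results_['mean_test_score']
best_so_far = np.maximum.accumulate(scores)
for k, s in enumerate(best_so_far, start=1):
    print(f"after {k:2d} iterations: best mean CV accuracy = {s:.4f}")
```

If the last several entries are flat, more iterations are unlikely to pay off with these ranges; if the curve is still stepping up near the end, raise n_iter.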

The biggest pitfall with both methods? Data leakage during cross-validation. You must ensure your preprocessing steps (like scaling or imputation) are fit only on the training folds of each CV split. If you fit a scaler on the full dataset before the search, the mean and variance it learned already include every validation fold, and you’ll get an overly optimistic score that doesn’t hold up in production. That’s cheating, even if it’s accidental. Don’t be that person. Scikit-Learn’s Pipeline is your best friend here: wrap your model and its preprocessors in a pipeline before you toss it into GridSearchCV or RandomizedSearchCV, and the search will refit the preprocessors inside every split. Trust me on this. I’ve been burned so you don’t have to be.
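A minimal sketch of a leak-free search follows. It swaps in StandardScaler plus LogisticRegression (scaling matters for a linear model but not for a random forest) and tunes the regularization strength C on a log scale; the data and ranges are illustrative. Inside a Pipeline, a step's hyperparameters are addressed as <step name>__<parameter>, so C on the step named 'clf' becomes 'clf__C'.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# The scaler lives inside the pipeline, so in each CV split it is fit on
# the training folds only and merely applied to the validation fold
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression(max_iter=1000)),
])

# Step hyperparameters use the <step name>__<parameter> naming convention
search = RandomizedSearchCV(
    pipe,
    param_distributions={'clf__C': loguniform(1e-3, 1e2)},
    n_iter=10, cv=5, random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 4))
```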