Bayesian-Optimization | mikePietsch.com

13.7 Early Stopping and Validation Curves

Right, let’s talk about one of the simplest yet most criminally underused tools in your kit: early stopping. You’re training a model, the training accuracy is climbing, and you’re feeling pretty good about yourself. But then you check the validation accuracy and… oh. It peaked twenty epochs ago and has been slowly but surely getting worse. You, my friend, have just watched your model become a champion memorizer, not a learner. It’s overfitting right before your eyes, and you paid for the electricity to make it happen.
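To make that "peaked twenty epochs ago" moment concrete, here is a minimal sketch of patience-based early stopping. The function name `early_stopping_epoch`, the `patience` and `min_delta` parameters, and the validation numbers are all made up for illustration; a real training loop would also save a checkpoint at the best epoch.

```python
def early_stopping_epoch(val_scores, patience=3, min_delta=0.0):
    """Return (best_epoch, stopped_at) for a sequence of validation scores.

    Hypothetical helper: higher scores are better, and training stops once
    `patience` epochs have passed without beating the best score so far.
    """
    best_score = float("-inf")
    best_epoch = -1
    for epoch, score in enumerate(val_scores):
        if score > best_score + min_delta:
            # New best: remember it (this is where you would checkpoint).
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop paying for electricity.
            return best_epoch, epoch
    # Ran out of epochs without triggering the stop condition.
    return best_epoch, len(val_scores) - 1


# Illustrative validation accuracies: they peak at epoch 3, then drift down.
val_acc = [0.60, 0.70, 0.78, 0.81, 0.80, 0.79, 0.79, 0.77, 0.76]
best_epoch, stopped_at = early_stopping_epoch(val_acc, patience=3)
print(f"best epoch: {best_epoch}, stopped at epoch: {stopped_at}")
```

The design choice worth noticing is that you keep the weights from `best_epoch`, not from the epoch where you finally gave up.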

13.6 Population-Based Training (PBT)

Right, so you’ve been training your model, babysitting it for days, tweaking learning rates and other knobs by hand. It feels alchemical, doesn’t it? You’re basically a medieval apothecary hoping this newt’s eyeball (a learning rate of 1e-4 instead of 3e-4) will somehow cure the plague. Population-Based Training, or PBT, is here to drag this process out of the dark ages and into the gloriously brutal arena of natural selection. It’s like The Hunger Games, but for your hyperparameters. May the odds be ever in your favor.

13.5 Neural Architecture Search (NAS)

Right, so you’ve built your model, you’ve got your data, and now you’re staring down the barrel of a thousand knobs to turn. Learning rate, batch size, number of layers… it’s enough to make you want to just pick numbers out of a hat. Hyperparameter optimization (HPO) is the formal process of not doing that. But what if the biggest, most architectural knob of them all, the very structure of the neural network itself, is also just a hyperparameter? Enter Neural Architecture Search, or NAS. It’s the meta-game of machine learning: using machine learning to design your machine learning. It’s as gloriously recursive as it is computationally expensive.

13.4 Hyperband and ASHA: Multi-Fidelity Optimization

Right, so you’ve been patiently training models one at a time, babysitting them like they’re toddlers learning to walk, only to watch most of them fall flat on their faces after hours of computation. It feels wasteful, doesn’t it? Like paying for a full five-course meal for every first date. Multi-fidelity optimization is the brilliant, slightly ruthless friend who says, “Let’s just get them a coffee first to see if they’re interesting.” Instead of committing full resources to every candidate, we get a cheap, early estimate of their potential and then double down on the winners. It’s the investing strategy of the ML world: a diversified portfolio with rapid, brutal cut-offs.
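The "coffee first" idea can be sketched with successive halving, the building block that Hyperband and ASHA are built on. This is not full Hyperband or ASHA, just one bracket of the cut-the-losers loop. The names `successive_halving` and `toy_evaluate`, the reduction factor `eta=3`, and the scoring function are assumptions for illustration only.

```python
import math


def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Keep the top 1/eta of configs each round, giving survivors eta times more budget."""
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Cheap estimate of every surviving candidate at the current budget.
        scored = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        # Brutal cut-off: only the best fraction moves on.
        keep = max(1, len(scored) // eta)
        survivors = scored[:keep]
        # Double down (well, eta-times down) on the winners.
        budget *= eta
    return survivors[0]


def toy_evaluate(learning_rate, budget):
    """Stand-in for 'train for `budget` epochs and return a validation score'.

    Hypothetical objective: best at a learning rate of 1e-3, and every
    candidate looks a bit better with more budget.
    """
    return -abs(math.log10(learning_rate) + 3.0) - 1.0 / budget


candidates = [1e-5, 3e-5, 1e-4, 3e-4, 1e-3, 3e-3, 1e-2, 3e-2, 1e-1]
best = successive_halving(candidates, toy_evaluate, min_budget=1, eta=3)
print("surviving learning rate:", best)
```

With nine candidates and `eta=3`, the field shrinks 9 → 3 → 1 while the per-candidate budget grows 1 → 3 → 9, so most of the compute goes to the configurations that earned it.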

13.3 Optuna: Define-by-Run API and Pruning

Right, so you’ve decided to stop just randomly poking at numbers like learning_rate=1e-4 and hoping for the best. Good. Welcome to the grown-up table. We’re going to talk about Optuna, which is, frankly, one of the best things to happen to hyperparameter optimization. Its “define-by-run” API is the key reason why. It feels like you’re just writing a script, not filling out a government form in triplicate. Other libraries (I’m looking at you, Hyperopt) make you define your search space statically before your trial logic. It’s clunky. Optuna’s define-by-run approach lets you dynamically define parameters right where you need them, inside your objective function. This is wildly more flexible. Need to conditionally suggest a parameter based on another? Go for it. Want to structure your code logically? Nothing’s stopping you.

13.2 Bayesian Optimization: Gaussian Processes and Acquisition Functions

Right, so you’ve been grid searching. Bless your heart. You’ve set up your parameter grids, fired it off, and gone to get a coffee. You came back a day later to find your model has barely budged, and you’ve burned enough compute cycles to power a small moon. There’s got to be a smarter way to find good hyperparameters than just brute force, right? There is. It’s called Bayesian Optimization, and it’s basically the opposite of blind guessing. It’s about being clever, learning from each experiment, and using a probabilistic model of the results so far to choose your next move.

13.1 Grid Search and Random Search: Baselines

Alright, let’s talk about the two most straightforward, no-nonsense ways to tune your model’s knobs: Grid Search and Random Search. Think of this as calibrating your high-tech espresso machine. You could methodically try every single combination of grind size, water temperature, and pressure (Grid Search), or you could just start spinning dials randomly and hope for the best (Random Search). Surprisingly, the latter is often the smarter move. Let’s break down why.
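One common explanation for why random search holds up so well can be sketched in a few lines: with the same trial budget, a grid only ever tries a handful of distinct values per knob, while random sampling tries a fresh value on every trial. The two knobs here (`learning_rate` and `dropout`), their ranges, and the budget of nine trials are assumptions picked for illustration.

```python
import itertools
import random

# Grid search: 3 values per knob -> 3 * 3 = 9 trials.
grid_learning_rates = [1e-4, 1e-3, 1e-2]
grid_dropouts = [0.1, 0.3, 0.5]
grid_trials = list(itertools.product(grid_learning_rates, grid_dropouts))

# Random search: same budget of 9 trials, sampled independently per knob.
rng = random.Random(0)  # seeded so the demo is repeatable
random_trials = [
    (10 ** rng.uniform(-4, -2), rng.uniform(0.1, 0.5))  # log-uniform learning rate
    for _ in range(len(grid_trials))
]

# How many different learning rates did each strategy actually try?
distinct_grid_lrs = len({lr for lr, _ in grid_trials})
distinct_random_lrs = len({lr for lr, _ in random_trials})

print("trials per strategy:", len(grid_trials))
print("distinct learning rates (grid):", distinct_grid_lrs)
print("distinct learning rates (random):", distinct_random_lrs)
```

If the learning rate turns out to be the knob that matters and dropout barely moves the needle, the grid has effectively spent nine trials testing three learning rates, while random search tested nine.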