Autoscaling | mikePietsch.com

11.8 Cluster Autoscaler: Adding and Removing Nodes

Right, so you’ve got your pods scaling horizontally like a well-rehearsed flash mob. But what happens when the entire party runs out of room? That’s where the Cluster Autoscaler (CA) comes in. Think of it as the pragmatic bouncer for your Kubernetes nightclub. HPA and VPA handle the guest list (pods), but when the club is at capacity, the CA is the one who calls the building manager to add a new floor or, when things quiet down, sends the unused floors home. It doesn’t care how much CPU or memory your pods are actually using; it cares whether there’s room on the nodes, judged by resource requests, for pods to be scheduled at all.
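To make the bouncer’s logic concrete, here is a minimal Python sketch of the two questions the CA keeps asking: is there a pending pod that fits nowhere, and is any node mostly empty? Everything in it (the `Node` class, the function names, the 0.5 threshold) is a hypothetical simplification for illustration, not the real Cluster Autoscaler code, which also checks things like whether a node’s pods can be moved elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_m: int           # allocatable CPU, in millicores
    mem_mi: int          # allocatable memory, in MiB
    req_cpu_m: int = 0   # sum of CPU requests already scheduled here
    req_mem_mi: int = 0  # sum of memory requests already scheduled here


def fits(node, cpu_m, mem_mi):
    """True if a pod with these requests still fits on the node."""
    return (node.req_cpu_m + cpu_m <= node.cpu_m
            and node.req_mem_mi + mem_mi <= node.mem_mi)


def needs_scale_up(nodes, pending):
    """pending: list of (cpu_m, mem_mi) requests of unschedulable pods.

    Scale up if at least one pending pod fits on no existing node.
    """
    return any(not any(fits(n, cpu, mem) for n in nodes)
               for cpu, mem in pending)


def scale_down_candidates(nodes, threshold=0.5):
    """Names of nodes whose requested share of CPU and memory is below threshold."""
    return [n.name for n in nodes
            if n.req_cpu_m / n.cpu_m < threshold
            and n.req_mem_mi / n.mem_mi < threshold]


nodes = [
    Node("node-a", cpu_m=2000, mem_mi=4096, req_cpu_m=1800, req_mem_mi=1024),
    Node("node-b", cpu_m=2000, mem_mi=4096, req_cpu_m=200, req_mem_mi=512),
]
print(needs_scale_up(nodes, [(500, 256)]))   # fits on node-b
print(needs_scale_up(nodes, [(1900, 256)]))  # fits nowhere
print(scale_down_candidates(nodes))
```

Note that the sketch only ever looks at requests, never at live usage, which is the point of the paragraph above.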

11.7 Combining HPA and VPA: Caveats and Best Practices

Right, so you’ve decided you want both horizontal and vertical autoscaling. Ambitious. A little greedy, even. I like it. It’s the “have your cake and eat it too” of Kubernetes resource management. But let’s be absolutely clear: combining HPA and VPA is like putting two brilliant, highly opinionated chefs in the same kitchen. If you don’t set very strict rules, they will absolutely fight over the stove, and you’ll end up with a culinary disaster (read: a cascading pod eviction nightmare).

11.6 VPA Modes: Off, Initial, Auto

Alright, let’s talk about VPA modes. This is where you decide just how much authority you’re willing to hand over to this particular robot butler. You’ve installed VPA, you’ve defined a VerticalPodAutoscaler resource, and now you have to choose its updateMode. You’ve got three main options: Off, Initial, and Auto. Picking the right one is the difference between getting helpful advice and handing your cluster the keys to the kingdom with a blindfold on.
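As a rough mental model, the three modes differ only in what the VPA is allowed to do with its recommendation. The Python sketch below encodes that as a lookup table; the function name `vpa_actions` and the dictionary keys are made up for illustration and are not part of any VPA API.

```python
def vpa_actions(update_mode):
    """Return what the VPA does with its recommendation in a given updateMode.

    Hypothetical helper: the keys below are illustrative labels only.
    """
    actions = {
        # Advice only: recommendations are computed, nothing is changed.
        "Off": {"recommend": True, "apply_on_create": False, "evict_running": False},
        # Requests are set when a pod is created, then left alone.
        "Initial": {"recommend": True, "apply_on_create": True, "evict_running": False},
        # Requests are set at creation, and running pods may be evicted to update them.
        "Auto": {"recommend": True, "apply_on_create": True, "evict_running": True},
    }
    if update_mode not in actions:
        raise ValueError(f"unknown updateMode: {update_mode!r}")
    return actions[update_mode]


for mode in ("Off", "Initial", "Auto"):
    print(mode, vpa_actions(mode))
```

The column to watch is `evict_running`: that is the blindfold part.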

11.5 VPA: Right-Sizing Container Resource Requests

Right, so you’ve got HPA scaling the number of your pods based on traffic. That’s great. But what if the pods themselves are the problem? You’ve got a container running with a paltry 100m CPU request and a limit to match, but it keeps trying to spike to 800m and getting throttled into next Tuesday by the kernel. Or worse, you’ve got a memory leak slowly filling up a node because some container requested a laughably small 128Mi and is now trying to swallow 2Gi. This is where the Vertical Pod Autoscaler (VPA) comes in: it’s the friend that tells you you’ve been wearing the wrong-sized clothes all along and helps you get a better fit.
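The core idea of right-sizing can be sketched in a few lines of Python: look at observed usage, take a high percentile, and add some headroom. This is a deliberately simplified illustration, not the real VPA recommender (which works on decaying histograms); the function names, the 90th percentile, and the 15% margin are assumptions picked for the example.

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list, with pct as an integer 1..100."""
    ordered = sorted(samples)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling of pct% of n
    return ordered[rank - 1]


def recommend_cpu_request(samples_m, pct=90, margin_pct=15):
    """Suggest a CPU request in millicores: a high percentile of usage plus headroom."""
    base = percentile(samples_m, pct)
    return (base * (100 + margin_pct) + 99) // 100  # round up to a whole millicore


# Observed CPU usage in millicores for a container that requested only 100m.
usage = [100, 120, 150, 300, 500, 650, 700, 750, 800, 800]
print(recommend_cpu_request(usage))
```

For the spiky container above, the suggestion lands far above the original 100m request, which is exactly the “wrong-sized clothes” conversation.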

11.4 HPA Behavior: Scale-Up and Scale-Down Stabilization

Alright, let’s talk about what happens after the HPA calculates it needs to scale. The raw metric says “we need 10 pods, NOW!” If we just blindly obeyed that command every polling interval, we’d be creating a chaotic mess. Pods would be frantically scaling up and down like a hyperactive yo-yo, your cluster’s control plane would weep, and your application’s performance would be a jagged nightmare of cold starts and sudden load drops. This is where the HPA’s behavior field comes in: it’s the built-in shock absorber and common sense that prevents your cluster from having a panic attack.
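Here is a small Python sketch of both halves: the raw calculation and the shock absorber. The first function follows the documented HPA formula (current replicas times the ratio of current to target metric, rounded up, with a tolerance band). The second models scale-down stabilization as “use the highest recommendation seen within the window.” The function names and the list-of-tuples history are hypothetical; the 0.1 tolerance and 300-second window are the usual defaults, assumed here rather than read from any cluster.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Raw HPA recommendation: ceil(current_replicas * current / target)."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target, do nothing
    return math.ceil(current_replicas * ratio)


def stabilized_scale_down(recommendations, now, window_seconds=300):
    """Pick the highest recommendation made within the stabilization window.

    recommendations: non-empty list of (timestamp_seconds, replicas) tuples,
    including the most recent one.
    """
    recent = [replicas for ts, replicas in recommendations
              if now - ts <= window_seconds]
    return max(recent)


print(desired_replicas(4, current_metric=200, target_metric=100))

history = [(0, 10), (100, 6), (290, 3), (400, 4)]
print(stabilized_scale_down(history, now=400))
```

In the history above, the latest raw recommendation is 4 pods, but a recommendation of 6 is still inside the window, so the sketch holds at 6 instead of yo-yoing down and back up.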

11.3 Custom and External Metrics with KEDA

Right, so you’ve got HPA and VPA humming along, scaling based on CPU and memory like a pair of well-trained golden retrievers. They’re obedient, but let’s be honest, they’re not exactly clever. Your application’s real scaling triggers are probably more nuanced: the number of messages clogging your RabbitMQ queue, the throttle percentage on your third-party API, or the sheer number of users hammering your authentication service. This is where we graduate from the dog to a fox: sly, clever, and resource-aware. We do this by bringing in custom and external metrics, and the easiest, most elegant way to do that is with KEDA: Kubernetes Event-driven Autoscaling.

11.2 The Metrics Server: Required Infrastructure for HPA

Right, so you want to use the Horizontal Pod Autoscaler (HPA). Excellent choice. It’s basically magic, letting your application breathe in and out based on load. But here’s the thing about magic: it’s mostly just applied science, and the science here requires a specific piece of infrastructure. You can’t just wave a kubectl wand and expect it to work. You need the Metrics Server. Think of the Metrics Server as the nervous system for your cluster’s autoscaling. The kubelets on each node (the muscle) are constantly measuring the resource usage (CPU and memory) of every pod. But those metrics are isolated, trapped on their individual nodes. The Metrics Server’s job is to be the brainstem: it periodically scrapes those usage stats from every kubelet, aggregates them in memory, and exposes them through the Metrics API (metrics.k8s.io), where the HPA and the rest of Kubernetes can query them. Without it, the HPA is just a guy in a room staring at a blank teleprompter. He has no data. He can’t make decisions.

11.1 HPA: Scaling Based on CPU, Memory, and Custom Metrics

Alright, let’s talk about making your applications bend instead of break under pressure. We’re moving past the stone age of static replica counts. You shouldn’t be paying your cloud provider for a fleet of sleeping Pods, and manually scaling with kubectl scale is a party trick, not a strategy. Enter the Horizontal Pod Autoscaler (HPA), your automated, albeit occasionally dim, bartender who tops up your drinks (Pods) based on how thirsty (busy) your patrons are.