35.6 Bin Packing vs Spreading: Resource Efficiency Trade-offs

Right, let’s talk about how the Scheduler decides where to put your Pods. You’ve probably never stared at a rack of servers and thought, “You know what this needs? A really, really good game of Tetris.” But that’s essentially the job you can hand the Scheduler. Configured for bin packing, it tries to cram as much useful work onto as few nodes as possible. This is fantastic for your cloud bill but, as with most things in engineering, it’s a trade-off. The counter-force to this ruthless efficiency is the desire to spread your workloads out for high availability, so that losing one node doesn’t take every replica down with it. Out of the box, kube-scheduler actually leans toward spreading: its default resource scoring strategy, LeastAllocated, favors emptier nodes, and you opt into packing by switching to MostAllocated. This tension between packing and spreading is the core strategic dilemma you, as the cluster operator, get to manage.
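To make the trade-off concrete, here is a toy sketch of the two scoring ideas. It is not the scheduler’s actual code: the `Node` class, the single CPU resource, and the node names are all assumptions made up for illustration. The point is only that the same two nodes produce opposite winners depending on which way the score slopes.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_capacity_m: int   # CPU capacity in millicores
    cpu_requested_m: int  # CPU already requested by Pods on the node


def utilization_after(node: Node, pod_cpu_m: int) -> float:
    """Fraction of CPU requested if the Pod were placed on this node."""
    return (node.cpu_requested_m + pod_cpu_m) / node.cpu_capacity_m


def most_allocated_score(node: Node, pod_cpu_m: int) -> float:
    """Bin packing: fuller nodes score higher (0-100)."""
    return 100 * utilization_after(node, pod_cpu_m)


def least_allocated_score(node: Node, pod_cpu_m: int) -> float:
    """Spreading: emptier nodes score higher (0-100)."""
    return 100 * (1 - utilization_after(node, pod_cpu_m))


def pick(nodes, pod_cpu_m, score):
    """Return the name of the best-scoring node that has room for the Pod."""
    fits = [n for n in nodes
            if n.cpu_requested_m + pod_cpu_m <= n.cpu_capacity_m]
    return max(fits, key=lambda n: score(n, pod_cpu_m)).name


# Hypothetical two-node cluster: one busy node, one nearly idle node.
nodes = [
    Node("node-a", cpu_capacity_m=4000, cpu_requested_m=3000),
    Node("node-b", cpu_capacity_m=4000, cpu_requested_m=500),
]
packed = pick(nodes, 500, most_allocated_score)
spread = pick(nodes, 500, least_allocated_score)
print(packed, spread)
```

With packing, the Pod lands on the already-busy node and the idle one stays free to be scaled away. With spreading, it lands on the idle node and the busy one keeps its headroom.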

35.5 Custom Schedulers and Scheduler Plugins

Right, so the default scheduler is pretty good at its job, but let’s be honest: it’s a generalist. It’s designed to make pretty okay decisions for most people. But your cluster isn’t “most people.” You have weird, specific needs. Maybe you need to schedule pods based on custom hardware flags, tie them to a specific internal corporate policy, or—and I’ve seen this—make sure your batch processing jobs never run on a node named after someone’s pet cat, “Mr. Whiskers.” (Don’t ask.)

35.4 Descheduler: Rebalancing Running Pods

Right, so you’ve got your cluster humming along. Pods are scheduled, your nodes are looking busy, and everything seems… fine. But fine isn’t perfect. The scheduler makes its decision once, at the moment a Pod is created, and never revisits it. Over time, your pristine cluster can start to look like my garage after a long weekend project: stuff ends up in weird places for reasons that made sense at the time but are utterly baffling in the cold light of day. A node might be running at 90% memory while its neighbor is practically napping. You might have evicted a Pod from a flaky node, only for its replacement to get scheduled right back onto the same faulty machine. This is where the Descheduler comes in. It doesn’t place anything itself: it evicts Pods that are sitting in the wrong spot and leaves the regular scheduler to find them a better home. Think of it not as a failure of the main scheduler, but as its janitorial crew, working the night shift to clean up the messes that inevitably accumulate during the day.
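The “one node at 90%, its neighbor napping” case can be sketched in a few lines. This is a simplified model, not the Descheduler’s implementation: the function names and the 20% and 70% thresholds are assumptions picked for the example.

```python
def classify_nodes(memory_usage, low=0.2, high=0.7):
    """Split nodes into under- and over-utilized lists.

    memory_usage maps node name -> fraction of memory in use (0.0-1.0).
    low and high are hypothetical thresholds chosen for this example.
    """
    under = sorted(n for n, u in memory_usage.items() if u < low)
    over = sorted(n for n, u in memory_usage.items() if u > high)
    return under, over


def eviction_sources(memory_usage, low=0.2, high=0.7):
    """Nodes worth evicting from, but only if another node can absorb the Pods."""
    under, over = classify_nodes(memory_usage, low, high)
    return over if under else []


usage = {"node-a": 0.90, "node-b": 0.10, "node-c": 0.50}
sources = eviction_sources(usage)
print(sources)
```

Note the guard in `eviction_sources`: if no node is underutilized, evicting from a hot node achieves nothing, because the replacement Pods have nowhere better to go.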

35.3 Priority and Preemption: Evicting Lower-Priority Pods

Right, so you’ve told your Pods where they can’t run with Taints and Tolerations. Now let’s talk about how you tell the scheduler which Pods should run first, and more importantly, which ones are so important they can kick others out of the way. This is Priority and Preemption, and it’s Kubernetes’ way of saying, “This request is more important than yours, and I’m not sorry about it.” Think of it like airport security. Most of us wait in the general queue (the standard scheduler flow). But if a pilot or a high-status frequent flyer rocks up, they get to jump the line (higher priority). And if the priority lane is absolutely full? Well, security might just ask a few people from the general queue to step aside to make room (preemption). It’s efficient, but it’s also brutal and can be deeply disruptive if you’re the one getting evicted.
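Here is a minimal sketch of the “step aside” decision on a single node. Everything in it is an assumption for illustration: the `Pod` class, the CPU-only accounting, and the lowest-priority-first victim order. The real scheduler weighs more than this when it picks victims.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    priority: int
    cpu_m: int  # CPU request in millicores


def pick_victims(running, free_cpu_m, incoming):
    """Return Pods to evict so `incoming` fits, or None if preemption cannot help.

    Only Pods with strictly lower priority than `incoming` are candidates,
    and the lowest-priority ones are evicted first.
    """
    if incoming.cpu_m <= free_cpu_m:
        return []  # fits already, nobody has to move
    candidates = sorted(
        (p for p in running if p.priority < incoming.priority),
        key=lambda p: p.priority,
    )
    victims = []
    for pod in candidates:
        victims.append(pod)
        free_cpu_m += pod.cpu_m
        if incoming.cpu_m <= free_cpu_m:
            return victims
    return None  # even evicting every candidate would not free enough CPU


# Hypothetical node with 500m CPU free and three Pods already running.
running = [Pod("batch", 0, 1000), Pod("web", 100, 1000), Pod("db", 1000, 1500)]
victims = pick_victims(running, 500, Pod("payments", 1000, 2000))
print([p.name for p in victims])
```

The `db` Pod survives because its priority equals the incoming Pod’s: preemption only ever reaches downward.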

35.2 Built-in Scheduler Plugins

Right, let’s talk about how your Pods actually get a home. The kube-scheduler isn’t some mystical oracle; it’s a highly configurable, slightly pedantic librarian who follows a very specific set of rules to find the right shelf for your book (the Pod). We call these rules its scheduling plugins. Think of the scheduling process as a two-phase filter-and-score system. First, the librarian eliminates all the shelves that are obviously wrong. Is the node out of disk? Filtered out. Does the Pod need a GPU and this node doesn’t have one? Gone. This is the Filtering phase, run by plugins like NodeResourcesFit. Then, for all the remaining, perfectly valid shelves, the librarian ranks them. “This shelf has the most free RAM, let’s give it a high score. This one has a label the Pod prefers, add a few points.” This is the Scoring phase, run by plugins like NodeResourcesBalancedAllocation. The node with the highest score wins. It’s brutally efficient.
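The librarian’s routine fits in a short sketch. This is a toy model of the filter-then-score idea, not the real plugin API: the dictionary fields, the function names, and the weights are all hypothetical, and the two scorers simply mirror the “most free RAM” and “preferred label” examples above.

```python
def fits_resources(pod, node):
    """Filter: the node must have enough free CPU and memory for the Pod."""
    return (pod["cpu_m"] <= node["free_cpu_m"]
            and pod["mem_mi"] <= node["free_mem_mi"])


def has_gpu_if_needed(pod, node):
    """Filter: a Pod that needs a GPU only fits on a node that has one."""
    return node["gpu"] or not pod["needs_gpu"]


def free_memory_score(pod, node):
    """Score: more memory left after placement is better (0-100)."""
    return 100 * (node["free_mem_mi"] - pod["mem_mi"]) / node["total_mem_mi"]


def preferred_label_score(pod, node):
    """Score: bonus points when the node carries the Pod's preferred label."""
    return 100 if pod["preferred_label"] in node["labels"] else 0


FILTERS = [fits_resources, has_gpu_if_needed]
SCORERS = [(free_memory_score, 1), (preferred_label_score, 1)]  # (plugin, weight)


def schedule(pod, nodes):
    """Phase 1 drops unsuitable nodes; phase 2 returns the best-scoring survivor."""
    feasible = [n for n in nodes if all(f(pod, n) for f in FILTERS)]
    if not feasible:
        return None  # nothing fits, so the Pod stays pending
    best = max(feasible, key=lambda n: sum(w * s(pod, n) for s, w in SCORERS))
    return best["name"]


# Hypothetical three-node cluster.
nodes = [
    {"name": "node-a", "free_cpu_m": 2000, "free_mem_mi": 8192,
     "total_mem_mi": 16384, "gpu": False, "labels": {"disk=ssd"}},
    {"name": "node-b", "free_cpu_m": 500, "free_mem_mi": 12288,
     "total_mem_mi": 16384, "gpu": False, "labels": set()},
    {"name": "node-c", "free_cpu_m": 4000, "free_mem_mi": 4096,
     "total_mem_mi": 16384, "gpu": True, "labels": set()},
]
pod = {"cpu_m": 1000, "mem_mi": 2048, "needs_gpu": False,
       "preferred_label": "disk=ssd"}
print(schedule(pod, nodes))
```

Notice that `node-b` has the most free memory but never reaches the scoring phase, because it fails the CPU filter. Filters are hard requirements; scores only break ties among nodes that already qualify.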

35.1 Scheduling Pipeline: Filtering and Scoring

Alright, let’s pull back the curtain on the main event: the scheduling pipeline. This is where the rubber meets the road. The scheduler doesn’t just pick a node out of a hat; for each Pod, it runs every candidate node through a rigorous, two-phase gauntlet: Filtering (called Predicates in older versions of Kubernetes) and Scoring (formerly Priorities). Think of it like a reality TV show. First, we eliminate all the contestants who don’t meet the basic requirements (Filtering). Then, we judge the remaining contestants on their talents to pick a winner (Scoring).
