Scheduler | mikePietsch.com

14.6 The Main Goroutine and Program Termination

Right, let’s talk about the one goroutine you’ve been using all along without even knowing it: the main goroutine. It’s the VIP of your program, the first one on the scene, and frankly, a bit of a diva. When it decides to leave the party, the whole club shuts down immediately, regardless of how many other goroutines are still dancing on the tables. Think of your main() function as less of a function and more as a concert stage. When the program starts, the runtime sets up the stage and the main goroutine, our headliner, walks out and starts performing the code you wrote. This is its one and only job. It doesn’t get a special backstage pass or a different type of scheduler—it’s a goroutine like any other, just the first one.
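
To make the "headliner leaves, club closes" behaviour concrete, here is a minimal sketch of one common way to keep main around until the other goroutines are done. The helper name runWorkers is made up for illustration; sync.WaitGroup is the standard library piece doing the real work.

```go
package main

import (
	"fmt"
	"sync"
)

// runWorkers (hypothetical helper) starts n goroutines and blocks until
// every one of them has finished, returning how many completed.
func runWorkers(n int) int {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		done int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			done++
			mu.Unlock()
		}()
	}
	// Without this Wait, the caller could return (and main could exit)
	// before any worker has had a chance to run.
	wg.Wait()
	return done
}

func main() {
	fmt.Println("workers finished:", runWorkers(5))
}
```

Delete the wg.Wait() call and the program may exit with some or all of the workers never having run, because nothing makes main wait for them.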

14.5 Goroutine Leaks and How to Prevent Them

Right, let’s talk about goroutine leaks. This is where the magic of “just fire off a goroutine for everything!” starts to feel less like a superpower and more like you’ve accidentally hired an intern who never, ever goes home. They just keep stacking pizza boxes in the corner of the breakroom, muttering about channels. A goroutine leak happens when you start a goroutine that is supposed to terminate at some point, but due to a logic error, it never does. It becomes the undead of your concurrency model: shambling around, consuming resources, and waiting for a signal to rest that never comes.
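
One common way to give the intern a way home is a cancellable context. The sketch below is only an illustration: generate and firstN are hypothetical names, and the pattern shown (select on ctx.Done() next to every blocking send) is one option among several.

```go
package main

import (
	"context"
	"fmt"
)

// generate (hypothetical) sends 0, 1, 2, ... on the returned channel until
// ctx is cancelled, then closes the channel and exits.
func generate(ctx context.Context) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for i := 0; ; i++ {
			select {
			case out <- i:
			case <-ctx.Done():
				return // the exit path a leaky version would be missing
			}
		}
	}()
	return out
}

// firstN (hypothetical) reads n values, then cancels so the producer stops.
func firstN(n int) []int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	ch := generate(ctx)
	vals := make([]int, 0, n)
	for i := 0; i < n; i++ {
		vals = append(vals, <-ch)
	}
	return vals
}

func main() {
	fmt.Println(firstN(3)) // [0 1 2]
}
```

Drop the ctx.Done() case and the producer blocks forever on its next send once the reader walks away, which is exactly the kind of leak described above.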

14.4 Goroutine Stacks: Starting Small and Growing

Right, let’s talk about where your goroutines actually live. You don’t just summon them from the aether; they need a place to store their local variables, their function arguments, their return addresses: all the little bits of state that make them, well, them. That place is the stack. Now, if you’re coming from the world of OS threads, you’re probably used to the idea of a big, fat, pre-allocated stack for each thread. The OS typically reserves anywhere from one to eight megabytes per thread, depending on the platform (and you can often tweak this). It’s like giving every employee a massive, empty warehouse to work in from day one. Safe? Sure. A colossal waste of memory if you have ten thousand employees mostly just sorting paperclips? Absolutely.
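
Goroutines take the opposite approach: in current Go releases a new goroutine starts with a stack of only a few kilobytes, and the runtime grows it as needed. A rough way to see the growing part is to recurse far deeper than a few kilobytes could hold. The function names below are made up for the sketch.

```go
package main

import "fmt"

// sumTo recurses n levels deep; every level needs its own stack frame.
func sumTo(n int64) int64 {
	if n == 0 {
		return 0
	}
	return n + sumTo(n-1)
}

// deepInGoroutine (hypothetical) runs the recursion on a fresh goroutine,
// whose stack starts small and is grown by the runtime along the way.
func deepInGoroutine(n int64) int64 {
	result := make(chan int64)
	go func() { result <- sumTo(n) }()
	return <-result
}

func main() {
	fmt.Println(deepInGoroutine(100000)) // 5000050000
}
```

A hundred thousand frames will not fit in a few kilobytes, so the fact that this finishes at all is the runtime quietly handing the goroutine a bigger stack as it goes.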

14.3 The Go Runtime Scheduler: GOMAXPROCS and Work Stealing

Right, let’s talk about the unsung hero that makes your goroutines actually run without setting your CPU on fire: the Go runtime scheduler. You fire off a million go statements and just expect it to work, and miraculously, it mostly does. This isn’t magic; it’s a brilliantly engineered piece of software that deserves a moment of your attention. Think of it this way: your OS scheduler juggles heavyweight threads, which is like trying to manage a construction crew. Context switching is expensive; every switch is a trip through the kernel that saves and restores a full set of CPU state, and the caches usually suffer for it. Now imagine you need to manage a million tiny, independent tasks. Hiring a million OS threads for that is a recipe for your kernel having a panic attack. Go’s solution is to have its own user-space scheduler that multiplexes your potentially millions of goroutines onto a small number of OS threads. It’s the difference between managing that construction crew and managing an army of highly efficient ants. The OS sees a few threads; the Go runtime sees your entire universe of concurrent work.

14.2 Goroutines vs OS Threads: The M:N Scheduler

Right, let’s talk about the magic trick. You’ve probably heard that goroutines are “lightweight threads,” but that’s like calling a Ferrari “a car with good gas mileage”—it misses the point entirely. The real wizardry isn’t the goroutine itself; it’s the runtime scheduler that makes them so absurdly efficient. We’re not just mapping one execution thing to another; we’re playing a game of 3D chess between your code, logical goroutines, and OS threads.

14.1 Starting a Goroutine: go func()

Right, so you’ve heard the hype. “Concurrency made easy!” “It’s like threads but they’re lightweight!” And for once, the hype is mostly right. But let’s be clear: easy doesn’t mean magic. You still have to know what you’re doing, or you’ll build a spectacularly concurrent system that does absolutely nothing correctly. The absolute bedrock of concurrency in Go is the goroutine. Think of it as the smallest unit of work that the Go scheduler can manage. The syntax for starting one is so stupidly simple it feels like you’re getting away with something. You just prefix a function call with the keyword go, and boom, you’re off to the races. The function you call then runs concurrently alongside the rest of your code.
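
Here is about the smallest useful version of that, as a sketch. The name squareAsync is made up; the go keyword in front of the function literal is the only part that matters.

```go
package main

import "fmt"

// squareAsync (hypothetical) computes n*n on its own goroutine and hands
// the result back over a channel.
func squareAsync(n int) <-chan int {
	out := make(chan int, 1) // buffered so the goroutine never blocks on send
	go func() {
		out <- n * n
	}()
	return out
}

func main() {
	result := squareAsync(7) // returns immediately; the work runs concurrently
	fmt.Println("doing other things while that runs")
	fmt.Println(<-result) // 49
}
```

The channel is there because a go statement throws away the function’s return value; if you want an answer back, you have to arrange your own way of getting it.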

14. Goroutines: Lightweight Concurrency

35.6 Bin Packing vs Spreading: Resource Efficiency Trade-offs

Right, let’s talk about how the Scheduler decides where to dump your Pods. You’ve probably never stared at a rack of servers and thought, “You know what this needs? A really, really good game of Tetris.” But that’s essentially the Scheduler’s full-time job. It’s constantly playing a high-stakes game of bin packing with your cluster’s nodes, trying to cram as much useful work into as few physical machines as possible. This is fantastic for your cloud bill but, as with most things in engineering, it’s a trade-off. The counter-force to this ruthless efficiency is the desire to spread your workloads out for high availability. This tension between packing and spreading is the core strategic dilemma you, as the cluster operator, get to manage.
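
To see the tension in miniature, here is a toy scorer in Go. It is not the kube-scheduler’s code: the Node type, the CPU-only view of resources, and the pick function are all simplifying assumptions made for this sketch. The only idea it borrows is that a packing strategy favours the most-utilized node that still fits, while a spreading strategy favours the least-utilized one.

```go
package main

import "fmt"

// Node is a hypothetical, CPU-only view of a cluster node (millicores).
type Node struct {
	Name        string
	CapacityCPU int
	UsedCPU     int
}

// utilizationAfter returns the node's CPU utilization in percent if the
// request were placed on it, or -1 if the request does not fit.
func utilizationAfter(n Node, request int) int {
	if n.UsedCPU+request > n.CapacityCPU {
		return -1
	}
	return (n.UsedCPU + request) * 100 / n.CapacityCPU
}

// pick returns the best node name for the request. With pack=true the
// fullest feasible node wins; with pack=false the emptiest one does.
func pick(nodes []Node, request int, pack bool) string {
	best, bestScore := "", -1
	for _, n := range nodes {
		u := utilizationAfter(n, request)
		if u < 0 {
			continue // does not fit, so it cannot be scored
		}
		score := u
		if !pack {
			score = 100 - u
		}
		if score > bestScore {
			best, bestScore = n.Name, score
		}
	}
	return best
}

func main() {
	nodes := []Node{
		{Name: "node-a", CapacityCPU: 4000, UsedCPU: 3000},
		{Name: "node-b", CapacityCPU: 4000, UsedCPU: 500},
	}
	fmt.Println("pack:  ", pick(nodes, 500, true))  // node-a
	fmt.Println("spread:", pick(nodes, 500, false)) // node-b
}
```

Same Pod, same nodes, opposite answers. Which one you want depends on whether the cloud bill or the blast radius keeps you up at night.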

35.5 Custom Schedulers and Scheduler Plugins

Right, so the default scheduler is pretty good at its job, but let’s be honest: it’s a generalist. It’s designed to make pretty okay decisions for most people. But your cluster isn’t “most people.” You have weird, specific needs. Maybe you need to schedule pods based on custom hardware flags, tie them to a specific internal corporate policy, or—and I’ve seen this—make sure your batch processing jobs never run on a node named after someone’s pet cat, “Mr. Whiskers.” (Don’t ask.)

35.4 Descheduler: Rebalancing Running Pods

Right, so you’ve got your cluster humming along. Pods are scheduled, your nodes are looking busy, and everything seems… fine. But fine isn’t perfect. Over time, your pristine cluster can start to look like my garage after a long weekend project: stuff ends up in weird places for reasons that made sense at the time but are utterly baffling in the cold light of day. A node might be running at 90% memory while its neighbor is practically napping. You might have evicted a pod from a spotty node, but its replacement got scheduled right back onto the same faulty machine. This is where the Descheduler comes in. Think of it not as a failure of the main scheduler, but as its janitorial crew, working the night shift to clean up the messes that inevitably accumulate during the day.

35.3 Priority and Preemption: Evicting Lower-Priority Pods

Right, so you’ve told your Pods where they can’t run with Taints and Tolerations. Now let’s talk about how you tell the scheduler which Pods should run first, and more importantly, which ones are so important they can kick others out of the way. This is Priority and Preemption, and it’s Kubernetes’ way of saying, “This request is more important than yours, and I’m not sorry about it.” Think of it like airport security. Most of us wait in the general queue (the standard scheduler flow). But if a pilot or a high-status frequent flyer rocks up, they get to jump the line (higher priority). And if the priority lane is absolutely full? Well, security might just ask a few people from the general queue to step aside to make room (preemption). It’s efficient, but it’s also brutal and can be deeply disruptive if you’re the one getting evicted.
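
The “step aside” part can be sketched in a few lines of Go. Treat this strictly as a toy: the Pod type, the single-node CPU-only view, and the victims function are assumptions made for illustration, and the real scheduler weighs far more than this when it picks who gets evicted.

```go
package main

import (
	"fmt"
	"sort"
)

// Pod is a hypothetical, stripped-down Pod: a priority and a CPU request.
type Pod struct {
	Name     string
	Priority int
	CPU      int // millicores
}

// victims decides which running Pods on one node would have to go so that
// pending fits. Only Pods with a lower priority are considered, lowest
// priority first. It returns false if even evicting all of them is not enough.
func victims(pending Pod, running []Pod, freeCPU int) ([]string, bool) {
	if pending.CPU <= freeCPU {
		return nil, true // fits as-is, nobody gets evicted
	}
	var candidates []Pod
	for _, p := range running {
		if p.Priority < pending.Priority {
			candidates = append(candidates, p)
		}
	}
	sort.SliceStable(candidates, func(i, j int) bool {
		return candidates[i].Priority < candidates[j].Priority
	})
	var out []string
	for _, p := range candidates {
		out = append(out, p.Name)
		freeCPU += p.CPU
		if freeCPU >= pending.CPU {
			return out, true
		}
	}
	return nil, false
}

func main() {
	running := []Pod{
		{Name: "batch-1", Priority: 10, CPU: 500},
		{Name: "web-1", Priority: 1000, CPU: 1000},
		{Name: "batch-2", Priority: 20, CPU: 700},
	}
	pending := Pod{Name: "payments", Priority: 500, CPU: 1000}
	names, ok := victims(pending, running, 200)
	fmt.Println(names, ok) // [batch-1 batch-2] true
}
```

Note that web-1 is never touched: it outranks the pending Pod, so it keeps its seat no matter how much room evicting it would free up.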

35.2 Built-in Scheduler Plugins

Right, let’s talk about how your Pods actually get a home. The kube-scheduler isn’t some mystical oracle; it’s a highly configurable, slightly pedantic librarian who follows a very specific set of rules to find the right shelf for your book (the Pod). We call these rules its scheduling plugins. Think of the scheduling process as a two-phase filter-and-score system. First, the librarian eliminates all the shelves that are obviously wrong. Is the node out of disk? Filtered out. Does the Pod need a GPU and this node doesn’t have one? Gone. This is the Filtering phase, run by plugins like NodeResourcesFit. Then, for all the remaining, perfectly valid shelves, the librarian ranks them. “This shelf has the most free RAM, let’s give it a high score. This one has a label the Pod prefers, add a few points.” This is the Scoring phase, run by plugins like NodeResourcesBalancedAllocation. The node with the highest score wins. It’s brutally efficient.

35.1 Scheduling Pipeline: Filtering and Scoring

Alright, let’s pull back the curtain on the main event: the scheduling pipeline. This is where the rubber meets the road. The scheduler doesn’t just pick a node out of a hat; for every Pod, it runs the candidate nodes through a rigorous, two-phase gauntlet: Filtering (called Predicates in older Kubernetes versions) and Scoring (formerly Priorities). Think of it like a reality TV show. First, we eliminate all the contestants who don’t meet the basic requirements (Filtering). Then, we judge the remaining contestants on their talents to pick a winner (Scoring).
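
The two phases are easy to sketch in Go. To be clear, this is a toy model and not the scheduler’s real code or plugin API: the Node and Pod types, the two filter rules, and the “most memory left over wins” scoring rule are all assumptions picked to keep the example short.

```go
package main

import "fmt"

// Node and Pod are hypothetical, heavily simplified stand-ins.
type Node struct {
	Name       string
	FreeMemMiB int
	HasGPU     bool
}

type Pod struct {
	Name     string
	MemMiB   int
	NeedsGPU bool
}

// filter is phase one: drop every node that cannot run the Pod at all.
func filter(pod Pod, nodes []Node) []Node {
	var feasible []Node
	for _, n := range nodes {
		if n.FreeMemMiB < pod.MemMiB {
			continue // not enough memory
		}
		if pod.NeedsGPU && !n.HasGPU {
			continue // Pod needs a GPU, node has none
		}
		feasible = append(feasible, n)
	}
	return feasible
}

// score is phase two: rank a feasible node. Here, more memory left over
// after placement means a higher score.
func score(pod Pod, n Node) int {
	return n.FreeMemMiB - pod.MemMiB
}

// schedule runs both phases and returns the winning node, if any.
func schedule(pod Pod, nodes []Node) (string, bool) {
	feasible := filter(pod, nodes)
	if len(feasible) == 0 {
		return "", false // nothing survived filtering
	}
	best := feasible[0]
	for _, n := range feasible[1:] {
		if score(pod, n) > score(pod, best) {
			best = n
		}
	}
	return best.Name, true
}

func main() {
	nodes := []Node{
		{Name: "node-a", FreeMemMiB: 8192, HasGPU: false},
		{Name: "node-b", FreeMemMiB: 2048, HasGPU: true},
		{Name: "node-c", FreeMemMiB: 4096, HasGPU: true},
	}
	pod := Pod{Name: "trainer", MemMiB: 1024, NeedsGPU: true}
	fmt.Println(schedule(pod, nodes)) // node-c true
}
```

node-a has the most free memory but is eliminated in the first round for lacking a GPU, so it never gets scored. That ordering, filter first and score only the survivors, is the whole point of the pipeline.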