Scheduling | mikePietsch.com

35.6 Bin Packing vs Spreading: Resource Efficiency Trade-offs

Right, let’s talk about how the Scheduler decides where to dump your Pods. You’ve probably never stared at a rack of servers and thought, “You know what this needs? A really, really good game of Tetris.” But that’s essentially the Scheduler’s full-time job. It’s constantly playing a high-stakes game of bin packing with your cluster’s nodes, trying to cram as much useful work into as few physical machines as possible. This is fantastic for your cloud bill but, as with most things in engineering, it’s a trade-off. The counter-force to this ruthless efficiency is the desire to spread your workloads out for high availability. This tension between packing and spreading is the core strategic dilemma you, as the cluster operator, get to manage.
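Here is a toy sketch of that trade-off, assuming a single resource (CPU in millicores) and made-up node names. The "pack" and "spread" rules loosely mirror the idea behind the MostAllocated and LeastAllocated scoring strategies of the NodeResourcesFit plugin, but this is an illustration, not the scheduler's actual code.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str        # hypothetical node name
    capacity: int    # allocatable CPU in millicores
    requested: int   # CPU already requested by Pods on the node


def utilization_after(node: Node, pod_cpu: int) -> float:
    """Fraction of the node's CPU that would be requested once the Pod lands."""
    return (node.requested + pod_cpu) / node.capacity


def pick_node(nodes: list[Node], pod_cpu: int, strategy: str) -> str:
    """'pack' prefers the fullest node that still fits; 'spread' prefers the emptiest."""
    feasible = [n for n in nodes if n.requested + pod_cpu <= n.capacity]
    if not feasible:
        raise ValueError("no node has room for this Pod")
    if strategy == "pack":
        return max(feasible, key=lambda n: utilization_after(n, pod_cpu)).name
    if strategy == "spread":
        return min(feasible, key=lambda n: utilization_after(n, pod_cpu)).name
    raise ValueError(f"unknown strategy: {strategy}")


nodes = [
    Node("node-a", capacity=4000, requested=3000),
    Node("node-b", capacity=4000, requested=500),
]
print(pick_node(nodes, pod_cpu=500, strategy="pack"))    # node-a
print(pick_node(nodes, pod_cpu=500, strategy="spread"))  # node-b
```

Same Pod, same nodes, opposite answers: packing tops up the busy node so the quiet one can eventually be drained, while spreading keeps headroom everywhere at the cost of running more machines.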

35.5 Custom Schedulers and Scheduler Plugins

Right, so the default scheduler is pretty good at its job, but let’s be honest: it’s a generalist. It’s designed to make pretty okay decisions for most people. But your cluster isn’t “most people.” You have weird, specific needs. Maybe you need to schedule pods based on custom hardware flags, tie them to a specific internal corporate policy, or—and I’ve seen this—make sure your batch processing jobs never run on a node named after someone’s pet cat, “Mr. Whiskers.” (Don’t ask.)

35.4 Descheduler: Rebalancing Running Pods

Right, so you’ve got your cluster humming along. Pods are scheduled, your nodes are looking busy, and everything seems… fine. But fine isn’t perfect. Over time, your pristine cluster can start to look like my garage after a long weekend project: stuff ends up in weird places for reasons that made sense at the time but are utterly baffling in the cold light of day. A node might be running at 90% memory while its neighbor is practically napping. You might have evicted a pod from a spotty node, but its replacement got scheduled right back onto the same faulty machine. This is where the Descheduler comes in. Think of it not as a failure of the main scheduler, but as its janitorial crew, working the night shift to clean up the messes that inevitably accumulate during the day.

35.3 Priority and Preemption: Evicting Lower-Priority Pods

Right, so you’ve told your Pods where they can’t run with Taints and Tolerations. Now let’s talk about how you tell the scheduler which Pods should run first, and more importantly, which ones are so important they can kick others out of the way. This is Priority and Preemption, and it’s Kubernetes’ way of saying, “This request is more important than yours, and I’m not sorry about it.” Think of it like airport security. Most of us wait in the general queue (the standard scheduler flow). But if a pilot or a high-status frequent flyer rocks up, they get to jump the line (higher priority). And if the priority lane is absolutely full? Well, security might just ask a few people from the general queue to step aside to make room (preemption). It’s efficient, but it’s also brutal and can be deeply disruptive if you’re the one getting evicted.
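To make the queue-jumping concrete, here is a minimal sketch of victim selection on a single node. It assumes CPU is the only resource and uses hypothetical Pod names; the real scheduler weighs much more (PodDisruptionBudgets, multiple candidate nodes, graceful termination), so treat this as the idea rather than the algorithm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pod:
    name: str       # hypothetical Pod name
    priority: int   # higher number = more important
    cpu: int        # requested CPU in millicores


def choose_victims(capacity: int, running: list[Pod], incoming: Pod) -> Optional[list[str]]:
    """Return the Pods to evict so `incoming` fits, or None if preemption cannot help."""
    free = capacity - sum(p.cpu for p in running)
    if free >= incoming.cpu:
        return []  # it already fits, nobody gets evicted
    # Only strictly lower-priority Pods are candidates; the least important go first.
    candidates = sorted(
        (p for p in running if p.priority < incoming.priority),
        key=lambda p: p.priority,
    )
    victims = []
    for pod in candidates:
        victims.append(pod.name)
        free += pod.cpu
        if free >= incoming.cpu:
            return victims
    return None  # even evicting every lower-priority Pod is not enough


running = [Pod("batch", priority=100, cpu=1500), Pod("web", priority=1000, cpu=2000)]
print(choose_victims(4000, running, Pod("payments", priority=2000, cpu=1500)))  # ['batch']
print(choose_victims(4000, running, Pod("cleanup", priority=50, cpu=1500)))     # None
```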

35.2 Built-in Scheduler Plugins

Right, let’s talk about how your Pods actually get a home. The kube-scheduler isn’t some mystical oracle; it’s a highly configurable, slightly pedantic librarian who follows a very specific set of rules to find the right shelf for your book (the Pod). We call these rules its scheduling plugins. Think of the scheduling process as a two-phase filter-and-score system. First, the librarian eliminates all the shelves that are obviously wrong. Is the node out of disk? Filtered out. Does the Pod need a GPU and this node doesn’t have one? Gone. This is the Filtering phase, run by plugins like NodeResourcesFit. Then, for all the remaining, perfectly valid shelves, the librarian ranks them. “This shelf has the most free RAM, let’s give it a high score. This one has a label the Pod prefers, add a few points.” This is the Scoring phase, run by plugins like NodeResourcesBalancedAllocation. The node with the highest score wins. It’s brutally efficient.

35.1 Scheduling Pipeline: Filtering and Scoring

Alright, let’s pull back the curtain on the main event: the scheduling pipeline. This is where the rubber meets the road. The scheduler doesn’t just pick a node out of a hat; it runs the candidate nodes for every Pod through a rigorous, two-phase gauntlet: Filtering (formerly called Predicates) and Scoring (formerly called Priorities). Think of it like a reality TV show. First, we eliminate all the contestants who don’t meet the basic requirements (Filtering). Then, we judge the remaining contestants on their talents to pick a winner (Scoring).
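The two phases can be sketched in a few lines. This is a deliberately tiny model: the node names, the `ready` flag, and the "most CPU left over wins" scoring rule are all assumptions made for illustration, not what kube-scheduler actually computes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str          # hypothetical node name
    free_cpu: int      # unrequested CPU in millicores
    ready: bool = True


def filter_nodes(nodes: list[Node], pod_cpu: int) -> list[Node]:
    """Filtering: drop every node that cannot run the Pod at all."""
    return [n for n in nodes if n.ready and n.free_cpu >= pod_cpu]


def score_node(node: Node, pod_cpu: int) -> int:
    """Scoring: a toy rule that favours the node with the most CPU left over."""
    return node.free_cpu - pod_cpu


def schedule(nodes: list[Node], pod_cpu: int) -> Optional[str]:
    feasible = filter_nodes(nodes, pod_cpu)
    if not feasible:
        return None  # nothing survived filtering, so the Pod stays Pending
    return max(feasible, key=lambda n: score_node(n, pod_cpu)).name


nodes = [
    Node("node-a", free_cpu=250),
    Node("node-b", free_cpu=1500),
    Node("node-c", free_cpu=4000, ready=False),
    Node("node-d", free_cpu=900),
]
print(schedule(nodes, pod_cpu=500))   # node-b
print(schedule(nodes, pod_cpu=2000))  # None
```

Note that node-c has the most free CPU but never gets scored: a contestant eliminated in the first round cannot win the second.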

35. The Kubernetes Scheduler: How Pods Get Placed

34.7 Cluster Autoscaler Integration with Taints and Labels

Right, so you’ve got your nodes tainted up like a contaminated crime scene and your pods are politely tolerating it. It’s a beautiful, orderly system. But then you remember: the cluster isn’t a static painting; it’s a living, breathing thing that needs to scale. This is where the Cluster Autoscaler (CA) waltzes in, looks at your meticulously crafted rules, and says, “Cool story, bro. Now let me figure out where to put this new node.”

34.6 Topology Spread Constraints: Balanced Pod Distribution

Right, so you’ve got your Pods running, but you’ve looked at your cluster and noticed something absurd: all your web-server Pods have huddled onto the same two nodes like they’re sharing a single brain cell. The nodes you set aside for your stateful database? Practically empty. This isn’t just inefficient; it’s a ticking time bomb. If one of those crowded nodes goes down, your entire service might follow. This is where the scheduler’s smarter, more meticulous cousin comes in: Topology Spread Constraints.
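The heart of a topology spread constraint is the skew check, which can be sketched like this. The zone names and Pod counts are made up, and the sketch assumes a hard constraint (the DoNotSchedule behaviour) over a single topology key.

```python
def allowed_domains(pod_counts: dict[str, int], max_skew: int) -> list[str]:
    """Domains where one more matching Pod keeps the skew within max_skew.

    Skew is measured as (Pods in the domain after placement) minus
    (the current minimum across all domains).
    """
    global_min = min(pod_counts.values())
    return sorted(
        domain
        for domain, count in pod_counts.items()
        if (count + 1) - global_min <= max_skew
    )


counts = {"zone-a": 3, "zone-b": 1, "zone-c": 1}
print(allowed_domains(counts, max_skew=1))  # ['zone-b', 'zone-c']
print(allowed_domains(counts, max_skew=3))  # ['zone-a', 'zone-b', 'zone-c']
```

With a max skew of 1 the crowded zone is off limits until the others catch up; loosen it to 3 and the huddle is allowed to continue.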

34.5 Tolerations: Allowing Pods onto Tainted Nodes

Right, so you’ve tainted your nodes. Good for you. You’ve drawn a neat little “Keep Out” sign on a subset of your cluster, probably for a good reason. Maybe those nodes have expensive GPUs you don’t want wasted on a nginx pod, or they’re in a dodgy availability zone you’re trying to drain. But here’s the catch: what if you do want a pod to break the rules? What if you have a special, privileged workload that needs to run on that tainted hardware?

34.4 Taints: Marking Nodes as Unsuitable for Certain Pods

Right, so you’ve got your Pods happily landing on any old node that has free space. Cute. But in the real world, some nodes are special. Maybe they’re expensive GPU machines, or they’re reserved for a critical database, or they’re a bit flaky and you only want test workloads on them. You don’t want just any Pod scheduling on them. This is where taints come in. Think of a taint as a big, angry “KEEP OFF MY LAWN” sign posted on a node. It has three parts: a key, a value, and an effect. The effect is the most important part—it tells the scheduler what to do when a Pod shows up without an invitation. A Pod gets an invitation in the form of a toleration, which is basically a note that says, “Yeah, I see your ‘LAWN’ sign, but it’s cool, I’m with the band.”
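Here is a minimal sketch of how the sign and the invitation get compared. It is simplified on purpose: the key must always be named (real tolerations can also use an empty key with the Exists operator as a wildcard), and the `gpu` key is a made-up example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule" or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""         # empty means "any effect"


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """True if this one toleration covers this one taint."""
    if toleration.effect and toleration.effect != taint.effect:
        return False
    if toleration.key != taint.key:
        return False
    if toleration.operator == "Exists":
        return True  # Exists ignores the value entirely
    return toleration.value == taint.value


def can_schedule(taints: list[Taint], tolerations: list[Toleration]) -> bool:
    """Blocked if any hard taint is left untolerated; PreferNoSchedule only discourages."""
    hard = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)


gpu_taint = [Taint("gpu", "true", "NoSchedule")]
print(can_schedule(gpu_taint, []))                                               # False
print(can_schedule(gpu_taint, [Toleration("gpu", "Equal", "true", "NoSchedule")]))  # True
print(can_schedule(gpu_taint, [Toleration("gpu", operator="Exists")]))           # True
```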

34.3 Pod Affinity and Anti-Affinity: Co-Locating and Spreading Pods

Right, so you’ve told your pods where they can run with node selectors and affinities. But what about telling them who they should run with? Or, more importantly, who they should avoid? That’s where pod affinity and anti-affinity come in, and they’re the divas of the scheduling world. They don’t just care about the node itself; they care about the other pods already throwing a party on it. Think of it like this: node affinity is about the hardware (“I need a GPU”). Pod affinity is about the neighbors (“I need to be next to my database for low latency,” or, conversely, “For the love of all that is holy, do not put me on the same node as that memory-hogging cache service”).

34.2 Node Affinity: requiredDuringScheduling and preferredDuringScheduling

Right, so you’ve told your Pod where it can’t go with taints. Now let’s talk about the more polite, proactive side of the equation: node affinity. This is how you tell your Pod where it should go, or at least, where it would prefer to go. It’s the difference between “Get off my lawn!” (taints) and “Hey, you’d love it here, we have a pool!” (affinity). The designers, in their infinite wisdom, gave us two main flavors of node affinity: requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution, shortened here to requiredDuringScheduling and preferredDuringScheduling. The names are a mouthful, but they’re brutally honest about what they do. The first one is a hard requirement. If Kubernetes can’t meet it, your Pod sulks in a Pending state until a matching node shows up. The second is a soft preference, a suggestion. Kubernetes will try its best, but if it can’t find a node that matches, it will just shrug and schedule your Pod somewhere else. It’s the difference between “I will only eat pizza from this one specific joint in Naples” and “I’d prefer pizza, but I guess this salad will do.”
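The hard-versus-soft split can be sketched as "filter on the required labels, then add up weights for the preferred ones." This is a simplification under stated assumptions: real affinity terms use match expressions with operators such as In and NotIn, whereas this sketch only does exact label matches, and the node names and labels are invented.

```python
from typing import Optional

Labels = dict[str, str]


def matches(node_labels: Labels, wanted: Labels) -> bool:
    """True if the node carries every wanted key/value pair."""
    return all(node_labels.get(k) == v for k, v in wanted.items())


def place(nodes: dict[str, Labels], required: Labels,
          preferred: list[tuple[int, Labels]]) -> Optional[str]:
    """Return the chosen node name, or None if the Pod would stay Pending."""
    eligible = {name: labels for name, labels in nodes.items() if matches(labels, required)}
    if not eligible:
        return None  # the hard requirement is never waived

    def score(labels: Labels) -> int:
        # Each satisfied preference contributes its weight; unmet ones cost nothing.
        return sum(weight for weight, wanted in preferred if matches(labels, wanted))

    # Iterating over sorted names makes ties resolve to the first name alphabetically.
    return max(sorted(eligible), key=lambda name: score(eligible[name]))


nodes = {
    "node-a": {"arch": "amd64", "disk": "hdd"},
    "node-b": {"arch": "amd64", "disk": "ssd"},
    "node-c": {"arch": "arm64", "disk": "ssd"},
}
print(place(nodes, {"arch": "amd64"}, [(50, {"disk": "ssd"})]))   # node-b
print(place(nodes, {"arch": "amd64"}, [(50, {"disk": "nvme"})]))  # node-a
print(place(nodes, {"arch": "riscv"}, [(50, {"disk": "ssd"})]))   # None
```

The second call is the salad: nobody has an NVMe disk, so the preference is quietly dropped. The third is the Naples pizza: no match, no placement.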

34.1 nodeSelector: Simple Label-Based Node Selection

Right, let’s talk about the simplest way to tell your Pod, “Hey, don’t just land anywhere.” That’s nodeSelector. It’s the Kubernetes equivalent of pointing at a specific table in a crowded cafeteria and saying, “Sit there.” It’s not subtle, but it gets the job done with minimal fuss. The core concept is embarrassingly simple: you slap a label on your nodes, and then you tell your Pod to only run on nodes that have that exact label. It’s a hard requirement. No label match? No schedule for you. The scheduler isn’t going to negotiate or find a compromise; it’s a binary check.
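That binary check is small enough to sketch in one function. The node names and labels below are made up; the only rule modelled is that every key/value pair in the selector must appear on the node.

```python
def eligible_nodes(nodes: dict[str, dict[str, str]], node_selector: dict[str, str]) -> list[str]:
    """Names of nodes whose labels contain every key/value pair in the selector."""
    return sorted(
        name
        for name, labels in nodes.items()
        if all(labels.get(key) == value for key, value in node_selector.items())
    )


nodes = {
    "node-a": {"disktype": "ssd", "zone": "east"},
    "node-b": {"disktype": "hdd", "zone": "east"},
    "node-c": {"zone": "west"},
}
print(eligible_nodes(nodes, {"disktype": "ssd"}))                  # ['node-a']
print(eligible_nodes(nodes, {"disktype": "ssd", "zone": "west"}))  # []
```

An empty result is the "no schedule for you" case: there is no fallback and no partial credit for matching one label out of two.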

34. Node Selection, Affinity, Taints, and Tolerations

20.7 systemd Timers as a cron Alternative

Alright, let’s talk about the elephant in the room: cron is old. It’s the venerable, grumpy grandparent of task scheduling. It works, it’s everywhere, but it has some deeply weird habits, like emailing you a letter every time it takes out the trash. For modern Linux systems, there’s a new sheriff in town, and it’s wearing the same uniform as everything else: systemd. Yes, systemd absorbed this too. Love it or hate it, its timer system is incredibly powerful and integrated. Instead of the scattered, edit-in-isolation approach of cron, systemd timers are managed like any other service—with consistent logging, dependency handling, and a unified control interface. It’s the difference between a standalone appliance and one that’s wired into your smart home.

20.6 atq and atrm: Viewing and Removing Pending Jobs

Right, so you’ve fired a job into the future with at. Now what? You’re not a fortune teller; you can’t just hope it’ll run. You need to see what’s in the queue, and sometimes, you need to perform a little tactical retreat on a job you just scheduled. That’s where atq and atrm come in. Think of them as your mission control for the at system.

Checking the Queue with atq

The atq command is dead simple. It simply lists the jobs currently pending in the at queue. The name stands for “at queue,” which is refreshingly logical for a Unix command.

20.5 at: One-Time Scheduled Jobs

Alright, let’s talk about at. If cron is your meticulous, obsessive calendar for tasks that happen over and over, at is its free-spirited, slightly scatterbrained cousin who you call to do one thing, one time, in the future. “Hey, reboot the server at 2 AM,” or “Download that huge file at 3 PM when the network’s quiet.” It’s a brilliantly simple tool that, frankly, doesn’t get enough love. The core concept is brain-dead simple: you tell at when to run a command, then you feed it what to run. It’s not a daemon that’s always running like crond; it’s a utility that schedules a job with the atd daemon, which then forks off a child process to run your command at the appointed time. It’s a fire-and-forget missile for your command line.

20.4 /etc/crontab and the System Crontab Format

Alright, let’s get our hands dirty with the system’s master to-do list: /etc/crontab. This isn’t your user crontab (crontab -e); this is the big leagues, the root-owned file that handles system-wide jobs, with an extra field on each line naming the user the job runs as. Think of it as the difference between a sticky note on your monitor and an official company memo. It’s a file, sitting right there in /etc, and you edit it with a text editor (like vim or nano) using sudo because, well, you’d better have a good reason to touch it.

20.3 /etc/cron.d, cron.daily, cron.weekly, cron.monthly

Now, if you’ve been following along, you’ve got the basics of crontab -e down. You can make a single script run at 3:17 AM on the second Tuesday of every month that emails you a picture of a cat. Wonderful. But what about when you graduate from being a user who schedules tasks to being the system administrator who has to manage them? Or when you need to install a package that needs its own scheduled job? You don’t want that package mucking about in your personal crontab, and you certainly don’t want to edit its crontab as root. Enter the system’s scheduling directory: /etc/cron.d.

20.2 crontab -e, -l, -r: Managing User Crontabs

Alright, let’s get our hands dirty with the actual management of your crontab. This is where you, the user, get to tell cron what to do and when. Forget about poking around in /etc/ directories for a moment; your personal crontab is your own sandbox, and the crontab command is your shovel. The key thing to engrave into your brain right now: You do NOT edit the crontab file directly. I know, I know, your text editor is begging for action. Resist. The system keeps a master database of user crontabs, usually in /var/spool/cron/ or /usr/lib/cron/tabs/, but touching those files manually is a one-way ticket to permission-denied town and potential disaster. The crontab command is the only sanctioned, safe way to interact with your scheduled tasks. It handles the locking, the syntax checking, and the installation for you. Use it.

20.1 Crontab Syntax: Minute, Hour, Day, Month, Weekday

Alright, let’s get our hands dirty with crontab syntax. This is where the magic—and the absolute head-scratching frustration—happens. Forget the pretty GUIs; this is the real control panel. A crontab is simply the file where you define your schedule of jobs (or ‘cron jobs’) for the cron daemon to execute. Each user on a system can have their own crontab, and there’s also a system-wide one (usually /etc/crontab or in /etc/cron.d/).
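To see the five fields from the section title laid out, here is a small sketch that splits a user crontab line into its named parts. The backup script path is hypothetical, and the function only labels the fields; it does not validate ranges or expand things like `*/5`. It also assumes the user crontab layout, not the system format with its extra user column.

```python
FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def parse_crontab_line(line: str) -> dict[str, str]:
    """Split one user crontab entry into its five time fields and the command."""
    parts = line.strip().split(None, 5)  # at most five splits keeps the command intact
    if len(parts) < 6:
        raise ValueError("expected five time fields followed by a command")
    entry = dict(zip(FIELDS, parts[:5]))
    entry["command"] = parts[5]
    return entry


entry = parse_crontab_line("30 2 * * 1 /usr/local/bin/backup.sh --full")
for name, value in entry.items():
    print(f"{name:>12}: {value}")
```

Read left to right, that example line means "at minute 30 of hour 2, on any day of any month, as long as it is a Monday."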