19.1 Requests: What the Scheduler Uses for Placement

Alright, let’s talk about the one thing that actually matters to the scheduler when it’s trying to find a home for your pod: requests. Forget limits for a moment; they’re the bouncer at the club, but requests are the guest list. The scheduler only cares about the guest list.

When you define a resources.requests block in your container spec, you’re not making a polite suggestion. You’re declaring, under oath, “This container will need at least this much CPU and memory to function properly.” The scheduler takes this sworn testimony and uses it to find a node with enough spare capacity to honor your request. It’s a contract. If no node can fulfill it, your pod ain’t getting scheduled.

The CPU Request: Your Fair Share of Time

A CPU request isn’t about dedicating a whole core to your container (unless you request a whole core, of course). It’s about guaranteed time. Think of it like this: the CPU is a pie, and a request of 500m (500 millicores, or 0.5 cores) is you saying, “I must have at least half a pie’s worth of slices at every serving.”

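This is measured in CPU time. The kubelet turns your CPU request into a weight for the Linux kernel’s CFS (Completely Fair Scheduler), and the kernel divides contended CPU time in proportion to those weights. If you ask for 500m, you can count on roughly half a core’s worth of processing power, on average, over time, because the scheduler never lets the requests on a node add up to more than the node can deliver. If other containers on the node are idle, you can absolutely use more (up to your limit, if you set one); this is the “nice” part of Kubernetes. But when things get busy, the kernel shrinks everyone back toward their proportional share: containers that were bursting above their requests lose the extra, and you still get your contracted amount.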

Here’s the spec. Note the requests block is separate from limits; we’ll get to that later.

apiVersion: v1
kind: Pod
metadata:
  name: important-app
spec:
  containers:
  - name: app
    image: important-app:1.0
    resources:
      requests:
        cpu: "500m"   # I need at least half a core's worth of time
        memory: "256Mi" # I need at least 256 Mebibytes of RAM
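
To put numbers on the fair-share idea, here’s a rough sketch in Python. It assumes cgroup v1, where the kubelet converts a CPU request into a cpu.shares weight of roughly millicores × 1024 / 1000 (cgroup v2 uses a different scale, but the proportions work out the same way). The function names and the example workloads are made up for illustration, and the split models the worst case, where every container wants all the CPU it can get.

```python
MIN_SHARES = 2  # smallest weight handed out, even for a tiny or zero request


def millicores_to_shares(millicores):
    """Approximate the cgroup v1 cpu.shares weight for a CPU request."""
    return max(millicores * 1024 // 1000, MIN_SHARES)


def contended_split(requests_millis, node_millis):
    """Divide a fully contended node's CPU in proportion to each container's weight."""
    shares = {name: millicores_to_shares(m) for name, m in requests_millis.items()}
    total = sum(shares.values())
    return {name: node_millis * s / total for name, s in shares.items()}


# Hypothetical node with 4000m of CPU and two greedy containers.
split = contended_split({"app": 500, "batch": 1500}, node_millis=4000)
print(split)  # {'app': 1000.0, 'batch': 3000.0}
```

With only 2000m requested on a 4000m node, the 500m container still ends up with a full core under total contention. It would be squeezed down to exactly 500m only if the requests on the node added up to the full 4000m.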

The Memory Request: A Hard Reservation

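Memory is a crueler, simpler world. CPU is compressible: a container that can’t get all the CPU it wants just runs slower. Memory isn’t: there’s no way to squeeze a process that is already holding its pages. When you request memory, the scheduler finds a node with that much memory still unclaimed by other pods’ requests, and counts your amount against that node from then on. Nothing is physically locked away; the “hard” part is the bookkeeping, which never lets the requests on a node add up to more than the node can offer.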

Why? Because unlike CPU time, which can be preempted and shared, memory is a physical resource your process allocates and holds onto. If your container tries to allocate more memory than its request, it can (up to its limit, if it has one). But the moment the node runs short of physical memory, the kubelet starts evicting pods, and if it can’t react fast enough, the kernel’s OOM (Out-Of-Memory) Killer gets involved. And let me tell you, the OOM Killer is a chaotic neutral entity that doesn’t care about your pod’s importance. It starts shooting processes until the system is stable again. Your container is a prime target if it’s using more than it requested.

So, if you request 256Mi, that’s your reserved parking space. You can park a bigger truck there (use more), but if the parking lot fills up, your truck is getting towed first.
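
The towing order can be sketched too. This is a simplified model of how the kubelet ranks pods for eviction under memory pressure: pods using more than they requested go first, then lower priority, then biggest overage. It is not the kernel OOM Killer’s actual scoring, and the PodMemory type, the eviction_order function, and the pod names are all invented for the example.

```python
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass
class PodMemory:
    name: str
    request_bytes: int
    usage_bytes: int
    priority: int = 0  # higher means more important


def eviction_order(pods):
    """Return pods sorted from first-to-evict to last-to-evict."""
    def rank(pod):
        over_request = pod.usage_bytes > pod.request_bytes
        overage = pod.usage_bytes - pod.request_bytes
        # False sorts before True, so pods over their request come first;
        # then lower priority; then the largest overage.
        return (not over_request, pod.priority, -overage)
    return sorted(pods, key=rank)


pods = [
    PodMemory("within-request", request_bytes=256 * MIB, usage_bytes=200 * MIB),
    PodMemory("slightly-over", request_bytes=256 * MIB, usage_bytes=300 * MIB),
    PodMemory("no-request", request_bytes=0, usage_bytes=150 * MIB),
    PodMemory("way-over", request_bytes=256 * MIB, usage_bytes=900 * MIB),
]
order = [p.name for p in eviction_order(pods)]
print(order)
```

Note where the pod with no request lands: any usage at all puts it over its request of zero, so it gets towed ahead of a pod that is only slightly over a real request.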

What Happens If You Don’t Ask For Anything?

You get nothing. Literally. You’ve declared yourself an optional citizen in the cluster.

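If you omit resources.requests (or set them to 0) and set no limits either, the scheduler treats your pod as needing nothing. (Set a limit without a request and Kubernetes copies the limit into the request, so you’re covered, if only by accident.) A pod that asks for nothing fits anywhere, so it might land on an utterly overwhelmed node, or it might land on an empty one. It’s a complete gamble. Worse, a pod with no requests or limits at all gets the lowest possible QoS class (more on that in the next section), making it first in line for eviction when the node comes under resource pressure.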

This is one of the most common beginner mistakes. Don’t be that person. Always set requests. Otherwise the scheduler is packing your nodes blind, and your cluster stability will be a joke.
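
For a preview of how that “lowest possible QoS class” gets decided, here’s a simplified sketch of the classification rules. The qos_class function and its dict layout are hypothetical. It compares quantities as plain strings (so "500m" and "0.5" would wrongly count as different), whereas the real thing parses them properly, so treat it as an illustration rather than a reimplementation.

```python
RESOURCES = ("cpu", "memory")


def _nonzero(quantities):
    """Keep only cpu/memory entries that are set to something other than zero."""
    return {r: q for r, q in quantities.items() if r in RESOURCES and q not in ("0", 0)}


def qos_class(containers):
    """Classify a pod from its containers' requests and limits (simplified)."""
    any_set = False
    all_guaranteed = True
    for container in containers:
        raw_limits = container.get("limits", {})
        raw_requests = dict(container.get("requests", {}))
        # A limit with no matching request is copied into the request.
        for resource, quantity in raw_limits.items():
            raw_requests.setdefault(resource, quantity)
        limits = _nonzero(raw_limits)
        requests = _nonzero(raw_requests)
        if requests or limits:
            any_set = True
        # Guaranteed needs a limit equal to the request for both cpu and memory.
        for resource in RESOURCES:
            if resource not in limits or requests.get(resource) != limits[resource]:
                all_guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if all_guaranteed else "Burstable"


print(qos_class([{}]))                                               # BestEffort
print(qos_class([{"requests": {"cpu": "500m", "memory": "256Mi"}}]))  # Burstable
print(qos_class([{"limits": {"cpu": "1", "memory": "1Gi"}}]))         # Guaranteed
```

The requests-only pod from the spec earlier in this section comes out as Burstable, which is already a big step up from asking for nothing.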

How the Scheduler Actually Uses This Info

The scheduler isn’t doing anything magic. It starts from each node’s allocatable capacity (the total machine resources minus what’s reserved for the system and Kubernetes itself). Then it subtracts the sum of all requests for all pods already assigned to that node.

The result is the node’s remaining unrequested capacity. When your new pod comes along, the scheduler simply checks: “Does any node have unrequested CPU >= my pod’s total CPU request and unrequested memory >= my pod’s total memory request?” If yes, it’s a candidate. If no, the pod sits in Pending hell until a node has room.

As far as resources go, it’s essentially bin-packing. The scheduler doesn’t care about the actual usage reported by kubectl top; it only cares about the resources reserved by your requests. This is why you can have a node at 90% CPU utilization and the scheduler will still happily place a new pod there if the sum of all requests is low, a phenomenon that regularly causes operators to question their life choices. The system assumes you meant what you said in your request contract.
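
The fit check described above is simple enough to sketch in a few lines of Python. This is a toy model, not the scheduler’s code: the Pod and Node types, the node names, and the numbers are all assumptions for the example, and it only covers the resource check, ignoring everything else the real scheduler weighs.

```python
from dataclasses import dataclass, field

MIB = 1024 * 1024
GIB = 1024 * MIB


@dataclass
class Pod:
    name: str
    cpu_millis: int    # sum of the CPU requests of all containers in the pod
    memory_bytes: int  # sum of the memory requests of all containers in the pod


@dataclass
class Node:
    name: str
    allocatable_cpu_millis: int
    allocatable_memory_bytes: int
    pods: list = field(default_factory=list)

    def unrequested(self):
        """Allocatable capacity minus the requests of pods already assigned here."""
        cpu = self.allocatable_cpu_millis - sum(p.cpu_millis for p in self.pods)
        mem = self.allocatable_memory_bytes - sum(p.memory_bytes for p in self.pods)
        return cpu, mem


def fits(node, pod):
    """True if the pod's requests fit into the node's unrequested capacity."""
    cpu, mem = node.unrequested()
    return pod.cpu_millis <= cpu and pod.memory_bytes <= mem


nodes = [
    # Running hot in reality, but almost nothing has been requested on it.
    Node("busy-but-unclaimed", 4000, 8 * GIB, [Pod("hog", 200, 256 * MIB)]),
    # Sitting idle, but its capacity is nearly all spoken for.
    Node("quiet-but-full", 2000, 4 * GIB, [Pod("reserved", 1800, 3 * GIB)]),
]
new_pod = Pod("important-app", 500, 256 * MIB)
candidates = [n.name for n in nodes if fits(n, new_pod)]
print(candidates)  # ['busy-but-unclaimed']
```

Actual usage never appears anywhere in the calculation, which is exactly why the hot node wins.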
