5.8 Resource Requests and Limits on Pods

Right, let’s talk about money. Not your money, but the money of your cluster. In Kubernetes, money is CPU and memory. And just like in real life, if you don’t budget properly, you’re going to have a bad time. This is where requests and limits come in. They are the financial planning department for your Pods, and if you ignore them, you’re going to get a call from the bank at 3 AM.

Think of a request as your Pod’s reserved seat at the table. It’s the amount of a resource (CPU or memory) that Kubernetes sets aside for a container, and the scheduler uses it to decide which node has enough unreserved capacity to host your Pod. A limit, on the other hand, is the absolute maximum that the container is allowed to consume. It’s the bouncer at the club, ready to cut your Pod off if it gets too greedy.

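Here’s the kicker: you can set these independently. You can request a little but set a high limit (a potentially dangerous move), or you can set the request and the limit to the same value (a strict budget). The one hard rule is that a request can never exceed its limit. And if you waltz in and set a limit without a request, Kubernetes doesn’t turn you away: unless the cluster supplies a default, it quietly sets the request equal to the limit. That’s like saying “I might need up to a private jet” and having the travel desk reserve the whole jet, just in case. The scheduler needs a request to do its job, so it takes the only number you gave it.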

Defining Requests and Limits in a Pod Spec

You define this financial plan in the resources block of a container’s spec. It goes inside each container, not at the Pod level, because each process has its own appetite. Here’s what a realistic one looks like. Notice I’m using millicores (m) for CPU because whole CPUs are precious and rare, and mebibytes (Mi) for memory because powers of two are the language of computers, not marketers.

apiVersion: v1
kind: Pod
metadata:
  name: my-carefully-budgeted-app
spec:
  containers:
  - name: app
    image: my-app:1.0
    resources:
      requests:
        memory: "256Mi"
        cpu: "100m"  # That's 1/10th of a CPU core.
      limits:
        memory: "512Mi"
        cpu: "500m"  # That's half a core.
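
Because resources are declared per container, the scheduler works with the sum across the Pod. Here is a minimal sketch, assuming a hypothetical log-shipper sidecar running next to the app (the Pod name, the sidecar, and its image are illustrative only):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar        # hypothetical name
spec:
  containers:
  - name: app
    image: my-app:1.0
    resources:
      requests:
        memory: "256Mi"
        cpu: "100m"
      limits:
        memory: "512Mi"
        cpu: "500m"
  - name: log-shipper           # hypothetical sidecar
    image: my-log-shipper:1.0   # hypothetical image
    resources:
      requests:
        memory: "64Mi"
        cpu: "50m"
      limits:
        memory: "128Mi"
        cpu: "100m"
# The scheduler looks for a node with 150m CPU and 320Mi memory unreserved
# (100m + 50m, 256Mi + 64Mi). Limits are still enforced per container.
```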

The Profoundly Different Personalities of CPU and Memory

This is the most important part to internalize: Kubernetes enforces CPU and memory limits differently, and getting this wrong is a classic face-palm moment.

CPU is a Compressible Resource. If your Pod tries to use more CPU than its limit, the kernel doesn’t kill it. It just says, “Oh no you don’t,” and throttles the process. Your application will run slower, but it won’t be terminated. It’s like your CPU is being put on a diet.
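
One way to watch throttling happen is a container that spins in a busy loop under a small CPU limit. This is a sketch, assuming the stock busybox image can be pulled; the Pod name is hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cpu-throttle-demo   # hypothetical name
spec:
  containers:
  - name: spinner
    image: busybox
    # Busy loop that would happily burn a full core if allowed.
    command: ["sh", "-c", "while true; do :; done"]
    resources:
      requests:
        cpu: "50m"
      limits:
        cpu: "100m"   # The kernel throttles the loop at about a tenth of a core.
```

If a metrics add-on such as metrics-server is installed, `kubectl top pod cpu-throttle-demo` should hover near 100m rather than 1000m, and the Pod keeps running the whole time.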

Memory is an Incompressible Resource. There’s no such thing as “throttling” memory. You’re either using a page or you’re not. So, if your container exceeds its memory limit, the kernel’s OOM killer steps in and kills the process, and Kubernetes reports the container with the infamous OOMKilled (Out-Of-Memory) status. This is a hard, sudden stop. It’s not the bouncer cutting you off; it’s the bouncer throwing you out the back door into a dumpster.
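
The memory side is just as easy to reproduce. This sketch assumes the public polinux/stress image, whose `stress` tool is told to allocate about 250 MB inside a container capped at 100Mi; the Pod name is hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: memory-oom-demo   # hypothetical name
spec:
  containers:
  - name: hog
    image: polinux/stress  # assumed to be pullable from your cluster
    command: ["stress"]
    # One worker that allocates roughly 250 MB and holds on to it.
    args: ["--vm", "1", "--vm-bytes", "250M", "--vm-hang", "1"]
    resources:
      requests:
        memory: "50Mi"
      limits:
        memory: "100Mi"   # Well below what the worker tries to allocate.
```

Shortly after it starts, `kubectl describe pod memory-oom-demo` should show the container's last state as Terminated with reason OOMKilled, and the restart count climbing as the kubelet keeps restarting it.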

The Sinner’s Guide to Pitfalls and Best Practices

The “No Limits” Party: You don’t set any limits. Your Pod can consume all the CPU and memory on the node. This is fantastic until one Pod decides to mine Bitcoin and takes down every other Pod on the node with it. Don’t be this person. Your cluster operators will hate you.
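The “Guaranteed” Gold Standard: For production, the most stable configuration is to set requests and limits to the exact same value, for both CPU and memory, in every container of the Pod. Kubernetes then assigns the Pod the Guaranteed QoS class, which also makes it one of the last candidates for eviction when the node runs short of resources. The scheduler knows exactly what you need, and the kernel knows exactly what you get. No surprises. It makes the system predictable.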
The “I Have No Memory” Mystery: You see your Pod get OOMKilled but your application’s own metrics show it was well below your limit. Why? Because the limit applies to everything charged to the container, not just your process’s heap. Files written to a memory-backed emptyDir (tmpfs) count against the container, and so does page cache (although the kernel will try to reclaim that first), yet neither shows up in your process’s own numbers. The limit is a ceiling, and you hit it from the inside. Size the limit from the container’s total memory usage, not the application’s view of it, and leave some breathing room above the peak.
The “I Forgot the m” Disaster: See this?
```
limits:
  cpu: "1"
```
That means one whole CPU core. Now see this?
```
limits:
  cpu: "100m"
```
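That means one-tenth of a core, a factor of ten less. Note that “1” and “1000m” are the same thing, so the real danger is dropping the m: write “100” when you meant “100m” and you are off by a factor of a thousand. I’ve seen teams accidentally request 100 cores for a simple web server. The Pod sits in Pending while the scheduler waits for a node with 100 free cores to appear. It won’t. Always use millicores (m) for precision.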
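
Pulling the Guaranteed and millicore advice together, a minimal sketch might look like the following. The Pod name and the sizes are hypothetical; pick numbers from your own measurements.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-web      # hypothetical name
spec:
  containers:
  - name: web
    image: my-app:1.0
    resources:
      requests:
        memory: "512Mi"
        cpu: "500m"       # Same as "0.5"; "1000m" would be the same as "1".
      limits:
        memory: "512Mi"   # Equal to the request.
        cpu: "500m"       # Equal to the request. Without the m this would mean 500 cores.
```

With requests equal to limits for both resources in every container, `kubectl get pod guaranteed-web -o jsonpath='{.status.qosClass}'` should report Guaranteed.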

The bottom line is this: setting requests and limits isn’t optional busywork. It’s the core of cluster stability. It’s how you tell the system what you need and how you promise to behave. Do it well, and your apps run smoothly. Do it poorly, and you’re just another chaotic force for the platform team to contain. Be a good citizen. Set your limits.