19.3 CPU Throttling vs Memory OOMKill

Alright, let’s get into the real-world consequences of getting your resource requests and limits wrong. This is where the rubber meets the road, or more accurately, where your application grinds to a halt or gets unceremoniously murdered. The key thing to remember is that CPU limits and memory limits are enforced completely differently. The scheduler only looks at requests when it places pods; it’s the Linux kernel, via cgroups, that enforces limits at runtime, and it handles the two resources in very different ways. Understanding this distinction is the difference between a smoothly running cluster and a 3 AM page that ruins your weekend.

Think of it this way: CPU is a “compressible” resource. If your app starts asking for more CPU than its limit, the kernel can basically say, “Nice try, buddy,” and slow it down. This is throttling. Memory, on the other hand, is “incompressible.” There’s no way to give a process “a little bit” of memory. It either gets what it asks for or it doesn’t. If it doesn’t, and the kernel can’t reclaim enough memory to stay under the limit, its last resort is to kill a process. This is the dreaded OOMKill.

CPU Throttling: The Slow Squeeze

When you set a CPU limit, you’re not reserving a dedicated core for your pod. You’re setting a cap on how much time it can spend on the CPU over a window of time (100ms by default). This is enforced by the Linux kernel’s cpu cgroup controller.

Let’s say you set a limit of 500m (half a core). Your container is allowed to use 50 milliseconds of CPU time in every 100ms window, summed across all of its threads. A multi-threaded process can burn through that quota in well under 50ms of wall-clock time. Once the quota is gone, the kernel keeps the container’s threads off the CPU for the remainder of the window. Your application doesn’t get an error; it just gets… slow. Painfully, inexplicably slow. This is often the silent killer of performance.
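The quota arithmetic is easy to check by hand. As a minimal sketch, the shell function below converts a limit in millicores into microseconds of CPU time per window, assuming the default 100ms period. The name cpu_quota_us is made up for this example; it is not a Kubernetes or kernel tool.

```shell
# Hypothetical helper: convert a CPU limit in millicores into the CPU time
# (in microseconds) the container may use per enforcement period.
# $1 = limit in millicores (500 for "500m")
# $2 = period in microseconds (defaults to 100000, the 100ms default)
cpu_quota_us() {
  millicores=$1
  period_us=${2:-100000}
  echo $(( millicores * period_us / 1000 ))
}

cpu_quota_us 500    # 500m    -> 50000 us  (50ms per 100ms window)
cpu_quota_us 100    # 100m    -> 10000 us  (10ms per 100ms window)
cpu_quota_us 2000   # 2 cores -> 200000 us (two threads running flat out)
```

The last line is the useful one to stare at: a quota larger than the period is perfectly normal, because the quota is CPU time added up across every thread in the container.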

You can actually see this happening. Let’s create a pod with a stupidly low limit that’s guaranteed to throttle.

apiVersion: v1
kind: Pod
metadata:
  name: throttling-demo
spec:
  containers:
  - name: stress
    image: polinux/stress
    resources:
      requests:
        cpu: "100m"
      limits:
        cpu: "100m"
    command: ["stress"]
    args: ["--cpu", "2", "--timeout", "60s"]

This stress command tries to use 2 full CPUs, but we’ve limited it to 100m. It’s going to get throttled into oblivion. To confirm, exec into the pod and check the throttling metrics from the kernel:

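# Once metrics are available, check the pod's CPU usage (requires metrics-server)
kubectl top pod throttling-demo

# Now, exec in and check the cpu.stat file for the cgroup
kubectl exec -it throttling-demo -- sh
# Inside the container, on a cgroup v1 node:
cat /sys/fs/cgroup/cpu/cpu.stat
# On a cgroup v2 node, the file sits at the root of the cgroup mount instead:
cat /sys/fs/cgroup/cpu.stat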

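You’ll see lines like nr_periods (how many enforcement windows have elapsed), nr_throttled (how many of those windows the container ran out of quota in) and throttled_time (total nanoseconds spent throttled; on cgroup v2 this is throttled_usec, in microseconds). The numbers will be horrifyingly large. This is why you might see high latency in your web app even though your CPU usage looks low: usage is averaged over seconds or minutes, while throttling happens inside each 100ms window. The app is begging for CPU time and being told to wait.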
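Raw counters are hard to eyeball, so it helps to turn them into a ratio. Here is a minimal sketch that reads cpu.stat on stdin and prints the percentage of windows in which the container was throttled. The function name throttled_pct and the sample numbers are invented for illustration; only the nr_periods and nr_throttled field names come from the kernel.

```shell
# Hypothetical helper: read cpu.stat contents on stdin and print the
# percentage of enforcement periods in which the cgroup was throttled.
# nr_periods and nr_throttled appear in both cgroup v1 and v2 cpu.stat.
throttled_pct() {
  awk '
    $1 == "nr_periods"   { periods = $2 }
    $1 == "nr_throttled" { throttled = $2 }
    END {
      if (periods > 0) printf "%.1f\n", 100 * throttled / periods
      else print "0.0"
    }
  '
}

# Made-up sample, not output from a real pod: 600 periods, 540 throttled.
printf 'nr_periods 600\nnr_throttled 540\n' | throttled_pct   # prints 90.0
```

Inside a container you would feed it the real file, for example throttled_pct < /sys/fs/cgroup/cpu.stat on a cgroup v2 node. The counters are cumulative since the cgroup was created, so for a long-running pod it is more telling to sample twice and compare the deltas.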

Memory Limits and the OOMKill: The Sudden Stop

Now for the more dramatic sibling. Memory doesn’t throttle. You set a limit of 256Mi, and when your container’s usage hits that line and the kernel can’t reclaim enough to get back under it, the Linux memory cgroup controller triggers the Out Of Memory (OOM) killer. Its job is to sacrifice a process to bring the cgroup back under its limit. The victim is a process inside your container, typically the one using the most memory.

The kill is instant, brutal, and often leaves a confusing trail. The kubelet restarts the container, and if it keeps getting killed, kubectl get pods shows OOMKilled for a moment and then CrashLoopBackOff as the restart delay grows. This is why you might see a pod that was running fine for days suddenly start crashing: maybe its memory usage slowly grew until it hit the limit.
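If you suspect slow growth like this, it helps to know how close a container is to its limit before the kernel settles the question for you. A minimal sketch, assuming you already have the usage and the limit in bytes; mem_used_pct is a name invented for this example:

```shell
# Hypothetical helper: print memory usage as a whole-number percentage of
# the limit. $1 = bytes in use, $2 = limit in bytes, or the literal "max",
# which is what cgroup v2 reports when no limit is set.
mem_used_pct() {
  used=$1
  limit=$2
  if [ "$limit" = "max" ]; then
    echo "no limit"
  else
    echo $(( used * 100 / limit ))
  fi
}

# 90Mi in use against a 100Mi limit (values chosen for illustration).
mem_used_pct 94371840 104857600   # prints 90
```

On a cgroup v2 node, the two inputs can be read inside the container from /sys/fs/cgroup/memory.current and /sys/fs/cgroup/memory.max. Treat the result as a rough signal only: the usage figure includes page cache, which the kernel can reclaim, so a high percentage does not always mean a kill is imminent.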

Here’s a pod destined for a quick death:

apiVersion: v1
kind: Pod
metadata:
  name: oom-demo
spec:
  containers:
  - name: memory-hog
    image: polinux/stress
    resources:
      requests:
        memory: "50Mi"
      limits:
        memory: "100Mi"
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "150M", "--vm-hang", "1"]

This tells stress to try and allocate 150Mi of memory. We’ve given it a limit of 100Mi. It’s doomed.

The best way to investigate an OOMKill is to check the pod’s status and describe it:

kubectl describe pod oom-demo

Look for the Last State section. You’ll see something beautifully direct:

    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137

Exit code 137 (128 + 9, where 9 is the SIGKILL signal) tells you the process was killed with SIGKILL; the OOMKilled reason tells you who pulled the trigger. The kernel didn’t ask nicely; it just ended things.
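The 128 + N convention applies to any signal, not just SIGKILL, so it is worth having the decoding at your fingertips. A small sketch; signal_from_exit_code is a hypothetical name, not a kubectl feature:

```shell
# Hypothetical helper: given a container exit code, print the number of the
# signal that killed it, or "none" if the process exited on its own.
# By convention, exit codes above 128 mean "killed by signal (code - 128)".
signal_from_exit_code() {
  code=$1
  if [ "$code" -gt 128 ]; then
    echo $(( code - 128 ))
  else
    echo "none"
  fi
}

signal_from_exit_code 137   # 9    (SIGKILL)
signal_from_exit_code 143   # 15   (SIGTERM)
signal_from_exit_code 1     # none (ordinary error exit)
```

A 137 without an OOMKilled reason means something else sent the SIGKILL, so check the reason before blaming the memory limit.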

Why This Asymmetry Exists and What to Do About It

The designers didn’t choose this to be difficult. It’s a fundamental property of the resources. You can time-slice a CPU core. You can’t time-slice a chunk of RAM.

This leads to my biggest piece of advice: Set CPU limits carefully, and set memory limits absolutely.

For Memory: Your memory limit is a hard boundary. You must set it to prevent a buggy pod from consuming all memory on a node and causing a cascading failure. Use it as a safety net. Size it based on the peak memory your application needs under realistic load, plus a small buffer. Monitor your actual usage and adjust.
For CPU: The value of CPU limits is hotly debated. Throttling can be devastating for latency-sensitive applications. Many experts recommend omitting CPU limits entirely in production to avoid throttling, relying instead on accurate requests, which both guide the scheduler and guarantee each container its share of CPU when the node is busy. The risk is that a “noisy neighbor” pod soaks up all the spare CPU on a node, so other pods get their requested share but nothing more, and performance becomes less predictable. You need to decide which evil you prefer: potential throttling or potential node saturation. My default stance is to start without limits, monitor carefully, and only add them if you see a specific problem you need to solve.
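To make that stance concrete, here is a rough sketch of a shell function that prints a resources stanza with requests for both resources, a memory limit, and no CPU limit. The function name render_resources and the values passed to it are placeholders for illustration, not recommendations; size yours from what you actually measure.

```shell
# Hypothetical helper: print a container "resources" stanza with requests
# for CPU and memory, a memory limit, and deliberately no CPU limit.
# $1 = CPU request, $2 = memory request, $3 = memory limit
render_resources() {
  cpu_request=$1
  mem_request=$2
  mem_limit=$3
  cat <<EOF
resources:
  requests:
    cpu: "$cpu_request"
    memory: "$mem_request"
  limits:
    memory: "$mem_limit"
EOF
}

# Placeholder values, chosen only to show the shape of the output.
render_resources 250m 256Mi 512Mi
```

If you later find a pod that really does need a ceiling, adding a cpu entry under limits is a one-line change, which is part of why starting without one is a low-risk default.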

Always set requests for both. They’re your promise to the scheduler. Limits are your contract with the kernel, and the kernel enforces its contracts with extreme prejudice. Choose your terms wisely.