19.2 Limits: The Hard Cap on Resource Consumption

Alright, let’s talk about limits. If requests are your polite, “hey, maybe could I have some more?” note to the kitchen, then limits are the bouncer at the club door. They don’t negotiate. They don’t care if your process is having the best day of its life. They just enforce the hard rule: “Thou shalt not consume more than X.”

The kernel enforces this with brutal efficiency. A process hits its memory limit (memory)? SIGKILL. Not SIGTERM. Not a gentle warning. It’s OOM-killed, gone, vanished from the process table. It hits its CPU limit (cpu)? The kernel’s CPU throttler (CFS bandwidth control, part of the Completely Fair Scheduler, and yes, the irony is rich) ensures the container gets no CPU time beyond its quota. It’s not killed, but it’s effectively frozen in time until the next scheduling period begins (100ms by default). It’s a hard cap. This is why you set them.

Why You Absolutely Need Limits

Think of a cluster without limits. It’s the wild west. One badly behaved pod (let’s be honest, it’s always the one you inherited from that team that “moved fast and broke things”) can start chewing through memory like it’s at an all-you-can-eat buffet. This triggers the Linux kernel’s Out-of-Memory (OOM) Killer on the node. The OOM Killer is a panicking, flailing entity that starts terminating processes to save the host machine. It might kill your misbehaving app, or it might kill a perfectly healthy system pod, or your database. It’s a chaotic, non-deterministic nightmare. Limits are your way of telling the OOM Killer, “Hey, if this thing right here gets out of hand, just kill it and leave everyone else alone.” You’re sacrificing a pawn to save the king.

The Anatomy of a CPU Limit

This is where things get weird, so pay attention. A CPU limit of 1 does not mean one whole, dedicated CPU core. I know, it’s dumb. It means one CPU core’s worth of time per scheduling period, and that time can be spent on any of the node’s cores.

The unit is millicores. A cpu: "1" is 1000m. You can define it as cpu: "1" or cpu: "1000m"; they’re identical. A limit of cpu: "500m" means your container gets to use half of a CPU core’s time: with the default 100ms CFS period, that’s 50ms of CPU time per period, summed across all of its threads.

Here’s the kicker: because this is enforced by kernel throttling, a process that hits its CPU limit doesn’t get a clean “slow down” signal. It gets abruptly throttled, which can feel like a sudden stall. Multithreaded processes get there faster than you’d expect: with a 500m limit and the default 100ms period, four threads running flat out burn the 50ms quota in 12.5ms of wall-clock time, then sit idle for the remaining 87.5ms. This can absolutely murder the performance of latency-sensitive applications like Java or Go services that expect consistent scheduling. The latency spikes can be insane. It’s one of the biggest “questionable choices” in this whole system: the implementation is effective but brutally janky.

apiVersion: v1
kind: Pod
metadata:
  name: limited-pod
spec:
  containers:
  - name: app
    image: my-app:latest
    resources:
      limits:
        cpu: "500m"     # Half a core's worth of time: 50ms of CPU per 100ms CFS period (the default)
        memory: "256Mi" # 256 mebibytes (256 * 2^20 bytes)
      requests:
        cpu: "250m"
        memory: "128Mi"

The Anatomy of a Memory Limit

This one is more straightforward, but the consequences are more severe. Memory is measured in bytes, and the common suffixes are Mi (mebibytes, 2^20 bytes) and Gi (gibibytes, 2^30 bytes). Use these, not the decimal M/G suffixes (and certainly not MB/GB, which Kubernetes doesn’t accept as quantities), to avoid ambiguity.
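
To make the suffix point concrete, here is a small fragment comparing three spellings that look nearly identical. The values are illustrative only; the byte counts in the comments follow from how Kubernetes parses quantity suffixes.

```yaml
resources:
  limits:
    memory: "256Mi"   # binary suffix: 256 * 2^20 = 268,435,456 bytes
    # memory: "256M"  # decimal suffix: 256 * 10^6 = 256,000,000 bytes (about 4.6% less)
    # memory: "256m"  # lowercase m is the milli suffix: 0.256 bytes, almost certainly a typo
```

The lowercase m case is the nasty one: it is valid for CPU (millicores), so it is easy to carry over to a memory field by habit.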

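When the container’s memory usage exceeds the limit, the kernel’s OOM Killer gets a target painted on the process’s back. Note that the kernel counts everything charged to the container’s cgroup, not just resident memory (RSS): page cache counts too, although the kernel will try to reclaim it before killing anything. The container_memory_working_set_bytes metric in cAdvisor is what you should alert on, as it excludes the easily reclaimable file cache and so is the closest widely available proxy for memory the kernel can’t easily get back. When the kernel can’t keep usage under the limit, BAM: the process is dead. You’ll see OOMKilled in kubectl describe pod.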

kubectl describe pod limited-pod
...
Containers:
  app:
    ...
    State:          Terminated
      Reason:       OOMKilled
      Exit Code:    137

Exit code 137 (128 + 9, where 9 is the signal number for SIGKILL) is your tell-tale sign that the memory bouncer did its job.
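
If you want a warning before the bouncer steps in, the alerting advice above can be sketched as a Prometheus rule. This is a minimal sketch under several assumptions that are not from the text: that Prometheus scrapes the kubelet’s cAdvisor endpoint, that the namespace, pod, and container labels are present on both metrics, and that a 90% threshold held for 5 minutes suits your workload. The group and alert names are made up.

```yaml
groups:
  - name: container-memory            # hypothetical group name
    rules:
      - alert: ContainerNearMemoryLimit   # hypothetical alert name
        # Ratio of working set to the configured limit. The "> 0" filter drops
        # containers with no memory limit (cAdvisor reports 0 for those).
        expr: |
          container_memory_working_set_bytes{container!=""}
            / (container_spec_memory_limit_bytes{container!=""} > 0)
            > 0.9
        for: 5m                        # assumed duration; tune to taste
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container }}) is above 90% of its memory limit"
```

A container that allocates in sudden bursts can still go from comfortable to OOMKilled inside the 5 minute window, so treat this as an early warning, not a guarantee.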

Best Practices and Pitfalls

Never set a limit without a request. This is Kubernetes 101. If you set a limit but no request, Kubernetes quietly sets the request equal to the limit, and the scheduler treats it that way. Your pod with a memory: 4Gi limit and no request will demand 4Gi of unreserved memory on a node (allocatable minus what other pods have already requested) before it can be scheduled, which is probably not what you want. Always set both.
CPU Limits are Often Evil. For many modern, latency-sensitive applications (Go, Java, Node, etc.), the jitter introduced by CPU throttling can cause more problems than it solves. The best practice is shifting towards omitting CPU limits entirely and instead using CPU Requests to ensure fair scheduling and relying on other tools (like Horizontal Pod Autoscaler based on CPU utilization) for scaling. This is a controversial but increasingly common stance. For memory, you absolutely must keep limits.
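Your Application Doesn’t Know. Your app has no innate idea what its Kubernetes limits are; it only finds out if its runtime reads the cgroup settings. Older JVMs (before Java 10 and 8u191), for example, see the host machine’s memory and naively size the heap from that. Newer JVMs are container-aware, but their default maximum heap is typically only a quarter of the container’s memory limit, which is rarely what you want either. Either way, configure your runtime explicitly. For a Java app, you’d use -Xmx or -XX:MaxRAMPercentage, passed directly or through the JAVA_TOOL_OPTIONS env var, to keep the heap well within the container’s memory limit, leaving room for off-heap memory.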
The QoS Class Trade-off. Remember the Guaranteed QoS class from the last section? It requires limits.cpu == requests.cpu and limits.memory == requests.memory for every container in the pod. This is the only way to get that top-tier class, which also means that dropping CPU limits, as suggested above, rules it out. It’s great for stability, but it removes your ability to overcommit resources on the node. It’s a trade-off between predictability and density. Choose wisely.
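
Pulling these practices together, here is one way a manifest for a JVM service might look. It is a sketch, not a recommendation for your workload: the pod name, image, and the 75% heap figure are all assumptions chosen for illustration, and -XX:MaxRAMPercentage needs Java 10+ or 8u191+.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: jvm-app                      # hypothetical name
spec:
  containers:
  - name: app
    image: my-jvm-app:latest         # hypothetical image
    env:
    - name: JAVA_TOOL_OPTIONS
      # Cap the heap at 75% of the container's memory limit (assumed value),
      # leaving the rest for off-heap use: metaspace, thread stacks, direct buffers.
      value: "-XX:MaxRAMPercentage=75.0"
    resources:
      requests:
        cpu: "500m"                  # explicit request, used for scheduling and fair sharing
        memory: "1Gi"
      limits:
        memory: "1Gi"                # memory limit kept; CPU limit deliberately omitted
```

Because the CPU limit is missing, this pod lands in the Burstable QoS class even though its memory request and limit match. Adding cpu: "500m" under limits would make it Guaranteed, at the cost of bringing CPU throttling back.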