19.6 ResourceQuota: Namespace-Level Resource Caps

Right, let’s talk about ResourceQuotas. This is where the fun begins, or ends, depending on whether you’re the one setting them or the one hitting them. Think of a ResourceQuota as the stern, spreadsheet-loving parent of a Kubernetes namespace. It doesn’t micromanage how each pod behaves (that’s the LimitRange’s job, which we’ll get to), but it absolutely keeps a running tally of the total resource consumption for all pods in its domain. The moment a new pod would push the namespace past its allowance, the API server says “nope” and rejects the request outright with a 403 Forbidden. The pod is never created, so there is nothing left sitting in Pending; if a Deployment asked for it, the failure surfaces as a FailedCreate event on its ReplicaSet. It’s the ultimate “you’ve had enough” mechanism.

Why You Absolutely Need This

Without ResourceQuotas, your cluster is a free-for-all. It’s the wild west. One team, in a fit of enthusiasm (or incompetence), can spin up a hundred pods each requesting 4 CPUs and bring the entire cluster to its knees for everyone else. It’s a classic “noisy neighbor” problem. Quotas solve this by enforcing hard limits per namespace, which is how you achieve multi-tenancy without bloodshed. They’re not just a good practice for large clusters; even in a small cluster, they enforce discipline and prevent a simple typo in a kubectl apply from becoming an all-hands-on-deck emergency.

The Anatomy of a ResourceQuota

A ResourceQuota is defined per namespace and can control two main categories of things: compute resources (like CPU and memory) and object counts (like how many Services or PVCs you can have). Yes, you can quota the number of objects. This is brilliant because sometimes the API server gets overwhelmed not by CPU but by sheer quantity of objects to manage.

Here’s what a typical, moderately paranoid quota looks like:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: not-so-generous-quota
  namespace: my-app-namespace
spec:
  hard:
    # Compute Resources
    requests.cpu: "2"
    requests.memory: 4Gi
    limits.cpu: "4"
    limits.memory: 8Gi
    # Object Counts (because chaos is not a best practice)
    services: 10
    persistentvolumeclaims: 5
    secrets: 20
    configmaps: 20

Apply this with kubectl apply -f quota.yaml -n my-app-namespace, and the walls go up.
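
To watch the running tally, you can read the object back with kubectl get resourcequota not-so-generous-quota -n my-app-namespace -o yaml. The status section reports each cap next to current usage. The fragment below is a trimmed sketch of that shape; the used values are invented for illustration and do not come from a real cluster.

```yaml
# Illustrative only: the "used" numbers are made up for this sketch.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: not-so-generous-quota
  namespace: my-app-namespace
status:
  hard:                  # the caps, mirrored from spec.hard
    requests.cpu: "2"
    limits.cpu: "4"
    services: "10"
  used:                  # the running tally kept by the quota system
    requests.cpu: 500m
    limits.cpu: "1"
    services: "3"
```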

Requests vs. Limits: The Quota Distinction

This is critical and trips everyone up. The quota system tracks requests and limits separately. Look at the example above: I’ve set a cap of 2 CPUs for the sum of all requests.cpu in the namespace, and a cap of 4 CPUs for the sum of all limits.cpu.

Why the distinction? Because requests are what the scheduler uses to find a node with enough allocatable resources, so the requests quota effectively caps the resources the namespace is guaranteed. The limits quota, on the other hand, caps the maximum possible consumption, which is what the kubelet and the kernel enforce on the node through cgroups. You generally want to set both. A namespace can sit comfortably under its requests.cpu quota and still run into its limits.cpu quota, meaning its pods can’t burst as much as they’d like. Plan these numbers carefully.
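
To make the arithmetic concrete, here is a sketch of one pod and what it charges against the example quota. The pod name, image, and resource values are hypothetical, picked so the sums come out round.

```yaml
# Hypothetical pod; name, image, and numbers are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: quota-demo
  namespace: my-app-namespace
spec:
  containers:
    - name: app
      image: nginx
      resources:
        requests:
          cpu: 250m        # counts toward requests.cpu    (cap: 2)
          memory: 512Mi    # counts toward requests.memory (cap: 4Gi)
        limits:
          cpu: 500m        # counts toward limits.cpu      (cap: 4)
          memory: 1Gi      # counts toward limits.memory   (cap: 8Gi)
# Eight copies add up to 2 CPU / 4Gi requested and 4 CPU / 8Gi limited,
# which fills all four caps exactly. A ninth copy would be rejected.
```

If the limits here were raised to 1 CPU per pod, the namespace would hit limits.cpu after four pods while using only half of its requests.cpu budget.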

The Pitfalls and How to Avoid Them

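The most common pitfall is the dreaded “forgot-to-request-resources” pod. You deploy a pod without any requests or limits defined in its spec. What happens next depends on the quota. If the quota tracks compute resources such as requests.cpu or limits.memory, as the example above does, the API server does not treat the missing values as zero. It rejects the pod with a Forbidden error saying those values must be specified, and the pod is never created. In a namespace with no compute quota (or one that only counts objects), the same pod sails through: the scheduler sees that it requests nothing and says, “Well, I can put it anywhere!”, and the kubelet starts a container that consumes as much CPU and memory as the node will give it. Either way the naked pod is a problem. In the first case deployments fail with an error that baffles whoever wrote the manifest; in the second the namespace is effectively unmetered.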

The solution is two-fold:

Use a LimitRange: In the same namespace as your quota, define a LimitRange that sets default requests and limits. This catches those “naked” pods and slaps sensible defaults on them, which will then be counted against the quota.
Admission Controllers: Use tools like OPA/Gatekeeper or Kyverno to outright reject any pod that doesn’t have resource requests defined. This is the “be ruthless” approach.
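
The LimitRange half of that fix might look like the sketch below. The object name and the default values are assumptions chosen for illustration; size them to your own workloads.

```yaml
# Hypothetical defaults; tune the numbers for your workloads.
apiVersion: v1
kind: LimitRange
metadata:
  name: default-container-resources
  namespace: my-app-namespace
spec:
  limits:
    - type: Container
      defaultRequest:      # applied when a container omits requests
        cpu: 100m
        memory: 128Mi
      default:             # applied when a container omits limits
        cpu: 200m
        memory: 256Mi
```

With this in place, a container that specifies nothing is admitted with 100m/128Mi requested and 200m/256Mi as its limits, and those values are what the quota counts.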

Another gotcha is the “what counts?” problem. A pod’s requests and limits are charged against the quota the moment it is created, and pods that have finished (phase Succeeded or Failed) stop counting. If you delete a pod, its share is released almost immediately, as soon as the quota controller recalculates usage. However, other objects, like PersistentVolumeClaims, are counted for their entire lifetime. Deleting a pod that used a PVC doesn’t free up the PVC count; you have to delete the PVC itself.

Best Practices from the Trenches

Quotas and LimitRanges Are a Package Deal: Never create a ResourceQuota without also creating a LimitRange in that namespace. The LimitRange defines the rules for individual pods, and the ResourceQuota defines the budget for the entire namespace. They are two halves of the same system.
Start Conservative, Then Relax: It’s far less painful to grant a team more quota after they’ve proven they need it than it is to take quota away from a team that’s already using it. Start with tight limits and increase them based on actual usage metrics from something like Prometheus.
Quota Your Dev and Prod Differently: This should be obvious, but your development namespace quotas can be much lower than production. This not only saves resources but also encourages developers to write efficient code and manifests early on. If their app needs 4GB of memory to run in dev, it’s probably poorly configured.
Use Object Quotas to Prevent Resource Exhaustion: Capping configmaps, secrets, and services isn’t about being mean; it’s about protecting the control plane. A misbehaving controller that creates thousands of secrets can impact the performance of the API server for everyone. A low, reasonable cap on these objects acts as a circuit breaker.
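
As a sketch of that circuit breaker, the quota below caps object counts only. It uses the count/<resource>.<group> key syntax alongside the plain pods key; the object name and every number are illustrative assumptions, not recommendations.

```yaml
# Hypothetical object-count quota; the numbers are placeholders.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: object-count-breaker
  namespace: my-app-namespace
spec:
  hard:
    pods: "50"
    count/deployments.apps: "20"
    count/jobs.batch: "30"
    count/secrets: "100"
    count/configmaps: "100"
```

Keeping object caps in their own ResourceQuota, separate from the compute quota, lets you adjust one without touching the other, since multiple quotas in a namespace are all enforced together.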