34.5 Tolerations: Allowing Pods onto Tainted Nodes

Right, so you’ve tainted your nodes. Good for you. You’ve drawn a neat little “Keep Out” sign on a subset of your cluster, probably for a good reason. Maybe those nodes have expensive GPUs you don’t want wasted on an nginx pod, or they’re in a dodgy availability zone you’re trying to drain. But here’s the catch: what if you do want a pod to break the rules? What if you have a special, privileged workload that needs to run on that tainted hardware?

That’s where tolerations come in. Think of a toleration as your pod’s official “I’m with the band” backstage pass. It doesn’t force the pod onto a tainted node, but it allows the scheduler to consider placing it there, effectively ignoring the taint you so carefully applied. It’s the exception to your own rule.

How a Toleration Matches a Taint

This isn’t a vague handshake agreement; it’s a precise key-and-lock mechanism. For your pod to tolerate a taint, its toleration must match the taint on three specific fields:

key: The taint’s key. It’s required unless the operator is Exists, in which case an empty key matches every key.
value: The value of the taint. Sometimes you don’t care (operator: Exists), sometimes you need an exact match (operator: Equal).
effect: This is the big one. If the toleration specifies an effect (NoSchedule, PreferNoSchedule, NoExecute), it must be the same effect the taint has. You can’t tolerate a NoExecute taint with a toleration for NoSchedule; Kubernetes isn’t that easily fooled. Leave effect empty, though, and the toleration matches that key under every effect.

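The most common operator is Equal, which means “the taint’s value must match this exactly.” It’s also the default if you leave operator out. But you can also use Exists, which basically says, “I don’t care what the value is, as long as a taint with this key and effect exists.” This is how you tolerate a taint whose value varies or is absent. Drop the key as well and Exists matches every taint with the given effect. That is how you tolerate an entire class of taints at once, such as every NoExecute taint on a node.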
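To make the three-field rule concrete, here are several candidate tolerations checked against the single taint hardware=fpga:NoSchedule, the same one the concrete example below applies. Read each entry as if it were the only one. A pod tolerates a taint if any one of its tolerations matches, so a real manifest would keep just the entry it needs. The values "gpu" and the mismatched effect are made up purely to show a miss. Note the last case: a toleration that leaves effect empty matches that key under every effect.

```yaml
# Taint under test: hardware=fpga:NoSchedule
# Evaluate each entry on its own against that taint.
tolerations:
- key: "hardware"         # MATCHES: key, value and effect all agree
  operator: "Equal"
  value: "fpga"
  effect: "NoSchedule"
- key: "hardware"         # NO MATCH: value "gpu" is not "fpga"
  operator: "Equal"
  value: "gpu"
  effect: "NoSchedule"
- key: "hardware"         # NO MATCH: NoExecute is not NoSchedule
  operator: "Equal"
  value: "fpga"
  effect: "NoExecute"
- key: "hardware"         # MATCHES: an empty effect matches every effect
  operator: "Equal"
  value: "fpga"
```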

Here’s a concrete example. Let’s say you tainted a node like this, perhaps because it has a special FPGA card:

kubectl taint nodes node-fpga-1 hardware=fpga:NoSchedule

A pod that needs to use that FPGA would need a toleration that is the mirror image of that taint:

apiVersion: v1
kind: Pod
metadata:
  name: fancy-fpga-pod
spec:
  containers:
  - name: main
    image: my-fpga-app:latest
  tolerations:
  - key: "hardware"
    operator: "Equal"
    value: "fpga"
    effect: "NoSchedule"

Without that tolerations stanza, the scheduler would take one look at node-fpga-1 and nope right out of there, placing your pod on some other untainted node (or leaving it Pending if none fits). With it, the scheduler says, “Ah, this pod has the pass. It’s allowed to be here.”
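Tolerations belong to the pod spec. When the pod comes from a controller, they therefore go in the pod template, not at the top level of the object. Here is a minimal sketch of the same toleration in a Deployment. The Deployment name, labels and replica count are illustrative assumptions.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fancy-fpga-app          # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: fancy-fpga-app
  template:
    metadata:
      labels:
        app: fancy-fpga-app     # must match spec.selector above
    spec:
      containers:
      - name: main
        image: my-fpga-app:latest
      # Tolerations sit under spec.template.spec, alongside containers.
      tolerations:
      - key: "hardware"
        operator: "Equal"
        value: "fpga"
        effect: "NoSchedule"
```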

The operator Field: Equal vs. Exists

Let’s dig into that operator field because it’s a common point of confusion. You’ll almost always use one of two values:

operator: Equal: This is the precise matcher. You are specifying both a key and a value that must be exactly equal to the taint’s. This is for when you have a specific, meaningful value. The example above uses this.
operator: Exists: This is the broader, more powerful, and therefore more dangerous option. When you use Exists, you must not provide a value field. Its meaning is: “If any taint exists with this key and this effect, I tolerate it, regardless of its value.” This is how you write a toleration for a taint that might have variable values.
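Here are the common Exists shapes side by side, again read one entry at a time. The taint values in the comments (fpga, asic) are hypothetical. Two rules explain the broader entries: an empty effect matches every effect, and an empty key (allowed only with Exists) matches every key. The commented-out entry at the end is the combination the API server rejects, because it pairs Exists with a value.

```yaml
tolerations:
# Key + effect: tolerates hardware=fpga:NoSchedule, hardware=asic:NoSchedule,
# or any other value for the "hardware" key, but only with NoSchedule.
- key: "hardware"
  operator: "Exists"
  effect: "NoSchedule"
# Key only: any value AND any effect for the "hardware" key.
- key: "hardware"
  operator: "Exists"
# Effect only (empty key): every NoExecute taint, whatever its key.
# Very broad: with no tolerationSeconds it also keeps the pod on
# unreachable or not-ready nodes indefinitely.
- operator: "Exists"
  effect: "NoExecute"
# Invalid: the API server rejects a value combined with Exists.
# - key: "hardware"
#   operator: "Exists"
#   value: "fpga"
```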

A classic use case for Exists is tolerating the built-in node.kubernetes.io/not-ready or node.kubernetes.io/unreachable taints, which the node controller adds automatically and which carry no value at all. You’re saying, “I’ll tolerate any taint with this key and effect.”

tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300 # More on this in a sec

tolerationSeconds: The Get-Out-of-Jail-Free Card (That Expires)

This brings us to my favorite part of tolerations: tolerationSeconds. This field only applies to tolerations with the NoExecute effect (for any other effect it’s ignored), and it’s brilliantly useful.

Remember, a NoExecute taint doesn’t just prevent scheduling; it evicts running pods that don’t tolerate it. But what if your pod can handle a little bit of node instability? Evicting a pod the millisecond a node has a hiccup can cause more disruption than it prevents.

tolerationSeconds is your answer. It tells the Kubernetes eviction logic: “I know this node is tainted with NoExecute, but my pod is tough. Let it stay running for this many seconds before you kick it off. If the problem clears up within that time, we can all pretend this never happened.”

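This matters most for stateful applications like databases that hate being moved. A sudden network blip might make the node controller mark a node as unreachable and apply its NoExecute taint. If your PostgreSQL pod were evicted the instant that happened, you’d get a failover and potential data layer chaos. With tolerationSeconds: 300, you buy it 5 minutes to see if the network comes back.

tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300  # Wait 5 minutes before evicting
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300  # Same for a not-ready node

Here’s the twist: you usually get exactly this for free. The DefaultTolerationSeconds admission plugin, enabled by default, adds these two tolerations with tolerationSeconds: 300 to any pod that doesn’t already declare them. So the real best practice is to set them explicitly when 5 minutes is the wrong window for the workload. Go longer for a database you’d rather not fail over, and shorter for a stateless pod you want rescheduled quickly. Either way, this forgiveness window is what makes your cluster resilient to transient failures.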
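Here is a sketch of tuning that window per workload. The built-in default, added by the DefaultTolerationSeconds admission plugin, is 300 seconds. The 600- and 30-second figures below are illustrative assumptions, not recommendations; derive yours from how long your own failover actually takes. Leaving tolerationSeconds out entirely means the pod tolerates the taint indefinitely and is never evicted for it. That is rarely what you want for node-health taints.

```yaml
# Database pod: ride out longer blips before eviction (illustrative value).
tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 600
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 600
---
# Stateless web pod: give up on a sick node quickly (illustrative value).
tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 30
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 30
```

You can also override just one of the two keys. The admission plugin still fills in the 300-second default for whichever one you leave out.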

Common Pitfalls and the Power (and Danger) of Exists

The biggest mistake I see is being too broad with the Exists operator. Look at this toleration and tell me what’s wrong with it:

tolerations:
- operator: "Exists"  # Danger, Will Robinson!

This toleration, with just operator: Exists and no other fields, is the equivalent of a universal backstage pass. It means “This pod tolerates every single taint, regardless of its key, value, or effect.” Your pod will happily schedule and run on any node, no matter how you’ve tried to cordon it off. You have completely nullified the entire concept of taints. Don’t do this unless you have a phenomenally good reason (and you almost certainly don’t). The usual legitimate exception is a node-level agent, such as a CNI plugin or a log-shipping DaemonSet, that genuinely has to run on every node.

Another pitfall is forgetting that tolerations are about permission, not requirement. A toleration lets a pod onto a tainted node, but it doesn’t keep the pod off every other node. The scheduler is still free to place it on any untainted node that fits, weighing things like resource requests and node affinity. If you must have your pod on a specific tainted node, you need to combine the toleration with a nodeSelector or nodeAffinity rule. The toleration opens the door; affinity points the pod to the specific chair inside.
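Here is the door-plus-chair combination as a sketch. It assumes node-fpga-1 also carries the label hardware=fpga. That label is hypothetical: a taint is not a label, so you’d add it separately with kubectl label. The toleration grants permission to land on the tainted node, and the required node affinity restricts the pod to nodes with that label.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fancy-fpga-pod
spec:
  containers:
  - name: main
    image: my-fpga-app:latest
  # Permission: lets the pod past the hardware=fpga:NoSchedule taint.
  tolerations:
  - key: "hardware"
    operator: "Equal"
    value: "fpga"
    effect: "NoSchedule"
  # Requirement: only nodes labeled hardware=fpga qualify.
  # Assumes that label was added to the node separately from the taint.
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: "hardware"
            operator: In
            values: ["fpga"]
```

For a single label like this, a plain nodeSelector (hardware: fpga) does the same job with less ceremony. nodeAffinity earns its verbosity once you need operators such as NotIn or several alternative terms.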