9.3 Tolerations to Schedule on Tainted Nodes
Right, so you’ve got your DaemonSet humming along, deploying its pod to every node in your cluster. It’s a beautiful thing. But then you run into the real world, and the real world has problems. Some of your nodes are, shall we say, special. Maybe they’re GPU-equipped beasts that cost more than your car, reserved for machine learning workloads. Maybe they’re edge nodes with spotty connections, or they’re just old and cranky and you don’t trust anything but a specific monitoring agent to run on them.
This is where the Kubernetes scheduler’s “taint” mechanism comes in. A taint is basically a node’s way of saying, “Get off my lawn!” It repels all pods that haven’t explicitly said they’re okay with the particular brand of nonsense that node is serving. By default, your pristine DaemonSet pod will be repelled. It sees the taint, says “hard pass,” and the scheduler leaves that node alone.
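To make that concrete, here is a rough sketch of what a taint looks like on the Node object itself. The node name gpu-worker-1 is hypothetical, and the gpu-node key and true value are purely illustrative. In practice you would rarely write this manifest by hand; the kubectl taint command in the comment is the usual way to set it.

```yaml
# Usually applied with:
#   kubectl taint nodes gpu-worker-1 gpu-node=true:NoSchedule
apiVersion: v1
kind: Node
metadata:
  name: gpu-worker-1        # hypothetical node name
spec:
  taints:
  - key: "gpu-node"         # illustrative taint key
    value: "true"           # illustrative taint value
    effect: "NoSchedule"    # new pods without a matching toleration stay off
```

A taint is a key, an optional value, and an effect, all living on the node. Nothing in your pod changes when the taint is added.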
To get your pod onto that tainted node, you need to give it a matching “toleration.” You’re not removing the taint from the node (other pods should still be repelled!), you’re just giving your specific pod a hall pass.
A toleration isn’t an assignment; it’s a permission. It tells the scheduler, “I’m cool with hanging out on a node that has this particular taint.” The scheduler still does its normal thing based on other constraints, but this one roadblock is removed.
The Anatomy of a Toleration
Let’s look at the spec. It’s deceptively simple, which is where most of the mistakes happen. You slap this into the Pod spec; for a DaemonSet, that means spec.template.spec, not the top-level DaemonSet spec.
tolerations:
- key: "gpu-node"
  operator: "Equal"
  value: "true"
  effect: "NoSchedule"
Let’s break down what these fields actually mean because if you get them wrong, nothing happens and you’ll waste an hour wondering why.
key: This is the key of the taint you’re tolerating. It’s just a string.
operator: This is the big one. Equal (the default) means you’re saying “the taint’s value must exactly match my value field.” Alternatively, you can use Exists, which is the equivalent of saying “I tolerate any taint with this key, I don’t care what its value is.” This is a common source of confusion: with Exists you must leave value out, and the API server rejects a toleration that sets both.
value: The value you expect the taint to have. As mentioned, only used with operator: Equal.
effect: This is the effect of the taint you’re tolerating. The options are NoSchedule (new pods that don’t tolerate the taint won’t be scheduled), PreferNoSchedule (a gentle suggestion: the scheduler tries to avoid the node but isn’t forced to), and NoExecute (this one is brutal: besides blocking new pods, it evicts already-running pods that don’t tolerate it; a pod that tolerates it with tolerationSeconds set gets to stay for that many seconds and is then evicted). Your toleration’s effect must match the taint’s effect, unless you leave effect out entirely, in which case the toleration matches every effect for that key. Tolerating a NoSchedule taint does nothing for a NoExecute taint. They are different animals.
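A short sketch of the two variations described above may help. The first entry reuses the illustrative gpu-node key with Exists and no value. The second uses the built-in node.kubernetes.io/unreachable taint to show tolerationSeconds; the 300-second figure is just an example value, not a recommendation.

```yaml
tolerations:
# Exists: matches the "gpu-node" taint whatever its value is.
# Note there is no value field here.
- key: "gpu-node"
  operator: "Exists"
  effect: "NoSchedule"
# tolerationSeconds only applies to NoExecute. The pod stays on the node
# for 300 seconds after the taint appears, then is evicted.
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300
```

Leaving tolerationSeconds off a NoExecute toleration means the pod tolerates the taint for as long as it is there.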
Tolerating NoExecute Eviction
This is the one that’ll bite you if you’re not careful. You’ve happily tolerated the NoSchedule effect to get your pod onto the node. Then, later, an admin adds a NoExecute taint for some emergency maintenance. Your pod, which only tolerates NoSchedule, gets unceremoniously evicted. Whoops.
If you really need your DaemonSet pod to stick to a node through thick and thin (e.g., a node-level agent that must always run), you need to tolerate both effects.
tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "special-tenant"
  effect: "NoSchedule"
- key: "dedicated"
  operator: "Equal"
  value: "special-tenant"
  effect: "NoExecute"    # This is the eviction-force-field.
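For a truly bulletproof DaemonSet that clings to its node like a remora, you’ll often see this pattern, which tolerates every NoExecute taint indefinitely (there is no tolerationSeconds, so no eviction timer ever starts). Use this with extreme caution: the pod will also shrug off NoExecute taints that an admin applies on purpose to clear a node for maintenance.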
tolerations:
- operator: "Exists"     # Tolerate every taint key...
  effect: "NoExecute"    # ...with the NoExecute effect.
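An even broader variant of this pattern drops the effect as well. With no key and no effect, a toleration using Exists matches every taint on every node. Treat this as a sketch of the extreme case rather than a recommendation; whether it is appropriate depends entirely on what your agent does and who else taints nodes in your cluster.

```yaml
tolerations:
# No key and no effect: with operator Exists this matches every taint,
# whatever its key, value, or effect.
- operator: "Exists"
```

A pod carrying this toleration will land on, and stay on, nodes that were tainted specifically to keep workloads away, so the caution above applies doubly.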
The Master Node Taint and Your DaemonSet
This is the most classic, real-world example. Out of the box, many clusters (kubeadm-built ones in particular) apply a taint to the control plane nodes: node-role.kubernetes.io/control-plane:NoSchedule, or on older clusters its predecessor node-role.kubernetes.io/master:NoSchedule. This is why your nifty web app pods don’t get scheduled on the masters, which is good, you don’t want that.
But what if your DaemonSet is, say, a cluster-wide networking plugin or monitoring agent? You probably do want that on the masters. This is why core system DaemonSets like kube-proxy and Flannel/Calico include a toleration for it.
Here’s how you’d add it to your own DaemonSet. It’s so common it’s basically a best practice for any cluster-level infrastructure DaemonSet.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: my-essential-daemonset
spec:
  selector:
    matchLabels:
      app: my-essential-app
  template:
    metadata:
      labels:
        app: my-essential-app
    spec:
      tolerations:
      # This tolerates the common control-plane taint
      - key: "node-role.kubernetes.io/control-plane"
        operator: "Exists"
        effect: "NoSchedule"
      # ... and your other tolerations go here
      containers:
      - name: main
        image: my-essential-app:latest
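If the same DaemonSet has to run on a mix of newer and older clusters, one hedged option is to tolerate both taint keys. The fragment below is a sketch of just the tolerations list, assuming it sits under spec.template.spec as in the manifest above. A toleration for a taint that doesn’t exist on a node is simply ignored, so carrying the extra entry is harmless.

```yaml
tolerations:
# Current control-plane taint key
- key: "node-role.kubernetes.io/control-plane"
  operator: "Exists"
  effect: "NoSchedule"
# Older key, still present on some older clusters
- key: "node-role.kubernetes.io/master"
  operator: "Exists"
  effect: "NoSchedule"
```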
The key takeaway? Tolerations are what make DaemonSets truly powerful, transforming them from a “deploy to every vanilla node” tool into a “deploy to every node that meets these criteria, no matter how weird it is” tool. Always double-check your operator and effect: getting them right is the difference between a pod that runs where you need it and one that politely avoids the problem entirely.