9.5 Running DaemonSets Only on a Subset of Nodes

Right, so you’ve got a DaemonSet. It’s happily deploying its pod on every single node in your cluster. That’s its job. But what if you don’t want it on every node? What if your brilliant log-collector pod needs a specific filesystem mount that only exists on your workhorse compute nodes, or your gpu-model-inferencer has absolutely no business running on the cheap little spot instances handling your web traffic?

This is where we stop the DaemonSet’s tyrannical reign of “one for all” and introduce some democracy. We call in the bouncers of the Kubernetes club: nodeSelector, nodeAffinity if we’re feeling fancy, and taints and tolerations. Let’s break it down.

The Simple, Blunt Instrument: nodeSelector

The most straightforward way to limit your DaemonSet is nodeSelector. It’s a simple key-value matcher. You label your nodes, then tell the DaemonSet to only run on nodes with that label.

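First, find a node you want to target. Get its name with kubectl get nodes, and see its labels with kubectl describe node <node-name> (or list every node’s labels at once with kubectl get nodes --show-labels). You’ll notice a bunch of standard labels like kubernetes.io/os=linux and kubernetes.io/hostname=<node-name>.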

Now, let’s say we have a pool of nodes with fast NVMe drives. We’ll label one:

kubectl label nodes <node-name> hardware-type=nvme
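After that command, the new label sits in the Node object’s metadata right next to the built-in ones, and that map is what a nodeSelector is matched against. A heavily abridged sketch of what kubectl get node <node-name> -o yaml would contain:

```yaml
# Abridged Node excerpt after the label command above.
# A real node carries many more labels (hostname, arch, zone, ...).
metadata:
  labels:
    hardware-type: nvme        # the label we just added
    kubernetes.io/os: linux    # a built-in label
```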

Now we write a DaemonSet that schedules only on nodes with that label. The crucial part is the nodeSelector field in the Pod template’s spec (spec.template.spec).

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvme-log-collector
spec:
  selector:
    matchLabels:
      name: nvme-log-collector
  template:
    metadata:
      labels:
        name: nvme-log-collector
    spec:
      containers:
      - name: collector
        image: my-nvme-collector:latest
      # This is the magic bit:
      nodeSelector:
        hardware-type: nvme

It’s simple, effective, and a bit dumb. It only checks for an exact match on that key-value pair, and if you list several pairs, a node must carry all of them. It can’t express logic like “nvme or ssd”. For that, we need a sharper tool.
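To see the “all of them” rule in action, here’s a sketch of a pod-template fragment that demands two labels at once. It reuses the hypothetical hardware-type label from above together with the built-in kubernetes.io/os label; a node missing either one gets no pod.

```yaml
# Fragment of spec.template.spec: every key-value pair must match (logical AND).
nodeSelector:
  hardware-type: nvme        # hypothetical label from the example above
  kubernetes.io/os: linux    # built-in label set by the kubelet
```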

The Sophisticated Choice: nodeAffinity

nodeSelector is like a yes/no question. nodeAffinity is the full-blown interview process. It gives you expressive, complex rules for where your pods can and cannot run. This is how you do “subset of nodes” for real.

Let’s upgrade our previous example. We want our DaemonSet on nodes labeled with hardware-type=nvme OR hardware-type=ssd. nodeSelector can’t do this. nodeAffinity can.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fast-disk-collector
spec:
  selector:
    matchLabels:
      name: fast-disk-collector
  template:
    metadata:
      labels:
        name: fast-disk-collector
    spec:
      containers:
      - name: collector
        image: my-fast-disk-collector:latest
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: hardware-type
                operator: In
                values:
                - nvme
                - ssd

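See the operator: In? That’s the key. Other operators include NotIn and DoesNotExist (handy for node anti-affinity, i.e. “anywhere but these nodes”), Exists (the key is present, whatever its value), and Gt/Lt, which parse the label value as an integer (for, I don’t know, a version number). The long-winded requiredDuringSchedulingIgnoredDuringExecution means “this rule is mandatory for getting the pod onto a node (scheduling), but if the node’s labels change later, an already-running pod isn’t evicted on that account (ignored during execution)”. DaemonSets add a twist: the DaemonSet controller re-evaluates its node selection whenever node labels change, so it will delete its pod from a node that stops matching and create one on a node that newly matches. It’s still almost always what you want.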
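Here’s a sketch of how these pieces combine. Expressions inside one matchExpressions entry are ANDed; separate nodeSelectorTerms are ORed. The node-lifecycle and disk-generation labels and the node name special-node-1 are hypothetical, stand-ins for whatever your own nodes carry.

```yaml
# Fragment of spec.template.spec. node-lifecycle, disk-generation and
# special-node-1 are hypothetical; only hardware-type appears earlier.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      # Term 1: every expression below must match (AND).
      - matchExpressions:
        - key: hardware-type
          operator: Exists          # any value; no "values" list allowed
        - key: node-lifecycle
          operator: NotIn           # also matches nodes lacking this label
          values:
          - spot
        - key: disk-generation
          operator: Gt              # exactly one value, parsed as an integer
          values:
          - "3"                     # quoted: values must be strings
      # Term 2: alternatively (OR), this one named node qualifies.
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - special-node-1
```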

The Bouncer and the VIP Pass: Taints and Tolerations

Here’s where the designers were both clever and, in my opinion, a bit obtuse with the naming. This mechanism works from the node’s perspective. You taint the node (put up a “no entry” sign). Then, you give your pod a toleration (a VIP pass that ignores that specific sign).

This is the primary way to fence off special nodes. Control-plane nodes (the old “master” nodes) are the classic example: in kubeadm-built clusters they come pre-tainted with node-role.kubernetes.io/control-plane:NoSchedule. Your user-land DaemonSets won’t run on them because they don’t carry a toleration for that taint.
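Conversely, if a DaemonSet genuinely belongs on control-plane nodes too (a node-level monitoring agent, say), its pod template needs a toleration for that taint. A minimal sketch; operator Exists matches the taint regardless of its value, which is empty here anyway. Older clusters that still use the node-role.kubernetes.io/master taint would need a second entry with that key.

```yaml
# Fragment of spec.template.spec: lets the pods onto kubeadm-style
# control-plane nodes in addition to the regular worker nodes.
tolerations:
- key: node-role.kubernetes.io/control-plane
  operator: Exists
  effect: NoSchedule
```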

Say you have a node reserved for running memory-hogging monsters. You taint it:

kubectl taint nodes <node-name> reserved=big-mem:NoSchedule
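Under the hood, that command adds an entry to the Node object’s spec.taints list, which is entirely separate from its labels. Roughly what an abridged kubectl get node <node-name> -o yaml would show afterwards:

```yaml
# Abridged Node excerpt: a taint is a key, a value and an effect. It lives in
# spec, not in metadata.labels, so tainting a node does NOT label it.
spec:
  taints:
  - key: reserved
    value: big-mem
    effect: NoSchedule
```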

Now, only pods with a matching toleration can be scheduled there. Your dedicated DaemonSet for monitoring these beasts would need this in its Pod spec:

tolerations:
- key: "reserved"
  operator: "Equal"
  value: "big-mem"
  effect: "NoSchedule"

The effect is crucial. NoSchedule means “don’t put new pods here,” while NoExecute also evicts existing pods that don’t tolerate the taint. Use the latter with extreme care.
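If you later switch that node’s taint to NoExecute, or apply both effects, a broader toleration covers every case. A sketch: leaving out effect matches all effects for the key, and Exists ignores the value. It is deliberately loose, so it will also tolerate any other taint someone later puts on the reserved key.

```yaml
# Fragment of spec.template.spec: tolerates any taint with key "reserved",
# whatever its value or effect (NoSchedule, PreferNoSchedule, NoExecute).
tolerations:
- key: "reserved"
  operator: "Exists"
```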

The Pitfall: a toleration is permission to run on a tainted node, not a request to run there. If you only define a toleration, your DaemonSet will still put a pod on every untainted node as well as on the tainted ones. To truly restrict it to the tainted nodes, combine the toleration with a nodeSelector or nodeAffinity rule, and remember that a taint is not a label: you have to label those nodes too (for example, kubectl label nodes <node-name> reserved=big-mem) so the selector has something to match. It’s a two-step process: the selector or affinity picks the door, and the toleration gets it past the bouncer. Get this wrong, and you’ll have a bad time. It’s the most common “why isn’t my DaemonSet working?!” head-scratcher I see.
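Putting both steps together, here’s a sketch of the big-mem monitoring DaemonSet. It assumes each reserved node carries both the reserved=big-mem:NoSchedule taint and a matching reserved=big-mem label; the DaemonSet name and image are hypothetical placeholders.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: big-mem-monitor                  # hypothetical name
spec:
  selector:
    matchLabels:
      name: big-mem-monitor
  template:
    metadata:
      labels:
        name: big-mem-monitor
    spec:
      containers:
      - name: monitor
        image: my-big-mem-monitor:latest # hypothetical image
      # Step 1: only nodes labeled reserved=big-mem are candidates.
      nodeSelector:
        reserved: big-mem
      # Step 2: permission to run despite the matching taint on those nodes.
      tolerations:
      - key: "reserved"
        operator: "Equal"
        value: "big-mem"
        effect: "NoSchedule"
```

Drop the nodeSelector and the pods spread across every untainted node too; drop the toleration and they land nowhere, because every labeled node is also tainted.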