41.5 Node Drain and Cordon During Upgrades

Alright, let’s get our hands dirty with the real first step of any upgrade: politely telling a node to stop accepting new work so we can take it out of service for maintenance. This isn’t a suggestion; it’s a controlled, graceful shutdown of the node’s workload. We do this with two simple but powerful commands: cordon and drain. Think of it as the Kubernetes equivalent of putting an “Out for Lunch” sign on a door.

kubectl cordon is your “Do Not Disturb” sign. It marks a node as unschedulable by setting spec.unschedulable on the Node object. New pods won’t be placed on it, but the existing ones keep humming along, blissfully unaware. This is your first move, and you make it before you start evicting pods with a drain. Why? Because an uncordoned node is still fair game: the scheduler, in its infinite and sometimes overly enthusiastic wisdom, will see a freshly emptied node as a lovely piece of vacant real estate and start placing new pods there the second you evict the old ones. It’s like trying to clean your apartment while your roommate is actively throwing a party. Cordoning first stops the new partygoers at the door.
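Cordoning is just a field change: `kubectl cordon` sets `spec.unschedulable: true` on the Node object, and `kubectl uncordon` clears it. A minimal sketch for confirming which nodes are currently cordoned, assuming you feed it the parsed output of `kubectl get nodes -o json`; the function name `cordoned_nodes` is hypothetical:

```python
from typing import List


def cordoned_nodes(node_list: dict) -> List[str]:
    """Return the names of nodes marked unschedulable (cordoned)."""
    names = []
    for node in node_list.get("items", []):
        # kubectl cordon sets spec.unschedulable=true; uncordoned nodes
        # usually omit the field, so treat a missing value as False.
        if node.get("spec", {}).get("unschedulable", False):
            names.append(node["metadata"]["name"])
    return names
```

In a script you would call it as `cordoned_nodes(json.load(sys.stdin))` with the kubectl output piped in. For a quick manual check, plain `kubectl get nodes` also shows `SchedulingDisabled` in the STATUS column of a cordoned node.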

Once the node is cordoned, you move to kubectl drain. This is the actual eviction notice. It does a few key things:

It cordons the node for you first. It’s a two-for-one deal, but I still recommend running cordon as a separate command. It makes your intent clearer in logs and scripts.
It evicts the pods running on the node (or, more accurately, it asks them to terminate gracefully through the Eviction API).
It respects Pod Disruption Budgets (PDBs), which we’ll get to in a second. This is crucial.

Here’s the basic, no-frills command:

# First, make it unschedulable
kubectl cordon my-node-01

# Then, gracefully evict its workloads
kubectl drain my-node-01 --ignore-daemonsets --delete-emptydir-data
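If you script this step, keep the two commands in order and stop at the first failure. A minimal Python sketch, assuming `kubectl` is on your PATH and already pointed at the right cluster. The helper names are hypothetical, and `--timeout` is an addition to the command above so that a blocked drain fails instead of waiting forever (its default of 0 means no limit):

```python
import subprocess
from typing import List


def upgrade_prep_commands(node: str, timeout: str = "10m") -> List[List[str]]:
    """Build the cordon-then-drain sequence for one node, in order."""
    return [
        # Step 1: mark the node unschedulable.
        ["kubectl", "cordon", node],
        # Step 2: evict its workloads; give up after `timeout`
        # rather than waiting indefinitely on a blocked eviction.
        [
            "kubectl", "drain", node,
            "--ignore-daemonsets",
            "--delete-emptydir-data",
            f"--timeout={timeout}",
        ],
    ]


def run_commands(commands: List[List[str]]) -> None:
    """Run each command in turn, raising on the first non-zero exit."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

`run_commands(upgrade_prep_commands("my-node-01"))` cordons and then drains the node, raising `subprocess.CalledProcessError` if either step fails, so a script never starts maintenance on a half-drained node.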

Why You Can’t Just `drain` and Hope for the Best

See those flags? Leave them off and you’ll quickly learn to hate the error messages. Let’s talk about why they’re non-optional.

--ignore-daemonsets is a necessary evil. DaemonSets (like your CNI plugin, node monitoring agents, etc.) are designed to run one pod on every node, and evicting those pods is pointless: DaemonSet pods tolerate the unschedulable mark, so the DaemonSet controller would simply recreate them on the same node. If you try to drain a node without this flag, the command will see these system pods, throw a fit, and abort the entire drain operation. It’s the cluster’s way of saying, “Hey, you can’t remove this, it’s important!” So you have to tell it, “Yes, I know, I’m not an idiot, ignore those please.” The DaemonSet pods are then left running on the node; they go down when you take the node down for maintenance and come back when the node does, which is what you want.
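Before draining, it helps to know which pods `--ignore-daemonsets` will leave behind. A sketch assuming the parsed JSON from `kubectl get pods -A --field-selector spec.nodeName=my-node-01 -o json`; it approximates drain’s own rule by checking whether a pod’s controlling owner reference is a DaemonSet. The function name is hypothetical:

```python
from typing import List


def daemonset_pods(pod_list: dict) -> List[str]:
    """Return namespace/name of pods whose controlling owner is a DaemonSet."""
    result = []
    for pod in pod_list.get("items", []):
        meta = pod["metadata"]
        for ref in meta.get("ownerReferences", []):
            # Only the controlling owner (controller: true) decides whether
            # a pod counts as DaemonSet-managed.
            if ref.get("controller") and ref.get("kind") == "DaemonSet":
                result.append(f'{meta.get("namespace", "default")}/{meta["name"]}')
                break
    return result
```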

--delete-emptydir-data is for your own safety. An emptyDir volume is essentially a temporary scratch disk on the node itself. When the pod is evicted, that data is gone forever. The drain command knows this, and it will refuse to evict a pod that uses an emptyDir volume unless you explicitly acknowledge you’re okay with this data loss. This flag is that acknowledgement. If you have anything important in an emptyDir, you’ve designed your application wrong, but the drain command is playing it safe and forcing you to confront that fact.
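You can also list the pods whose scratch data `--delete-emptydir-data` will throw away before you commit. A sketch over the parsed output of `kubectl get pods -A --field-selector spec.nodeName=my-node-01 -o json`; the function name is hypothetical:

```python
from typing import Dict, List


def emptydir_pods(pod_list: dict) -> Dict[str, List[str]]:
    """Map namespace/name to the names of the emptyDir volumes that pod uses."""
    result = {}
    for pod in pod_list.get("items", []):
        meta = pod["metadata"]
        volumes = pod.get("spec", {}).get("volumes", [])
        # A volume with an emptyDir key lives on the node (disk or memory)
        # and is deleted together with the pod.
        scratch = [v["name"] for v in volumes if "emptyDir" in v]
        if scratch:
            result[f'{meta.get("namespace", "default")}/{meta["name"]}'] = scratch
    return result
```

If anything in that list surprises you, find out what the application keeps there before you drain.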

The PodDisruptionBudget: Your Safety Net

This is the most important concept in this whole process. A PodDisruptionBudget (PDB) is an API object that tells Kubernetes, “Hey, for this application, you must always keep at least this many pods running (minAvailable), or allow at most this many to be down at once (maxUnavailable).” It’s a contract that governs voluntary disruptions such as drains; it can’t stop a node from crashing out from under you.
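PDBs are usually written as YAML, but `kubectl apply` accepts JSON just as well, so here is a sketch that builds a `policy/v1` PodDisruptionBudget in Python. The helper name and the `web-pdb`, `shop`, and `app: web` values in the usage line are hypothetical. The helper insists on exactly one of minAvailable and maxUnavailable, because a single PDB can’t set both:

```python
from typing import Dict, Optional, Union


def pdb_manifest(
    name: str,
    namespace: str,
    match_labels: Dict[str, str],
    min_available: Optional[Union[int, str]] = None,
    max_unavailable: Optional[Union[int, str]] = None,
) -> dict:
    """Build a policy/v1 PodDisruptionBudget with exactly one limit set."""
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    # The selector picks which pods the budget protects.
    spec = {"selector": {"matchLabels": match_labels}}
    if min_available is not None:
        spec["minAvailable"] = min_available      # e.g. 2 or "50%"
    else:
        spec["maxUnavailable"] = max_unavailable  # e.g. 1 or "25%"
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }
```

With `import json`, `print(json.dumps(pdb_manifest("web-pdb", "shop", {"app": "web"}, min_available=2), indent=2))` produces a manifest you can pipe into `kubectl apply -f -`.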

When you issue a drain command, kubectl doesn’t just blindly delete everything. It requests an eviction for each pod, and the API server checks every relevant PDB before granting it. If evicting a pod would violate the PDB (e.g., your “highly-available” web app is down to 2 healthy replicas, your PDB says minAvailable: 2, and this eviction would leave only 1), the API server refuses the eviction and the drain blocks, retrying until it succeeds. It waits until evicting the pod won’t break the contract, typically because a replacement pod of the same application has become healthy on a different node.
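The budget arithmetic is worth seeing once. This is a simplified model of the number the cluster reports in a PDB’s `status.disruptionsAllowed`, assuming integer limits only (real PDBs also accept percentages) and a hypothetical function name:

```python
from typing import Optional


def allowed_disruptions(
    current_healthy: int,
    expected_pods: int,
    min_available: Optional[int] = None,
    max_unavailable: Optional[int] = None,
) -> int:
    """How many more pods may be evicted right now without breaking the PDB."""
    if min_available is not None:
        desired_healthy = min_available
    elif max_unavailable is not None:
        # maxUnavailable is measured against the pods the controller expects.
        desired_healthy = expected_pods - max_unavailable
    else:
        raise ValueError("a PDB needs min_available or max_unavailable")
    # Never negative: an already-degraded app simply allows zero evictions.
    return max(0, current_healthy - desired_healthy)
```

Three healthy replicas with minAvailable: 2 leave room for one eviction; two healthy replicas leave none, and that is exactly when a drain sits and retries.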

This is why draining a node can sometimes seem to hang. It’s not hung; it’s being responsible. You can see what it’s waiting on:

kubectl get pdb --all-namespaces
# Check the ALLOWED DISRUPTIONS column; a budget showing 0 is blocking evictions.
# The drain output itself names each blocked pod as it retries, with a message like:
# "Cannot evict pod as it would violate the pod's disruption budget."
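To find the blocking budgets in a large cluster without eyeballing a table, a small filter over the parsed output of `kubectl get pdb -A -o json` does the job. A sketch; the function name is hypothetical:

```python
from typing import List


def blocking_pdbs(pdb_list: dict) -> List[str]:
    """Return namespace/name of PDBs that currently allow zero disruptions."""
    blocked = []
    for pdb in pdb_list.get("items", []):
        meta = pdb["metadata"]
        # status.disruptionsAllowed is maintained by the cluster; 0 means
        # every eviction against this budget is refused. A missing value
        # is treated as 0 here.
        if pdb.get("status", {}).get("disruptionsAllowed", 0) == 0:
            blocked.append(f'{meta.get("namespace", "default")}/{meta["name"]}')
    return blocked
```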

Always define PDBs for your stateful, critical workloads. It’s what separates a controlled rollout from a self-inflicted outage.

Handling the Stubborn Pods

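Sometimes, a pod just won’t go quietly. This is where the rubber meets the road. The drain command has a --force flag, and reaching for it is like using a sledgehammer to solve a diplomacy problem. It does not bypass Pod Disruption Budgets. Instead, it lets drain delete pods that aren’t managed by any controller (no ReplicaSet, StatefulSet, Job, or DaemonSet behind them), and those pods are gone for good because nothing recreates them. The flag that actually skips PDB checks is --disable-eviction, which deletes pods directly instead of going through the Eviction API. Either way, you might cause an outage. Don’t use them unless you absolutely know what you’re doing and why you’re doing it.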
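The pods that make `--force` necessary are the ones with no controller behind them. A sketch that lists them from the parsed output of `kubectl get pods -A --field-selector spec.nodeName=my-node-01 -o json`, assuming the filtering rule used by recent kubectl releases (no controlling owner reference, and not already Succeeded or Failed). The function name is hypothetical:

```python
from typing import List


def unmanaged_pods(pod_list: dict) -> List[str]:
    """Return namespace/name of running pods that no controller will recreate."""
    orphans = []
    for pod in pod_list.get("items", []):
        meta = pod["metadata"]
        phase = pod.get("status", {}).get("phase")
        if phase in ("Succeeded", "Failed"):
            continue  # finished pods can be removed without --force
        refs = meta.get("ownerReferences", [])
        # No owner reference with controller: true means nothing replaces it.
        if not any(ref.get("controller") for ref in refs):
            orphans.append(f'{meta.get("namespace", "default")}/{meta["name"]}')
    return orphans
```

An empty result means `--force` has nothing to do on that node. A non-empty one is your list of pods to move or recreate properly before the drain.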

Usually, a stuck drain means one of two things:

A Pod isn’t terminating. This is usually because the application inside the pod isn’t honoring the SIGTERM signal. It might be stuck in a loop or ignoring the signal entirely. After the grace period (usually 30 seconds), Kubernetes will send a SIGKILL and be done with it. The drain will eventually continue, but it will be slow.
A Pod isn’t restarting elsewhere. The eviction works and the old pod is gone, but its replacement gets stuck in Pending, ContainerCreating, or ImagePullBackOff. This isn’t a drain problem; it’s a cluster or application problem you’ve now uncovered. The drain did its job, but the rescheduling failed, and if that application has a PDB, every further eviction against it will wait for a replacement that never becomes healthy. Your upgrade is now blocked until you fix your networking, image registry, node capacity, or resource quotas.
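The first failure mode gets fixed in the application, not the cluster: handle SIGTERM and exit promptly. A minimal Python sketch for a POSIX system; the names are hypothetical and the loop body stands in for real work. This matters doubly when your process runs as PID 1 in its container, because Linux delivers SIGTERM to a PID namespace’s init process only if it has installed a handler, so without one the signal is ignored until SIGKILL arrives:

```python
import signal
import threading

# Set once SIGTERM arrives; the work loop checks it between units of work.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Signal handler: record the request and return immediately."""
    shutdown_requested.set()


def install_sigterm_handler() -> None:
    # Python only allows signal handlers to be registered from the main thread.
    signal.signal(signal.SIGTERM, handle_sigterm)


def run_worker(tick_seconds: float = 0.5) -> int:
    """Process work until shutdown is requested; return units completed."""
    completed = 0
    while not shutdown_requested.is_set():
        completed += 1  # stand-in for one real unit of work
        # wait() returns as soon as the event is set, so shutdown is prompt.
        shutdown_requested.wait(tick_seconds)
    # Clean up here (flush buffers, close connections) well within the
    # pod's terminationGracePeriodSeconds (30 seconds by default).
    return completed
```

Call `install_sigterm_handler()` at startup, then `run_worker()`. When the kubelet sends SIGTERM during a drain, the loop finishes its current unit, cleans up, and exits long before the grace period runs out.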

The drain process holds up a mirror to your cluster’s health. If it goes smoothly, you’ve probably built something robust. If it’s a struggle, it’s showing you exactly where your weak points are. Pay attention.
