8.6 StatefulSet Update Strategies: RollingUpdate and OnDelete

Right, so you’ve got your StatefulSet humming along, managing your pods with their precious stable identities and persistent storage. It’s a beautiful, orderly parade. But nothing lasts forever, my friend. Eventually, you’ll need to change the Pod template, most often to bump the container image for a new feature or a critical security patch. This is where the designers of StatefulSets, in their infinite wisdom, gave us two strategies: RollingUpdate and OnDelete. And let me tell you, the choice between them is less about which is “better” and more about which flavor of control you want over the inevitable chaos.

The strategy lives in the updateStrategy field of your StatefulSet spec, and it looks deceptively simple:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  updateStrategy:
    type: RollingUpdate # or OnDelete (then drop the rollingUpdate block; it is only valid with RollingUpdate)
    rollingUpdate:
      partition: 0 # This is the magic knob. More on this in a sec.
  selector:
    matchLabels:
      app: nginx
  serviceName: "nginx"
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.25.1 # Let's say we want to update this to 1.25.2
        ports:
        - containerPort: 80

The Default: RollingUpdate (with a Twist)

If you set type: RollingUpdate, which is also what you get if you leave updateStrategy out entirely, the StatefulSet controller automatically replaces your Pods one at a time. But here’s the crucial, StatefulSet-y part: it works in reverse ordinal order, the same order it uses when scaling down. Pod web-2 gets updated first, then web-1, and finally web-0. This order is sacrosanct. Why? The usual rationale is that your highest-ordinal pod (the “newest” one) is the least critical. Stateful systems often keep their primary or seed members on the lowest ordinals (like web-0), so touching those last is less likely to disturb a quorum or a leader.
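Here is a tiny Python model that makes the ordering concrete. It is a simplified sketch that assumes Pods are named <statefulset>-<ordinal> and are replaced one at a time. It is not the controller's real Go code, and the function name rolling_update_order is made up for illustration.

```python
def rolling_update_order(name: str, replicas: int) -> list[str]:
    """Pod names in the order a RollingUpdate replaces them (simplified model).

    StatefulSet Pods are named <name>-<ordinal>; the controller replaces
    them one at a time, starting from the highest ordinal.
    """
    return [f"{name}-{ordinal}" for ordinal in range(replicas - 1, -1, -1)]


print(rolling_update_order("web", 3))  # ['web-2', 'web-1', 'web-0']
```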

It’s a reasonable assumption, but it’s not a law of nature. The order itself is fixed; what you can control is how far the rollout goes. That’s the job of the partition field, the single most important tool for mastering StatefulSet updates.

The Secret Weapon: The `partition` Field

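The partition is an integer field (it defaults to 0) that acts as an “update barrier.” Pods with an ordinal greater than or equal to the partition value will be updated. Pods with an ordinal less than the partition value will be left the hell alone. Even if you delete one of them, the controller recreates it at the previous version. Set the partition at or above the replica count and a template change reaches no Pods at all.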
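The partition rule is simple enough to write down directly. The sketch below is a simplified Python model, not the controller's implementation. pods_to_update is a hypothetical helper that returns the ordinals a RollingUpdate would touch for a given partition, in update order.

```python
def pods_to_update(replicas: int, partition: int = 0) -> list[int]:
    """Ordinals a RollingUpdate moves to the new revision, in update order.

    Simplified model of the partition rule: ordinals >= partition are
    updated (highest first); ordinals < partition keep the old revision,
    so a partition >= replicas updates nothing.
    """
    if partition < 0:
        raise ValueError("partition must be non-negative")
    return [o for o in range(replicas - 1, -1, -1) if o >= partition]


for p in (0, 1, 2, 3):
    print(p, pods_to_update(3, p))
# 0 [2, 1, 0]
# 1 [2, 1]
# 2 [2]
# 3 []
```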

This is how you do a canary rollout. Let’s say you have 3 replicas and you want to update only the highest-ordinal pod (web-2) to test the new image.

updateStrategy:
  type: RollingUpdate
  rollingUpdate:
    partition: 2 # Only update pods with ordinals >= 2 (i.e., just pod 2)

You apply this manifest together with the new image tag. The controller sees the change and… only updates web-2. web-1 and web-0 keep running the old image. You can monitor web-2, run your tests, make sure it’s not on fire. Once you’re happy, you lower the partition step by step to roll the update out to everyone else.

partition: 1 # Now update pods with ordinals >= 1 (pods 1 and 2)

…and finally:

partition: 0 # Update all pods (>= 0)
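Here is the whole canary sequence played out in a simplified Python model. Its assumptions are deliberately loud. Each Pod's revision is labeled by its image tag for readability, whereas the real controller tracks ControllerRevision hashes. Every updated Pod is assumed to become Ready. apply_partition is a hypothetical helper that returns the state after the controller settles.

```python
def apply_partition(revisions: dict[int, str], new_rev: str, partition: int) -> dict[int, str]:
    """Pod revisions after the controller settles at this partition (toy model).

    Every ordinal >= partition is moved to new_rev; lower ordinals are
    left exactly as they were.
    """
    return {o: (new_rev if o >= partition else rev) for o, rev in revisions.items()}


pods = {0: "nginx:1.25.1", 1: "nginx:1.25.1", 2: "nginx:1.25.1"}
for partition in (2, 1, 0):  # canary first, then widen the rollout
    pods = apply_partition(pods, "nginx:1.25.2", partition)
    print(partition, pods)
# 2 {0: 'nginx:1.25.1', 1: 'nginx:1.25.1', 2: 'nginx:1.25.2'}
# 1 {0: 'nginx:1.25.1', 1: 'nginx:1.25.2', 2: 'nginx:1.25.2'}
# 0 {0: 'nginx:1.25.2', 1: 'nginx:1.25.2', 2: 'nginx:1.25.2'}
```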

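This is brilliant because it gives you precise, step-by-step control over the rollout through a declarative API. Just don’t mistake the partition for an undo button. Setting it back to a higher number only halts the rollout, and Pods that already run the new image keep running it. If you then delete one of them, it does come back at the previous version, because Pods below the partition are always recreated that way. To actually roll back, revert the Pod template. You can restore the old image tag or run kubectl rollout undo statefulset/web. The controller then walks the updated Pods back in the same reverse-ordinal order, still honoring the partition.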
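Rollback is where intuition tends to go wrong, so here is the same kind of toy model applied to it. It is again a simplified Python sketch, not controller code. apply_partition is hypothetical, revisions are labeled by image tag, and every Pod is assumed to become Ready. The model shows that raising the partition leaves an updated canary alone. Reverting the template, which makes the old image the update target again, is what moves it back.

```python
def apply_partition(revisions: dict[int, str], target: str, partition: int) -> dict[int, str]:
    """Settled Pod revisions for a RollingUpdate (simplified model).

    Ordinals >= partition are moved to the target revision; lower
    ordinals keep whatever revision they already had.
    """
    return {o: (target if o >= partition else rev) for o, rev in revisions.items()}


OLD, NEW = "nginx:1.25.1", "nginx:1.25.2"

# Canary: partition 2 puts only web-2 on the new image.
pods = apply_partition({0: OLD, 1: OLD, 2: OLD}, NEW, partition=2)
print(pods)  # {0: 'nginx:1.25.1', 1: 'nginx:1.25.1', 2: 'nginx:1.25.2'}

# Raising the partition is not an undo: no ordinal is >= 3, so web-2 keeps NEW.
pods = apply_partition(pods, NEW, partition=3)
print(pods)  # {0: 'nginx:1.25.1', 1: 'nginx:1.25.1', 2: 'nginx:1.25.2'}

# Rolling back means making the old template the update target again.
pods = apply_partition(pods, OLD, partition=2)
print(pods)  # {0: 'nginx:1.25.1', 1: 'nginx:1.25.1', 2: 'nginx:1.25.1'}
```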

The “I’ll Do It Myself” Option: OnDelete

Then there’s OnDelete. This is the stubborn, manual option. You set type: OnDelete, apply the change to your StatefulSet, and… nothing happens. Absolutely nothing. The controller will patiently wait for you to manually delete a Pod. Only when you kubectl delete pod web-1 will the controller, upon seeing the deletion, create a new replacement Pod with the updated template.
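A toy model makes the OnDelete contract explicit. This Python class is a hypothetical illustration, not a Kubernetes API. Changing the template rewrites no running Pod. Only a deletion produces a replacement, and that replacement is built from whatever template is current at that moment.

```python
class OnDeleteModel:
    """Toy model of the OnDelete update strategy (not the real controller)."""

    def __init__(self, replicas: int, template: str):
        self.template = template
        # ordinal -> template the running Pod was created from
        self.pods = {o: template for o in range(replicas)}

    def update_template(self, template: str) -> None:
        # Under OnDelete, a template change touches no running Pod.
        self.template = template

    def delete_pod(self, ordinal: int) -> None:
        # The replacement keeps the same name (and would reattach the same
        # PersistentVolumeClaim) but is built from the current template.
        self.pods[ordinal] = self.template


sts = OnDeleteModel(3, "nginx:1.25.1")
sts.update_template("nginx:1.25.2")
print(sts.pods)  # {0: 'nginx:1.25.1', 1: 'nginx:1.25.1', 2: 'nginx:1.25.1'}
sts.delete_pod(1)  # stands in for: kubectl delete pod web-1
print(sts.pods)  # {0: 'nginx:1.25.1', 1: 'nginx:1.25.2', 2: 'nginx:1.25.1'}
```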

Why would you ever choose this? Control. Absolute, total, micromanaging control. It’s for when you have a workload so fragile, so stateful, so dependent on complex initialization procedures that you need a human (or a very sophisticated external script) to explicitly decide when each pod is ready to be killed and replaced. It’s the equivalent of handing you a scalpel instead of using a shotgun. It’s more work, but sometimes that’s what the situation demands.
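If that “very sophisticated external script” is what you need, its core loop might look like the sketch below. It is written against two hypothetical hooks, delete_pod and is_ready, which you would wire to kubectl or a Kubernetes client yourself. The highest-ordinal-first order and the timeout are our own choices, not something OnDelete imposes.

```python
import time
from typing import Callable


def replace_pods_one_by_one(
    name: str,
    replicas: int,
    delete_pod: Callable[[str], None],  # hypothetical hook, e.g. wraps kubectl delete pod
    is_ready: Callable[[str], bool],    # hypothetical hook, e.g. checks the Ready condition
    timeout_s: float = 300.0,
    poll_s: float = 2.0,
) -> None:
    """Replace every Pod of an OnDelete StatefulSet, one at a time.

    Goes from the highest ordinal down and stops (TimeoutError) at the
    first Pod that does not become ready in time. A real is_ready should
    make sure it is looking at the replacement Pod (for example by
    comparing the Pod's UID or its controller-revision-hash label), not
    the old Pod that is still terminating.
    """
    for ordinal in range(replicas - 1, -1, -1):
        pod = f"{name}-{ordinal}"
        delete_pod(pod)
        deadline = time.monotonic() + timeout_s
        while not is_ready(pod):
            if time.monotonic() > deadline:
                raise TimeoutError(f"{pod} did not become ready; stopping")
            time.sleep(poll_s)


if __name__ == "__main__":
    # Dry run with stand-in hooks: print instead of deleting, report ready at once.
    replace_pods_one_by_one("web", 3, delete_pod=lambda p: print("would delete", p),
                            is_ready=lambda p: True)
```

Stopping at the first Pod that never becomes ready mirrors what RollingUpdate does on its own. It keeps one bad Pod from turning into three.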

Common Pitfalls and The Rough Edge

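The biggest “gotcha” is one of sequencing and readiness. The StatefulSet controller is obedient to a fault. It will not proceed to update the next pod (web-1) until the newly updated pod (web-2) is not only Running but also Ready (i.e., its readiness probe passes). If your new image has a bug that causes the readiness probe to fail, your rollout will hang forever, stuck on that pod. This is a feature, not a bug, because it prevents a bad update from taking down your entire system. The rough edge is getting unstuck. With the default OrderedReady pod management policy, simply reverting the template is not enough. Due to a known issue, the controller keeps waiting for the broken Pod to become Ready before it will roll it back. So after reverting, you also have to delete any Pods that were already started with the bad configuration. All of which means your readiness probes need to be rock solid; they are the gatekeepers of your rollout.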
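The gating can be sketched as a short simulation. This is a simplified Python model, not the controller's code, and it rests on stated assumptions. Pods are replaced one at a time, and becomes_ready stands in for the new Pod's readiness probe. The model stops where the real controller would sit and wait.

```python
from typing import Callable


def rolling_update(replicas: int, partition: int,
                   becomes_ready: Callable[[int], bool]) -> list[int]:
    """Ordinals replaced by a readiness-gated RollingUpdate (simplified model).

    becomes_ready(ordinal) stands in for the new Pod's readiness probe.
    The loop stops at the first Pod that never turns Ready, which is where
    the real controller would wait instead of moving on.
    """
    replaced = []
    for ordinal in range(replicas - 1, -1, -1):
        if ordinal < partition:
            break  # below the partition: left on the old revision
        replaced.append(ordinal)  # deleted and recreated at the new revision
        if not becomes_ready(ordinal):
            break  # rollout stalls here; lower ordinals are never touched
    return replaced


print(rolling_update(3, 0, lambda o: True))    # [2, 1, 0]
print(rolling_update(3, 0, lambda o: o != 2))  # [2]: stuck on web-2
print(rolling_update(3, 1, lambda o: True))    # [2, 1]
```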

So, choose your weapon. RollingUpdate with a smart partition strategy is your go-to for most controlled, automated rollouts. OnDelete is for those special occasions where you need to personally hold the hand of each pod as it goes through its traumatic rebirth. Either way, you’re getting that stable, ordered predictability that makes StatefulSets so invaluable.
