7.3 Rolling Update Strategy: maxSurge and maxUnavailable

Right, so you’ve got your app deployed, it’s running smoothly, and now you need to push a new version. You’re not going to just kubectl delete pods --all and pray, are you? Of course not. You’re going to do a rolling update, the civilized way to upgrade your software without causing a full-blown production incident (or at least, a smaller, more manageable one).

The magic—and the control—of a rolling update is managed by two knobs in your Deployment spec: maxSurge and maxUnavailable. These two parameters are the yin and yang of your update strategy, controlling the trade-off between speed and availability. They’re the reason your users don’t get a 502 Bad Gateway error while you’re deploying.

Think of it like this: you’re a pilot swapping the engines on a plane mid-flight. maxUnavailable is how many engines can be offline at once. maxSurge is whether you’re allowed to temporarily bolt on an extra engine or two to help with the transition. Get this balance wrong, and let’s just say it’s a long swim to the destination.

What are maxSurge and maxUnavailable, really?

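These settings live under spec.strategy.rollingUpdate and they can be defined as either an absolute integer (e.g., 2) or a percentage of your desired replicas (e.g., 25%). Percentages are usually the way to go unless you’re running a small Deployment with a fixed replica count. When a percentage doesn’t divide evenly, maxSurge is rounded up and maxUnavailable is rounded down. The two can’t both be 0, or the rollout would have no room to move.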

maxUnavailable: This is the maximum number of Pods that can be unavailable during the update process, counted against the desired replica count. It’s your safety net for availability. If you set this to 25% on a Deployment with 4 replicas (so 1 Pod), Kubernetes will ensure at least 3 Pods (75%) are available and serving traffic at all times. It can kill that many old Pods up front, then waits for replacements to become Ready before killing more.
maxSurge: This is the maximum number of Pods that can be created above the desired number of replicas. This is your throttle for update speed. A 25% surge on that same 4-replica Deployment allows it to create 1 extra Pod (5 total), so it can bring a new one online before killing an old one. This is how you maintain capacity.

The math is simple but crucial. Two limits hold for the whole rollout: total Pods (old + new) <= desired replicas + maxSurge, and available Pods >= desired replicas - maxUnavailable.
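To make the arithmetic concrete, here is a sketch of a strategy stanza for a hypothetical 10-replica Deployment. The replica count is an assumption, picked so the percentages don’t divide evenly. Kubernetes rounds maxSurge up and maxUnavailable down, and the comments work through what that means for the total-Pod ceiling and the available-Pod floor.

```yaml
# Fragment of a Deployment spec; replicas: 10 is an illustrative value.
spec:
  replicas: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%        # 10 * 0.25 = 2.5, rounded up   -> 3 extra Pods
      maxUnavailable: 25%  # 10 * 0.25 = 2.5, rounded down -> 2 Pods
# Resulting limits during the rollout:
#   total Pods     <= 10 + 3 = 13
#   available Pods >= 10 - 2 = 8
```

The asymmetric rounding is deliberate: when in doubt, Kubernetes errs toward more surge and less unavailability.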

Here’s what a thoughtful configuration looks like in the wild:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: witty-api
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%    # So, 1 extra Pod can be created (4 * 0.25 = 1)
      maxUnavailable: 25% # So, at most 1 Pod can be unavailable at a time
  selector:
    matchLabels:
      app: witty-api
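  template:
    metadata:
      labels:
        app: witty-api # Must match spec.selector.matchLabels, or the API server rejects the Deployment
    spec: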
      containers:
      - name: api
        image: my-app:1.2.0 # Bumping from 1.1.0

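With this setup, the rollout moves roughly one Pod at a time, inside two limits: never more than 5 Pods in total, never fewer than 3 available. Kubernetes will:

1. Create a new Pod (the surge), bringing the total to 5.
2. Kill an old Pod without waiting, since one Pod is allowed to be unavailable (now 3 old, 1 new).
3. Create another new Pod to fill the surge slot that just opened up (3 old, 2 new).
4. Wait for a new Pod to become Ready, then kill another old Pod and start the next new one.
5. Repeat until all 4 Pods are the new version.

If you want the strict “bring one up, wait for Ready, then take one down” sequence, set maxUnavailable: 0 with maxSurge: 1.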

Why You Should Care About These Values

The default if you don’t set them is maxSurge: 25% and maxUnavailable: 25%. This is a fine starting point, but it’s not optimal for every scenario. You need to think about what your application can tolerate.

The “I Can’t Lose a Single Request” Scenario: Your service is mission-critical. Set maxUnavailable: 0 and maxSurge: 100%. This is the “overprovision first” strategy. Kubernetes will create every new Pod up front (all 4 new ones in our example, bringing the total to 8) and retire an old Pod each time a new one becomes Ready. Your Pod count doubles during the update, but available capacity never dips below the desired 4. You pay for it in cluster resources, but sometimes that’s the price of doing business.
The “I Want This Over With ASAP” Scenario: You need to deploy a hotfix now. Set a higher maxUnavailable, like 50%. Kubernetes can take down up to half your Pods at once, replace them, then do the other half. It’s fast, but your capacity can drop to half during the process. If a sudden traffic spike hits, you’re in for a bad time. Use this only if you’re confident in your capacity buffer.
The “My App Takes Forever to Start” Scenario: If your new Pods take 3 minutes to warm up (caches, JVM, etc.), a slow, serial rollout (maxUnavailable: 1, maxSurge: 0) replaces one Pod per warm-up cycle. With 4 replicas, that’s roughly 4 x 3 = 12 minutes at best. In this case, you want to use maxSurge to allow multiple new Pods to be started in parallel, so the slow initialization happens concurrently, not sequentially.
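The three scenarios above map onto strategy stanzas roughly like this. Treat the numbers as starting points rather than recommendations. In particular, the values for the slow-start case are an assumption, since the right amount of surge depends on how much spare capacity your cluster has.

```yaml
# Each document is a fragment of a Deployment's spec (the strategy field),
# not a full manifest.

# "I can't lose a single request": overprovision first, never dip below desired.
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 100%
    maxUnavailable: 0
---
# "I want this over with ASAP": trade capacity for speed.
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 50%
    # maxSurge is left unset here, so it falls back to the default of 25%.
---
# "My app takes forever to start": warm up several new Pods in parallel.
# Illustrative values, not taken from a real deployment.
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 50%
    maxUnavailable: 0
```

Note that the first and third stanzas differ only in how much extra capacity you’re willing to pay for during the rollout.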

The Pitfalls and The “Oh Crap” Moments

Readiness Probes Are Not Optional. This is the most important point. The entire rolling update logic hinges on the readinessProbe. Kubernetes decides if a Pod is “available” based on this probe, and without one, a Pod counts as Ready as soon as its containers start. If you mess it up and your probe returns success before the app is actually ready to serve traffic, Kubernetes will kill the old Pods and send traffic to the new, broken ones. Congratulations, you just rolled out a failure. If your probe is too strict and never succeeds, your update will stall: once progressDeadlineSeconds passes (600 seconds by default), the Deployment is reported as failing to progress, but nothing gets rolled back for you. Get this right.
Resource Quotas Will Bite You. If you use a surge strategy and have tight resource quotas in your namespace, the attempt to create the extra Pods will fail, and your rollout will be stuck. Plan your quotas accordingly.
Rollbacks Are Your Friend. Pushed a bad image? Run kubectl rollout undo deployment/witty-api. It’s a lifesaver. It performs a rolling update right back to the previous revision, governed by the Deployment’s current maxSurge and maxUnavailable settings.
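For the first two pitfalls, a minimal sketch might look like the following. The /healthz path, port 8080, the timing values, and the quota name are all hypothetical; substitute whatever your app and namespace actually use. The quota example also assumes the namespace holds nothing but this one Deployment.

```yaml
# Fragment of the witty-api Deployment. Probe path, port, and timings
# are placeholders, not values from the original manifest.
spec:
  minReadySeconds: 10           # A Pod must stay Ready this long to count as available
  progressDeadlineSeconds: 600  # The default; a rollout stalled longer than this is reported as failed
  template:
    spec:
      containers:
      - name: api
        image: my-app:1.2.0
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
          failureThreshold: 3
---
# Hypothetical quota for a namespace containing only this Deployment:
# 4 replicas + 1 surge Pod = 5, so the Pod limit must be at least 5.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: witty-api-quota
spec:
  hard:
    pods: "5"
```

If the quota were set to 4 instead, the surge Pod could never be created, and with maxUnavailable: 0 the rollout would have nowhere to go.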

The bottom line? maxSurge and maxUnavailable are levers. Pull them based on your needs for speed, availability, and resource consumption. There’s no single right answer, only the right answer for your app, right now. Test your rollout strategies in a staging environment that mirrors prod. Watch the kubectl get pods output during a deploy. See the dance happen in real time. It’s the only way to truly understand it and avoid being that person who explains to the CEO why the website was down for “just a few minutes.”