7.4 Recreate Strategy: Full Restart

Alright, let’s talk about the Recreate strategy. You’re going to use this when you absolutely, positively cannot have two versions of your application running at the same time. Think of it as the “scorched earth” or “turn it off and on again” method of deployment. It’s brutally simple: we completely terminate all the old Pods first. Only after they’re all dead and gone do we spin up the new ones.

Why would you choose this? The classic example is an application that keeps its data directory on a persistent volume claim (PVC), like a database. Most databases get real cranky if you try to run two instances pointing at the same volume. They’ll start a fight over who owns the data, and nobody wins that fight. The Recreate strategy avoids this by ensuring old and new Pods never run at the same time. It’s also useful for legacy apps that can’t tolerate two versions running side by side during a gradual rollout.
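To make the storage scenario concrete, here is a rough sketch of what such a setup tends to look like. The claim name app-data, the 10Gi size, and the /var/lib/app mount path are all made up for illustration. The second document is only a fragment of a Deployment spec, so it won't apply on its own.

```yaml
# Hypothetical claim; name and size are assumptions for this example.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce   # mountable read-write by a single node at a time
  resources:
    requests:
      storage: 10Gi
---
# Fragment of a Deployment spec that mounts the claim above.
# replicas is 1 because every replica of a Deployment shares the same claim.
spec:
  replicas: 1
  strategy:
    type: Recreate    # the old Pod releases the volume before the new one starts
  template:
    spec:
      containers:
      - name: app
        volumeMounts:
        - name: data
          mountPath: /var/lib/app   # assumed data directory
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: app-data
```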

How It Actually Works (The Gory Details)

When you update your Pod template, say by changing the container image tag, the Deployment controller doesn’t mess around. Here’s the play-by-play:

It scales the current ReplicaSet (the old version) down to 0. Every old Pod is marked for deletion, the kubelet sends SIGTERM to its containers, and the graceful termination period begins.
It waits. It waits until every last one of those old Pods is completely terminated. No lingering, no “almost dead.” This is a full shutdown.
Only after the coast is completely clear does it scale the new ReplicaSet up to the desired number of replicas. New Pods are created, they go through their init containers, and they start up.

The key thing to understand here is the downtime. Your service will be unavailable from the moment the old Pods start shutting down (a terminating Pod stops receiving new Service traffic) until the first new Pod is ready to serve. There is no rolling update. There is no canary. There is only void.

Here’s a simple Deployment manifest using the Recreate strategy. Save this as deployment-recreate.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-dramatic-app
spec:
  replicas: 3
  strategy:
    type: Recreate  # This is the crucial line
  selector:
    matchLabels:
      app: my-dramatic-app
  template:
    metadata:
      labels:
        app: my-dramatic-app
    spec:
      containers:
      - name: app
        image: my-registry.io/app:v1.0.0 # We'll update this later
        ports:
        - containerPort: 8080

Apply it with kubectl apply -f deployment-recreate.yaml. Now, let’s trigger an update. You can do this by using kubectl set image:

kubectl set image deployment/my-dramatic-app app=my-registry.io/app:v2.0.0

Now, watch the fireworks. Use kubectl get pods -w and you’ll see the entire old set of Pods enter Terminating state. They’ll all vanish. There will be a palpable, depressing silence for a few seconds. Then, and only then, the new Pods will appear and start their ContainerCreating process.

The Inevitable Downtime (And How to Mitigate It)

Let’s not sugarcoat it: this strategy causes downtime. It’s a design choice, not a bug. You’re trading continuous availability for simplicity and data safety. Because of this, you should almost never use it for user-facing, web-scale applications unless you have a very, very good reason (see: databases).

If you must use Recreate but want to minimize the pain, here’s what you do:

Optimize Your Startup Time: This is the biggest lever you can pull. The downtime is roughly the time the old Pods take to shut down plus the time the new Pods take to get scheduled, pull the image, start, and pass their readiness checks. If your app takes 45 seconds to wake up, connect to its database, and load its cache, that’s at least 45 seconds of outage. So make your startup lightning fast. Pre-warm caches, use lazy loading where possible, and make those health checks snappy.
Schedule It: Use this during a maintenance window. Don’t just blast it out on a Tuesday afternoon. Tell people you’re doing it.
Consider Readiness Probes: A good readiness probe is critical here. The new Pods won’t receive Service traffic until their probe passes, and without a probe a Pod counts as ready as soon as its containers start. If your probe accurately represents being “ready,” users don’t get hit with errors while the new containers are still initializing.
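As a minimal sketch of the readiness probe idea, here is a fragment that would sit under spec.template.spec in the Deployment above. The /healthz/ready path is an assumption (your app has to actually serve it), and the timing values are illustrative starting points, not recommendations.

```yaml
# Fragment: goes under spec.template.spec in the Deployment.
containers:
- name: app
  image: my-registry.io/app:v2.0.0
  ports:
  - containerPort: 8080
  readinessProbe:
    httpGet:
      path: /healthz/ready   # assumed endpoint; not part of the original manifest
      port: 8080
    periodSeconds: 2         # probe often so a ready Pod is noticed quickly
    failureThreshold: 3      # consecutive failures before the Pod is marked not ready
```

A short periodSeconds keeps the gap between "the app is actually up" and "Kubernetes knows it's up" small, which matters more than usual when every second of that gap is outage.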

The Pod Termination Grace Period Wildcard

Remember that SIGTERM from earlier? Each Pod has a terminationGracePeriodSeconds (default 30 seconds) to wrap up its affairs and shut down gracefully. Your application needs to handle SIGTERM correctly, closing connections and finishing up work.

If your old Pods get stuck—maybe they ignore SIGTERM or have a bug that prevents shutdown—they’ll sit there in Terminating state for the entire grace period before being forcibly killed with SIGKILL. And the rollout? It’s stuck waiting. The new Pods won’t be created until the old ones are completely gone. This is a common pitfall. Always, always test your application’s graceful shutdown behavior.
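
If you want to cap how long a stuck Pod can hold things up, the grace period is set in the Pod template. A minimal sketch, assuming 20 seconds is enough for your app to drain (that number is made up for illustration, so measure your own shutdown first):

```yaml
# Fragment: goes under spec.template.spec in the Deployment.
terminationGracePeriodSeconds: 20   # illustrative value; the default is 30
containers:
- name: app
  image: my-registry.io/app:v2.0.0
```

Set it too low and Pods that were shutting down cleanly get SIGKILLed mid-work. Set it too high and a Pod that ignores SIGTERM stalls the Recreate rollout for that much longer.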