Alright, let’s talk about what happens when your Pod’s main process decides to take a nap, gets stage fright, or just flat-out dies. This is where the restartPolicy comes in. It’s the instruction manual you leave for the kubelet (the node agent) on what to do when the container(s) inside your Pod exit.

Think of it as your contingency plan. Do you want it to always try again, like an optimistic friend who believes the third time’s the charm? Or only if it fails spectacularly? Or maybe you want it to just lie there and not move, like a teenager on a Saturday morning? Kubernetes gives you three choices for this: Always, OnFailure, and Never.

Here’s the crucial bit that everyone trips over at least once: restartPolicy is set once, at the Pod level, and applies to the containers in that Pod, but all it governs is restarting those containers in place. The kubelet does the restarting, and it only does so on the same node. If the entire Pod goes away (e.g., the node dies, the cluster scales down, you manually tell it to go away), a different system, the controller (a Deployment, a Job, and so on), is responsible for creating a replacement Pod, possibly on a new node. This distinction is everything. The kubelet handles local container restarts; controllers handle Pod resurrection. Keep that in mind.

The Three Policies, Demystified

Let’s break down what each policy actually means. You set this in the Pod’s .spec section.

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  restartPolicy: OnFailure # This is the line we're talking about
  containers:
  - name: my-container
    image: busybox
    command: ['sh', '-c', 'echo "Hello, World!"']

  • Always: This is the default for every Pod when you don’t set restartPolicy at all, and it’s the only value a Deployment’s Pod template accepts. It’s the kubelet’s relentless “try, try again” mode. If the main process in the container exits for any reason (success, failure, you name it), the kubelet will restart it. It’s the right choice for long-running processes you always want up, like a web server or application API. The joke here is that it will even restart a container that exited with a perfect, clean exit code of 0. “I see you succeeded brilliantly. You must be tired from all that succeeding. Let me restart you so you can succeed again.”

  • OnFailure: This is the pragmatic choice. The kubelet will only restart the container if it exits with a non-zero status code—in other words, if it actually fails. A successful exit (code 0) is considered a job well done, and the container is left in a terminated state. This is what you use for batch jobs, one-off tasks, or anything you want to run to completion, successfully, maybe even just once. It’s the kubelet saying, “I’ve got your back, but only if you screw up.”

  • Never: The hands-off approach. No matter what happens—abject failure (non-zero exit) or glorious success (zero exit)—the kubelet does nothing. The container is left in its exited state. This is useful for debugging (“I need to see the logs of that exact failure”) or for running very specific, manually triggered tasks where you want to inspect the aftermath, pod carcass and all.
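
To watch the policies behave differently, here is a minimal sketch of two throwaway Pods. The Pod names, container names, and commands are made up for illustration; only the restartPolicy values come from the discussion above.

```yaml
# Hypothetical demo Pods; names and commands are illustrative only.
apiVersion: v1
kind: Pod
metadata:
  name: restart-always-demo
spec:
  restartPolicy: Always   # also what you get if the field is omitted
  containers:
  - name: exits-cleanly
    image: busybox
    # Exits with code 0 after a few seconds; the kubelet restarts it anyway.
    command: ['sh', '-c', 'echo "done"; sleep 5; exit 0']
---
apiVersion: v1
kind: Pod
metadata:
  name: restart-never-demo
spec:
  restartPolicy: Never
  containers:
  - name: exits-with-error
    image: busybox
    # Exits with code 1; the kubelet leaves the container terminated.
    command: ['sh', '-c', 'echo "failing"; exit 1']
```

Swapping Never for OnFailure in the second Pod should make the kubelet restart the failing container instead of leaving it where it fell.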

The Critical Nuance: Pod vs. Container Phases

This is where it gets slightly mind-bendy. The restartPolicy directly influences the Pod’s status.phase. A Pod’s phase is a high-level summary of where it’s at in its lifecycle: Pending, Running, Succeeded, Failed, or Unknown.

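For a Pod with restartPolicy: Always, the phase will essentially never be Succeeded. Why? Because even a clean exit gets restarted. The phase doesn’t drop back to Pending during a restart, either: the Pod stays Running while the kubelet restarts the container in place. And repeated crashes alone don’t flip the phase to Failed. The Pod stays Running while the container sits in a Waiting state with the reason CrashLoopBackOff, which is what kubectl get pods prints in the STATUS column.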

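For a Pod with restartPolicy: OnFailure or Never, the phase can actually reach a terminal value. If every container exits with code 0, the Pod’s phase becomes Succeeded under either policy. A non-zero exit is where the two differ: with Never, the phase becomes Failed; with OnFailure, the kubelet restarts the container and the Pod stays Running until the container eventually exits cleanly. This is why you can kubectl get pods and see a job you ran yesterday still sitting there with a STATUS of Completed (that column shows the container’s termination reason; the phase underneath is Succeeded).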
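
As a rough illustration, the relevant part of the status for the earlier my-pod example (restartPolicy: OnFailure, clean exit) might look something like the excerpt below. This is an abridged, hand-written sketch rather than captured output; a real status carries many more fields, such as timestamps, container IDs, and conditions.

```yaml
# Abridged, hand-written sketch of what `kubectl get pod my-pod -o yaml`
# could report; not real captured output.
status:
  phase: Succeeded            # Pod-level summary
  containerStatuses:
  - name: my-container
    restartCount: 0
    state:
      terminated:
        exitCode: 0
        reason: Completed     # the word kubectl prints in the STATUS column
```

The split is the point: phase is the Pod-level summary, while the exit code and reason live on each container’s own status entry.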

Best Practices and Pitfalls

  • Deployments want Always. This is non-negotiable. If you try to create a Deployment with a Pod template that has restartPolicy: OnFailure, the API server won’t even let it in the door: it rejects the object at validation time with an “Unsupported value” error, because Always is the only value a Deployment’s Pod template accepts. Since Always is also the default, the simplest move is to leave the field out of Deployment templates entirely. Save yourself the confusion; only use OnFailure or Never for bare Pods or Pods managed by a Job controller.

  • Jobs want OnFailure or Never. The Job controller is designed to manage Pods that run to completion, and the API rejects a Job whose Pod template uses Always, so set the field explicitly rather than relying on the default. The two allowed values retry differently. With OnFailure, the kubelet restarts the failed container in place, inside the same Pod. With Never, the failed Pod stays failed and the Job controller creates a fresh Pod to try again. Either way, retries are bounded by the Job’s backoffLimit.

  • The Restart Backoff. The kubelet isn’t a complete masochist. If a container with Always or OnFailure keeps crashing immediately, the kubelet will employ an exponential back-off delay (10s, 20s, 40s, and so on, capped at five minutes) before restarting it. Once a container has run for ten minutes without trouble, the timer resets. This prevents it from hammering your node and filling logs with useless restart loops. You’ll see this as CrashLoopBackOff in kubectl get pods. It’s not an error itself; it’s a protective state. The error is whatever is making your container crash in the first place.

  • Debugging with Never. When you can’t figure out why something is failing, recreate the Pod (or update your Job’s Pod template) with restartPolicy: Never; the field can’t be edited on a Pod that already exists. This allows the container to exit and stay exited, preserving its logs and final state so you can kubectl logs <pod-name> and kubectl describe pod <pod-name> to get the actual failure message and exit code. It’s your best friend for troubleshooting.
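
A minimal sketch of the two controller pairings from the list above, assuming a generic web server and a trivial batch task. All names, labels, images, and the backoffLimit value are hypothetical choices for illustration.

```yaml
# Hypothetical manifests; names, images, and commands are illustrative only.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-demo
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-demo
  template:
    metadata:
      labels:
        app: web-demo
    spec:
      restartPolicy: Always      # the only accepted value here; may be omitted
      containers:
      - name: web
        image: nginx
---
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-demo
spec:
  backoffLimit: 4                # stop retrying after this many failures
  template:
    spec:
      restartPolicy: OnFailure   # a Job template takes OnFailure or Never
      containers:
      - name: task
        image: busybox
        command: ['sh', '-c', 'echo "working"; exit 0']
```

For a debugging session, the same Job with restartPolicy: Never would leave each failed Pod in place for inspection rather than restarting the container inside it.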