5.5 Pod Lifecycle: Pending, Running, Succeeded, Failed, Unknown
Alright, let’s get down to brass tacks. You can’t talk about Pods without understanding their lifecycle. It’s the story of their existence, from a hopeful idea in the scheduler’s mind to their glorious—or more often, tragically brief—demise. Think of it like a play with five possible acts: Pending, Running, Succeeded, Failed, and the enigmatic Unknown. Your job is to be the stage manager, not just the audience.
The Five Fates of a Pod
Kubernetes doesn’t deal in vagueness. A Pod’s status is a first-class citizen, and it will be slotted into one of these five phases. You can see it yourself with a classic kubectl get pods:
kubectl get pods
NAME     READY   STATUS    RESTARTS   AGE
my-pod   1/1     Running   0          5m
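That STATUS column is your main window into the Pod’s soul, with one caveat: kubectl builds it from the phase plus container-level detail, so it can also show values such as CrashLoopBackOff or Terminating, which are not phases. The phase itself lives in the Pod’s status.phase field. Let’s break down what each phase actually means, beyond the dictionary definition.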
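If you want the raw phase rather than kubectl’s summary column, you can ask for the field directly. Here is a minimal sketch; the helper name pod_phase is made up for illustration, and it assumes kubectl is already pointed at your cluster.

```shell
# Hypothetical helper: print only the phase stored in a Pod's status.
# Usage: pod_phase <pod-name> [namespace]
pod_phase() {
  kubectl get pod "$1" --namespace "${2:-default}" -o jsonpath='{.status.phase}'
}
```

Running pod_phase my-pod prints one of the five phase names and nothing else, which makes it handy inside scripts.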
Pending: The Agony of Potential
This is where hope lives, and resources are scarce. A Pod is Pending when the API server has accepted its manifest (you yelled “action!”), but one or more of its containers isn’t ready to run yet. That covers both the time spent waiting to be scheduled onto a happy node and the time spent pulling container images once it gets there. The most common reason for getting stuck here? The scheduler can’t find a node with enough CPU or memory to fit your, ahem, generously sized container. Or the Pod is waiting on a PersistentVolumeClaim that hasn’t been bound yet.
This is where you start your debugging. kubectl describe pod <pod-name> is your best friend here. Scroll down to the Events section. It’s the gossip column of Kubernetes and will tell you exactly why your Pod is stuck.
kubectl describe pod my-pending-pod
...
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  2m    default-scheduler  0/3 nodes are available: 3 Insufficient cpu.
See? Not a mystery. The scheduler is literally telling you, “I have three nodes, and all of them think your Pod is too fat.” Brutal, but effective.
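When more than one Pod is stuck, scrolling through describe output gets old fast. The sketch below shows two ways to narrow things down using field selectors; the function names are hypothetical, and both default to the default namespace.

```shell
# Hypothetical helpers for Pods that are stuck in Pending.

# List every Pod in a namespace whose phase is currently Pending.
# Usage: pending_pods [namespace]
pending_pods() {
  kubectl get pods --namespace "${1:-default}" --field-selector=status.phase=Pending
}

# Show only the events that concern one Pod, sorted by creation time.
# Usage: pod_events <pod-name> [namespace]
pod_events() {
  kubectl get events --namespace "${2:-default}" \
    --field-selector "involvedObject.name=$1" \
    --sort-by=.metadata.creationTimestamp
}
```

For the example above, pod_events my-pending-pod should surface the same FailedScheduling warning without the rest of the describe output.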
Running: The Main Event
A Pod is Running when it has been bound to a node, all of its containers have been created, and at least one of them is still running, or is in the middle of starting or restarting. This seems obvious, but note the subtlety: a Pod can be Running even if a container is crashing and being restarted over and over. The Running phase tells you that containers exist on a node, not that your application is healthy. That’s what readiness and liveness probes are for (a conversation for another day). For a long-running workload, this is the state you want. Enjoy it while it lasts.
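To see what the individual containers are doing behind a Running phase, you can pull the per-container status out of the Pod object. This is a sketch under the assumption that you only care about name, restart count, and waiting reason; container_states is a hypothetical helper name.

```shell
# Hypothetical helper: one line per container with its name, restart count,
# and the reason it is waiting (blank when the container is not waiting).
# Usage: container_states <pod-name> [namespace]
container_states() {
  kubectl get pod "$1" --namespace "${2:-default}" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\t"}{.state.waiting.reason}{"\n"}{end}'
}
```

A high restart count next to a waiting reason such as CrashLoopBackOff is the giveaway that Running is not telling the whole story.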
Succeeded: A Graceful Exit
This one’s for the overachievers. A Pod reaches Succeeded when all containers in the Pod have terminated successfully, meaning they exited with a status code of 0, and none of them will be restarted. You’ll see this almost exclusively with Jobs and CronJobs. These are the one-and-done Pods, the batch processors that did their job and politely left the building.
You don’t usually need to worry about these; they’re finished. Kubernetes keeps the Pod object around so you can inspect its logs if needed. It goes away when you delete it, when its owning Job is removed (for example through the Job’s ttlSecondsAfterFinished setting), or when the control plane’s garbage collector decides the cluster has accumulated too many terminated Pods.
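If you would rather read the logs and tidy up yourself than wait for garbage collection, something like the following works. Treat it as a sketch: both function names are hypothetical, and the second one is destructive.

```shell
# Hypothetical helpers for Pods that have run to completion.

# Read the logs of a finished Pod while the Pod object still exists.
# Usage: finished_pod_logs <pod-name> [namespace]
finished_pod_logs() {
  kubectl logs "$1" --namespace "${2:-default}"
}

# Delete every Succeeded Pod in one namespace. Destructive: once the Pods
# are gone, kubectl can no longer fetch their logs.
# Usage: delete_succeeded_pods [namespace]
delete_succeeded_pods() {
  kubectl delete pods --namespace "${1:-default}" --field-selector=status.phase=Succeeded
}
```

Pods owned by a Job are also removed when the Job itself is deleted, so in practice you may prefer to clean up at the Job level.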
Failed: It All Goes Sideways
A Pod is Failed when all of its containers have terminated and at least one of them terminated in failure: it exited with a non-zero code or was killed by the system, and it is not going to be restarted. This is the most common end state for a flawed Pod spec or a buggy application. Maybe your app encountered a fatal error, maybe it tried to use a port that was already in use, or maybe you just typo’d the command and it instantly died.
The real fun begins with the restartPolicy. If it’s set to Never, a failed container is not restarted, and once every container has stopped the Pod lands in Failed. If it’s set to OnFailure, the kubelet restarts the failed container in place, so the Pod stays Running instead (Jobs accept only Never or OnFailure). If it’s set to Always (the default), the kubelet will relentlessly restart the container with an increasing back-off delay, and the Pod never reaches Failed at all: its phase stays Running while kubectl shows CrashLoopBackOff between attempts. This is where you get the RESTARTS count ticking up like a scoreboard of despair.
apiVersion: v1
kind: Pod
metadata:
  name: always-failing
spec:
  containers:
  - name: misery
    image: busybox
    command: ['sh', '-c', 'echo "This is a mistake"; exit 1'] # This will always fail
  restartPolicy: Always # Kubernetes will try, and try, and try again...
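Once a Pod like always-failing is in its restart loop, the current container has often only just started, so the interesting evidence belongs to the previous run. A minimal sketch for digging it out, with hypothetical helper names and the assumption that you care about the first container in the Pod:

```shell
# Hypothetical helpers for inspecting a container that keeps restarting.

# Exit code of the most recent terminated run of the Pod's first container.
# Usage: last_exit_code <pod-name> [namespace]
last_exit_code() {
  kubectl get pod "$1" --namespace "${2:-default}" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
}

# Logs from the previous run rather than the one that just started.
# Usage: previous_logs <pod-name> [namespace]
previous_logs() {
  kubectl logs "$1" --namespace "${2:-default}" --previous
}
```

For the manifest above, you would expect last_exit_code always-failing to report the exit 1 from the command, and previous_logs always-failing to show the echoed message.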
Unknown: The Void
This is the spookiest state. Unknown means the state of the Pod could not be obtained, typically because the control plane has lost communication with the kubelet on the node that was running it. The node might have crashed, been disconnected from the network, or be midway through an upgrade that took its kubelet offline. The control plane has no idea what’s happening on that node, so it throws its hands up and marks the Pod’s status as Unknown.
It’s not a verdict on the Pod’s health; it’s a verdict on the API server’s knowledge. When this happens, your first instinct shouldn’t be to look at the Pod, but to look at the node (kubectl get nodes). Is it NotReady? The problem is almost certainly there.
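Going from a suspicious Pod to its node takes two lookups: which node the Pod was bound to, and whether that node is still reporting Ready. The sketch below wires those together; pod_node and node_ready are hypothetical names.

```shell
# Hypothetical helpers for checking the node behind a Pod.

# Name of the node the Pod was bound to.
# Usage: pod_node <pod-name> [namespace]
pod_node() {
  kubectl get pod "$1" --namespace "${2:-default}" -o jsonpath='{.spec.nodeName}'
}

# Status of the node's Ready condition: True, False, or Unknown.
# Usage: node_ready <node-name>
node_ready() {
  kubectl get node "$1" -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
}
```

Chaining them as node_ready "$(pod_node my-pod)" answers the question in one line. Anything other than True means the node, not the Pod, deserves your attention first.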
The lifecycle isn’t just academic. It’s the fundamental rhythm of your cluster. Your debugging workflow should mirror it: check the status, describe the Pod to read the events, check the node, and read the logs. Now you’re not just running commands; you’re listening to what the system is telling you.