42.3 Debugging with kubectl describe and kubectl logs
Right, so your pod is stuck in Pending or your application is coughing up an error. You’re not going to just stare at kubectl get pods and hope it magically starts working, are you? Of course not. You’re going to ask the cluster what on earth it’s thinking. Your two best friends for this are kubectl describe and kubectl logs. One tells you what the cluster thinks is happening to your pod, and the other tells you what’s actually happening inside it. Let’s break them down.
Your First Move: kubectl describe
Think of kubectl describe as the pod’s official medical chart. It doesn’t show the patient’s internal monologue (that’s for logs), but it shows the doctor’s notes: diagnoses, scheduled treatments, and, crucially, why a treatment hasn’t started yet. It amalgamates information from the Pod spec, its related Events, and its current status into one brilliant, verbose readout.
The most common and powerful use is on a Pod that’s misbehaving.
kubectl describe pod my-broken-app-59ff66d74-zhx8k
The output is dense, but you’re not reading it for fun; you’re hunting for specific sections. Scroll past the metadata and spec until you hit the gold: Events. This is a chronological list of the cluster’s attempts to make your wishes into reality. This is where you find the truth.
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  2m10s (x5 over 3m)  default-scheduler  0/3 nodes are available: 3 Insufficient cpu.
Boom. There’s your answer. It’s not a mystical incantation error; the scheduler is literally telling you, “I have nowhere to put this thing because you asked for more CPU than any node has free.” Maybe you got greedy in your resources.requests, or maybe another pod is hogging the node. Either way, describe just saved you an hour of guessing.
Another classic from the Events section:
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  37s                default-scheduler  Successfully assigned default/my-app to node-01
  Normal   Pulling    21s (x3 over 36s)  kubelet, node-01   Pulling image "my-private-registry.com/app:v1"
  Warning  Failed     20s (x3 over 35s)  kubelet, node-01   Failed to pull image "my-private-registry.com/app:v1": pull access denied, repository does not exist or may require 'docker login'
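The scheduler did its job, but the kubelet on the node can’t pull the image. The message names both suspects: either the image name or tag is wrong, or the registry is private and the pod is missing an image pull secret. The Events log is brutally honest, and you should thank it for that.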
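On a pod with a long history, the Warning rows are the ones worth reading first. Here is a minimal sketch of a filter for them, assuming a POSIX shell with awk; warning_events is a hypothetical helper name, not a kubectl feature, and it assumes the Events section is introduced by a line starting with "Events:" as in the output above.

```shell
# Hypothetical helper (not part of kubectl): read `kubectl describe`
# output on stdin and print only the Warning rows of the Events section.
warning_events() {
  awk '
    /^Events:/ { in_events = 1; next }   # the Events section starts here
    in_events && $1 == "Warning"         # keep rows whose Type is Warning
  '
}
```

You would pipe describe into it: kubectl describe pod my-broken-app-59ff66d74-zhx8k | warning_events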
The Inside Story: kubectl logs
While describe tells you what the cluster sees from the outside, logs shows you what’s happening inside the container. This is your application’s stdout and stderr. If your pod is Running but the application isn’t working, this is your very first stop.
The basic command is straightforward:
kubectl logs my-broken-app-59ff66d74-zhx8k
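But here’s where the pitfalls start. The most common “oh, come on” moment is when you have a multi-container pod. Depending on your kubectl version, kubectl logs either quietly falls back to a default container (usually the first one listed), which is almost never the one you want, or refuses outright and asks you to pick one. Either way, specify the container name with -c.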
kubectl logs pod/my-broken-app-59ff66d74-zhx8k -c my-app-container
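If you don’t know the container names off the top of your head, you can ask the pod for them. Here is a minimal sketch, assuming a POSIX shell and a pod in the current namespace; all_container_logs is a hypothetical helper name, not a kubectl feature, and the 20-line tail is an arbitrary choice.

```shell
# Hypothetical helper (not part of kubectl): print the last few log lines
# of every container in a pod, each under a header naming the container.
all_container_logs() {
  pod=$1
  # .spec.containers[*].name yields the container names, space-separated.
  for c in $(kubectl get pod "$pod" -o jsonpath='{.spec.containers[*].name}'); do
    echo "== $c =="
    kubectl logs "$pod" -c "$c" --tail=20
  done
}
```

Call it as all_container_logs my-broken-app-59ff66d74-zhx8k. If you just want everything at once without the headers, kubectl logs also accepts --all-containers.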
Another critical flag is --previous. If your container crashed and restarted, kubectl logs shows only the current instance by default, so the output from the instance that actually died stays out of sight. This is a fantastic way to miss the exact error that caused the crash. If your pod is in a crash loop, always use this flag to see what killed the previous incarnation.
kubectl logs pod/my-crashing-app-pod --previous
You’ll likely see a glorious Java NullPointerException or a Python ModuleNotFoundError that your local testing somehow missed. It happens to the best of us.
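When you aren’t sure whether the container has restarted yet, you can try the previous instance first and fall back to the current one. This is a minimal sketch, assuming a POSIX shell; crash_logs is a hypothetical helper name, and it treats any failure of the --previous call as “there is no previous instance,” which is a simplification.

```shell
# Hypothetical helper (not part of kubectl): prefer the logs of the
# previous (crashed) container instance; if that call fails, for example
# because the container has never restarted, show the current logs.
crash_logs() {
  pod=$1
  kubectl logs "$pod" --previous 2>/dev/null || kubectl logs "$pod"
}
```

Call it as crash_logs my-crashing-app-pod. For a multi-container pod you would still need to add -c with the container name to both calls.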
When You Need the Bigger Picture: describe on Other Resources
Don’t limit describe to just pods. It’s your diagnostic tool for the entire cluster. Is a pod stuck in Pending? Describe the pod to see the scheduling error. Still confused? Describe the node it’s supposed to land on.
kubectl describe node node-01
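Compare the Allocatable section (what the node can hand out) with the Allocated resources section (what running pods have already requested). Is there actually CPU free? Then check the Taints line: a NoSchedule taint keeps new pods off the node, and a pesky NoExecute taint evicts pods that are already running there without you realizing it. The node’s describe output will show you all of that.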
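The node readout is long, so it helps to pull out one section at a time. Here is a minimal sketch, assuming a POSIX shell with awk; describe_section is a hypothetical helper name, and it assumes that section headers start in the first column while the lines belonging to a section are indented with spaces.

```shell
# Hypothetical helper (not part of kubectl): read `kubectl describe`
# output on stdin and print one top-level section, that is, the header
# line named in $1 plus the indented lines that follow it.
describe_section() {
  awk -v name="$1" '
    # A line starting in column one opens a new section; keep printing
    # only while we are inside the section that was asked for.
    /^[^ ]/ { show = (index($0, name ":") == 1) }
    show
  '
}
```

For example: kubectl describe node node-01 | describe_section "Allocated resources", or the same pipeline with Taints as the section name.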
Is your service not routing traffic to anything? Describe it.
kubectl describe service my-broken-service
Look for Endpoints: <none>. This is Kubernetes’ polite way of saying “I have no idea what pods you’re talking about.” This almost always means your service’s selector labels don’t match any pods (occasionally the labels match but none of the pods are Ready). It’s a label typo. It’s always a label typo. Go check them. I’ll wait.
Best Practices and the Obvious-but-Crucial
First, always use kubectl get pods --show-labels. This is the fastest way to verify that your pod has the labels your service expects. It’s the binoculars you use before you even fire the describe cannon.
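Eyeballing two label strings is exactly how typos survive, so here is a minimal sketch of a mechanical check, assuming a POSIX shell. selector_matches is a hypothetical helper name; it takes the comma-separated key=value list from the Selector line of kubectl describe service and the comma-separated list from the LABELS column of kubectl get pods --show-labels.

```shell
# Hypothetical helper (not part of kubectl): succeed only if every
# key=value pair in the selector ($1) also appears in the pod's
# comma-separated label list ($2). Reports the first missing pair.
selector_matches() {
  selector=$1
  labels=$2
  old_ifs=$IFS
  IFS=,                                  # split the selector on commas
  for pair in $selector; do
    case ",$labels," in
      *",$pair,"*) ;;                    # exact pair found in the labels
      *) IFS=$old_ifs; echo "missing label: $pair"; return 1 ;;
    esac
  done
  IFS=$old_ifs
  return 0
}
```

For example, selector_matches "app=web,tier=api" "app=wbe,tier=api" prints "missing label: app=web" and returns a non-zero status, which is the typo you were looking for.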
Second, the output of describe and logs can be huge. Don’t just scroll aimlessly. Pipe it through grep, and use grep’s -A and -B flags to keep a few lines of context after and before each match. Looking for errors?
kubectl describe pod my-pod | grep -E -i -A 5 -B 5 "error|fail"
Third, remember that logs are ephemeral by default. They live and die with the pod. For anything remotely important, you need a real logging solution that ingests logs from every pod and node. But for the love of all that is holy, use kubectl logs for your initial debugging. It’s the fastest feedback loop you have.
Combining these two commands (describe to understand the cluster’s plan, logs to see the application’s reality) is how you move from blindly guessing to systematically dismantling a problem. Now go fix that pod. It’s not going to debug itself.