Right, let’s talk about events. If your cluster is a crime scene (and it often feels like one), kubectl get events is your first and best witness. It’s the system’s gossip column, a running log of the “who, what, and when” behind almost every state change. Ignore it, and you’re troubleshooting blind. Rely on it too much, and you’ll drown in a firehose of mostly-irrelevant data. Let’s learn how to drink from that firehose effectively.

The Anatomy of an Event

First, you need to speak the language. An event isn’t just a random error message; it’s a structured Kubernetes object. Let’s pull one apart. Run this to list the events in the current namespace, oldest first, with a few extra columns:

kubectl get events --sort-by=.lastTimestamp -o wide

You’ll get a table with columns like LAST SEEN, TYPE, REASON, OBJECT, and MESSAGE. Here’s the crucial part: the MESSAGE is often the human-readable summary, but the REASON is the machine-readable key you’ll use to search documentation or Google. A TYPE of Warning is your cluster politely clearing its throat before something catches on fire. Normal events are just the system’s diary entries (“Pod scheduled,” “Pulled image”).

Why is it structured this way? Because controllers (like the Deployment or Node controller) are programmed to emit specific events for specific state changes. This allows other systems (like monitoring or alerting tools) to parse them consistently, not just humans.
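If you’d rather see the underlying fields than the formatted table, custom columns are one way to do it. Here’s a minimal sketch; the helper name event_fields is my own invention, not part of kubectl, and the field paths are the ones on the core/v1 Event object.

```shell
# Sketch: print a handful of raw Event fields as columns.
# event_fields is a hypothetical helper name, not a kubectl feature.
event_fields() {
  # Extra arguments (e.g. -n my-namespace) are passed straight through.
  kubectl get events \
    --sort-by=.lastTimestamp \
    -o custom-columns=LAST_SEEN:.lastTimestamp,TYPE:.type,REASON:.reason,KIND:.involvedObject.kind,NAME:.involvedObject.name,MESSAGE:.message \
    "$@"
}

# Usage: event_fields -n my-namespace
```

The point is simply that TYPE, REASON, and the involved object are separate fields you can select, not substrings you have to grep out of a log line.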

Cutting Through the Noise

The default kubectl get events output is a chronological mess. You’ll see an event about a pod from 5 hours ago, then one from 2 minutes ago, then another from 3 days ago. It’s useless. You must sort them. My go-to command, which I have aliased to kge, is this:

kubectl get events --sort-by=.lastTimestamp

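To watch events in real time, which is incredibly useful when you’re deploying something and want to see it unfold (or unravel), add the -w flag. Drop --sort-by here: kubectl ignores it in watch mode (and prints a warning saying so), because new events simply stream in as they arrive:

kubectl get events -w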

Now you’ll see events as they happen. To focus only on the bad stuff, filter for Warning types:

kubectl get events --field-selector type=Warning --sort-by=.lastTimestamp

This drops every Normal event and shows you only the problems. It’s your first and most important filter.
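If you type these often, shell functions beat aliases because they take extra arguments cleanly. Here’s a small sketch for your shell profile; kge matches the alias mentioned above, while kgew is a name I made up for the warnings-only variant.

```shell
# kge: all events, oldest first. Extra arguments are passed through.
kge() {
  kubectl get events --sort-by=.lastTimestamp "$@"
}

# kgew (hypothetical name): only Warning events, oldest first.
kgew() {
  kubectl get events --field-selector type=Warning --sort-by=.lastTimestamp "$@"
}

# Usage:
#   kge -n my-namespace
#   kgew -A          # -A is short for --all-namespaces
```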

The Most Common “Oh Crap” Events

You’ll see these. A lot.

  • FailedScheduling: The scheduler couldn’t find a node to run your pod. The MESSAGE is your best friend here. It usually tells you exactly why, e.g., “0/3 nodes are available: 1 Insufficient cpu, 2 node(s) had taint {node.kubernetes.io/disk-pressure: }.” This isn’t a mystery; it’s a direct report. You either need to add more resources, fix the taints, or add tolerations.
  • Failed, with a message starting “Failed to pull image” (the pod’s status shows ErrImagePull or ImagePullBackOff): You typo’d the image name (my-ap:v1 vs. my-app:v1), the tag doesn’t exist, or you’re trying to pull from a private registry without providing credentials. The message will usually tell you which one.
  • BackOff (shown as CrashLoopBackOff in the pod’s STATUS column): The pod started but then died. Repeatedly. This is where you stop looking at events and start using kubectl logs on the pod, adding --previous to read the output of the container that just crashed. The event just tells you it’s happening; the logs tell you why.
  • Unhealthy: Your liveness or readiness probe is failing. This means your container is running, but it’s not responding to health checks correctly. Again, kubectl logs and potentially kubectl exec to get inside the container are your next steps.
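The “see this reason, run that command” habit can be written down as a tiny lookup. What follows is a rough triage sketch for three of the reasons above, not an exhaustive map; next_step is a hypothetical helper that only prints a suggestion and never runs anything against your cluster.

```shell
# next_step (hypothetical helper): given an event REASON and a pod name,
# print a reasonable follow-up command. It does not execute the command.
next_step() {
  reason="$1"
  pod="$2"
  case "$reason" in
    FailedScheduling)
      # The scheduler's explanation is in the event message; describe
      # shows it next to the pod's resource requests and tolerations.
      echo "kubectl describe pod $pod" ;;
    BackOff)
      # Assumes a crash loop: --previous reads the log of the last
      # terminated container. (BackOff can also mean image pull back-off.)
      echo "kubectl logs $pod --previous" ;;
    Unhealthy)
      # The container is running but failing probes; read its current log.
      echo "kubectl logs $pod" ;;
    *)
      # Anything else: start with the full description and its events.
      echo "kubectl describe pod $pod" ;;
  esac
}

# Usage: next_step BackOff my-stuck-pod-12345
```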

The Event Lifecycle and Limitations

Here’s the part the documentation often glosses over: events are ephemeral. They are stored in etcd like any other object, but the API server attaches a time-to-live to each one. By default that TTL is one hour (it’s the kube-apiserver --event-ttl flag), and it applies to Normal and Warning events alike; managed clusters may set it differently. If your node has been down for a day, you won’t find the original NodeNotReady event. It’s gone.

This is a design choice, not an oversight. Events are for real-time debugging, not historical audit logging. For that, you need a proper observability stack that sinks events to something like Elasticsearch or Loki. If you’re trying to figure out what happened yesterday without one, you’re probably out of luck. It’s a rough edge you just have to accept.
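If you don’t have that stack yet, a crude stopgap is to dump events to disk on a schedule. This is a sketch under obvious assumptions: snapshot_events is a hypothetical helper, the default directory name is arbitrary, and nothing here deduplicates or rotates files. It is not a substitute for a real event exporter.

```shell
# snapshot_events (hypothetical helper): write all current events, from
# every namespace, to a timestamped JSON file and print the file's path.
snapshot_events() {
  dir="${1:-./event-snapshots}"   # assumed default location
  mkdir -p "$dir" || return 1
  file="$dir/events-$(date -u +%Y%m%dT%H%M%SZ).json"
  kubectl get events --all-namespaces -o json > "$file" || return 1
  echo "$file"
}

# Usage: snapshot_events /var/tmp/k8s-events
```

Run it (from cron, say) more often than events expire on your cluster, and yesterday’s NodeNotReady will at least exist somewhere you can grep.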

Putting It All Together: A Real-World Example

Let’s say you run kubectl get pods and see a pod that never reaches Running (under the hood, its phase is stuck at Pending). Don’t just sit there wondering. Interrogate the events.

kubectl get events --field-selector involvedObject.name=my-stuck-pod-12345

This filters events only for that specific pod. The output might show:

  1. A Normal Scheduled event, placing it on a node. Okay, good.
  2. A Warning Failed event with the message “Failed to pull image ... repository my-privat-registry.com/my-app not found.” Ah. There’s the problem. I can’t spell “private.”
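Since this is the query you’ll run most, it’s worth wrapping. The sketch below adds two things to the command above: it also matches on involvedObject.kind, so an object of another kind with the same name doesn’t sneak in, and it sorts by time. The name pod_events is hypothetical.

```shell
# pod_events (hypothetical helper): events for one pod, oldest first.
pod_events() {
  if [ "$#" -lt 1 ]; then
    echo "usage: pod_events POD [extra kubectl args...]" >&2
    return 2
  fi
  pod="$1"
  shift
  kubectl get events \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=$pod" \
    --sort-by=.lastTimestamp \
    "$@"
}

# Usage: pod_events my-stuck-pod-12345 -n my-namespace
```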

The events gave me the direct, causal link between the pod’s Pending state and my own incompetence. That’s the power of using them effectively. It’s not just a log; it’s the story of your cluster’s state, told one cryptic, time-stamped message at a time. Now go listen to it.