42.9 Kubernetes Events: Using kubectl get events Effectively

Right, let’s talk about events. If your cluster is a crime scene (and it often feels like one), kubectl get events is your first and best witness. It’s the system’s gossip column, a running log of the “who, what, and when” behind almost every state change. Ignore it, and you’re troubleshooting blind. Rely on it too much, and you’ll drown in a firehose of mostly irrelevant data. Let’s learn how to drink from that firehose effectively.
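Here is a rough sketch of what "drinking effectively" can look like: sort by time, keep only warnings, or narrow to a single object. The function names are made up for illustration, and the namespace and pod names in the usage line are hypothetical.

```shell
# Sketch: small wrappers that narrow `kubectl get events` output.
# Function names are illustrative, not part of kubectl.

# All events in a namespace, oldest first, so the newest land at the bottom.
events_by_time() {
  kubectl get events --namespace "$1" --sort-by=.metadata.creationTimestamp
}

# Only Warning events; Normal events are usually the noise.
warning_events() {
  kubectl get events --namespace "$1" --field-selector type=Warning
}

# Only events that refer to one specific pod.
# $1 = namespace, $2 = pod name
pod_events() {
  kubectl get events --namespace "$1" \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=$2"
}
```

With those defined, `warning_events demo` or `pod_events demo web-0` gives you the witness statement without the gossip.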

42.8 Networking Debugging: DNS, Service, and Network Policy Issues

Alright, let’s get our hands dirty. Networking in Kubernetes is where the rubber meets the road, and where things often go spectacularly, head-scratchingly wrong. It’s a complex beast, but we can tame it by breaking it down into its core components: DNS, Services, and Network Policies. Forget the marketing fluff; we’re going to talk about what actually happens on the wire.

The First Command: nslookup is Your Best Friend

When a pod can’t talk to another pod via its service name, your very first move shouldn’t be to panic. It should be to drop into a shell on a pod and run nslookup. This humble tool will tell you if CoreDNS (or whatever DNS server you’re running) is even responding and if it can resolve the service name to a ClusterIP.
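A minimal sketch of that first move, with a few loud assumptions: the client pod's image actually ships nslookup, the cluster domain is the default cluster.local, and CoreDNS carries the k8s-app=kube-dns label (common on kubeadm-style clusters, but check yours). The function names and the example names in the usage line are hypothetical.

```shell
# Sketch: DNS checks from inside the cluster. Function names are illustrative.

# Build the fully qualified name of a service.
# $1 = service, $2 = service namespace, $3 = cluster domain (assumed cluster.local)
service_fqdn() {
  printf '%s.%s.svc.%s\n' "$1" "$2" "${3:-cluster.local}"
}

# Run nslookup for a service from inside an existing pod.
# $1 = client namespace, $2 = client pod, $3 = service, $4 = service namespace
dns_check() {
  kubectl exec --namespace "$1" "$2" -- nslookup "$(service_fqdn "$3" "$4")"
}

# Check that the DNS pods themselves are running (label is an assumption).
coredns_pods() {
  kubectl get pods --namespace kube-system --selector k8s-app=kube-dns
}
```

For example, `dns_check demo web-0 backend prod` asks the resolver inside web-0 for backend.prod.svc.cluster.local. If that fails but `coredns_pods` shows healthy pods, the next suspects are the Service and any Network Policy in between.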

42.7 Control Plane Failures: API Server, etcd, Scheduler

Right, so your cluster has gone sideways. The apps are down, kubectl commands are timing out, and that little voice in your head is whispering, “Was it something I did?” Probably. But more likely, it’s the control plane throwing a tantrum. This isn’t your application code; this is the brain of your entire operation having a stroke. We need to triage the patient.

The control plane’s job is to maintain state. Its entire existence is a constant loop of “observe reality, compare to desired state, reconcile.” When it fails, that loop breaks. Your first clue is almost always the kubectl command hanging or spitting out a beautiful, utterly useless “The connection to the server <server-name:port> was refused - did you specify the right host or port?” Don’t panic. Assuming your kubeconfig points at the right address, this usually means the API server, the front door to everything, is closed for business.
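A hedged sketch of the first triage step: ask the API server directly whether it considers itself ready, with a timeout so the command can't hang forever. The second helper assumes a kubeadm-style cluster where the control plane components run as containers under a CRI runtime and crictl is installed on the control plane node; that will not hold everywhere. The function names are made up.

```shell
# Sketch: quick control plane triage. Function names are illustrative.

# Ask the API server for its readiness checks.
# --request-timeout stops kubectl from hanging when nothing answers.
apiserver_ready() {
  kubectl get --raw='/readyz?verbose' --request-timeout=5s
}

# On a control plane node: list control plane containers, including exited
# ones. Assumes static pods under a CRI runtime with crictl available.
control_plane_containers() {
  crictl ps --all --name 'kube-apiserver|etcd|kube-scheduler'
}
```

If `apiserver_ready` is refused, kubectl can't help you any further, and it's time to get onto the control plane node and look at the containers themselves.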

42.6 Node NotReady: Common Causes and Remediation

Alright, let’s talk about a Node going into NotReady state. It’s Kubernetes’ way of telling you, “Hey, I’ve got a problem over here and I can’t schedule any more work on this server.” It’s not being lazy; it’s being honest. Your job is to figure out why.

Think of the Kubelet on each node as a harried middle manager. A core part of its job is to constantly report back to the Control Plane (Head Office) that its node (retail store) is open for business and has shelf space. The Node object is that status report. When the Kubelet stops sending good reports, or any reports at all, the Control Plane waits out a short grace period of radio silence (well under a minute by default) and then marks the node as NotReady. It’s a safety mechanism. It’d rather stop sending you customers than send them to a store that might be on fire.
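To read that status report yourself, something like the following sketch works: list the nodes that aren't Ready, describe one to see its conditions, then check the Kubelet's own logs on the node. The last helper assumes the Kubelet runs as a systemd unit named kubelet, which is common but not universal. The function names are illustrative.

```shell
# Sketch: find and inspect NotReady nodes. Function names are illustrative.

# Print the names of nodes whose STATUS column does not start with "Ready".
# "Ready,SchedulingDisabled" (a cordoned node) still counts as Ready here.
not_ready_nodes() {
  kubectl get nodes --no-headers | awk '$2 !~ /^Ready/ {print $1}'
}

# Show conditions, capacity, and recent events for one node.
node_report() {
  kubectl describe node "$1"
}

# On the node itself: recent Kubelet logs (assumes a systemd unit "kubelet").
kubelet_logs() {
  journalctl -u kubelet --since "10 minutes ago" --no-pager
}
```

The Conditions block in `node_report` is the interesting part: it tells you whether the problem is memory pressure, disk pressure, or a Kubelet that has simply stopped posting status.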

42.5 Debugging with Ephemeral Containers and kubectl debug

Right, so your pod is in a broken state. It’s either crashlooping, stuck in Pending, or just behaving in a way that makes absolutely no sense. Your first instinct is to kubectl exec into it to see what’s going on. But what if the container won’t start? You can’t exec into a container that isn’t running. This is the classic “my car won’t start, and I need to look under the hood but the hood is locked” scenario.
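Here is a sketch of the two usual ways to pick that lock with kubectl debug. The first attaches an ephemeral container next to a running one; the second makes a copy of the pod with the container's command swapped for a shell, which is handy when the original keeps crashing. The busybox image tag, the "-debug" suffix, and the function names are choices made for this example.

```shell
# Sketch: two common `kubectl debug` invocations. Function names are illustrative.

# Attach an ephemeral busybox container to a running pod, targeting one
# container so the debugger can see its processes.
# $1 = namespace, $2 = pod, $3 = target container
debug_attach() {
  kubectl debug --namespace "$1" -it "$2" --image=busybox:1.36 --target="$3"
}

# Copy the pod and replace the named container's command with a shell, so
# you can look around even though the original container exits on startup.
# $1 = namespace, $2 = pod, $3 = container
debug_copy() {
  kubectl debug --namespace "$1" "$2" -it --copy-to="$2-debug" --container="$3" -- sh
}
```

Remember to delete the copied pod afterwards; it is a real pod and will keep running until you do.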

42.4 Exec Into a Running Container for Live Debugging

Right, so your pod is running, but it’s doing something deeply weird. Maybe it’s eating CPU like it’s at an all-you-can-eat buffet, or perhaps it’s just… not responding. The logs (kubectl logs) are useless, showing nothing but the digital equivalent of crickets chirping. This is where you stop looking at the autopsy report and start talking to the patient. You need to exec into the running container.

Think of kubectl exec as your all-access backstage pass. It lets you open an interactive shell right inside the container, or run any one-off command you can dream up. It’s the difference between reading a log file and actually being there, poking around the filesystem, checking processes, and seeing what the application actually sees. It’s your primary tool for live debugging, and you should be deeply suspicious of anyone who tells you to debug a container without it.
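The two modes described above, interactive shell and one-off command, can be sketched like this. It assumes the container image actually contains sh (and whatever tool you ask for, such as ps); minimal images often don't. The function names are illustrative.

```shell
# Sketch: the two ways to use `kubectl exec`. Function names are illustrative.

# Open an interactive shell in a specific container.
# $1 = namespace, $2 = pod, $3 = container
exec_shell() {
  kubectl exec --namespace "$1" -it "$2" --container "$3" -- sh
}

# Run a one-off command in the pod's default container.
# $1 = namespace, $2 = pod, remaining arguments = the command to run
exec_cmd() {
  exec_ns=$1
  exec_pod=$2
  shift 2
  kubectl exec --namespace "$exec_ns" "$exec_pod" -- "$@"
}
```

So `exec_cmd demo web-0 ps aux` shows you who is eating the CPU, and `exec_shell demo web-0 app` lets you stay and ask follow-up questions.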

42.3 Debugging with kubectl describe and kubectl logs

Right, so your pod is stuck in Pending or your application is coughing up an error. You’re not going to just stare at kubectl get pods and hope it magically starts working, are you? Of course not. You’re going to ask the cluster what on earth it’s thinking. Your two best friends for this are kubectl describe and kubectl logs. One tells you what the cluster thinks is happening to your pod, and the other tells you what’s actually happening inside it. Let’s break them down.
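As a quick sketch of the pair in action: describe gives you the cluster's side of the story, logs gives you the application's, and logs with --previous gives you the last words of a container that already crashed. The function names and the 100-line tail are arbitrary choices for this example.

```shell
# Sketch: the describe/logs pair. Function names are illustrative.

# What the cluster thinks: spec, status, conditions, and recent events.
pod_story() {
  kubectl describe pod --namespace "$1" "$2"
}

# What the application says: the last 100 log lines of the current container.
pod_logs() {
  kubectl logs --namespace "$1" "$2" --tail=100
}

# What the application said before it died: logs of the previous instance.
previous_logs() {
  kubectl logs --namespace "$1" "$2" --previous --tail=100
}
```

If `pod_logs` comes back empty because the container never started, that is your cue to go back to `pod_story` and read the Events section at the bottom.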

42.2 Pod Not Starting: Pending, CrashLoopBackOff, ImagePullBackOff

Alright, let’s get our hands dirty. Your pod isn’t starting. It’s just sitting there, mocking you with a status like Pending, CrashLoopBackOff, or ImagePullBackOff. This isn’t a failure; it’s the cluster’s way of sending you a strongly worded letter explaining exactly what you did wrong. Your job is to learn how to read it.

First, the golden rule: always start with kubectl describe. Your kubectl get pods output is the headline; kubectl describe is the full investigative report. If you don’t do this first, I can’t help you. It’s like calling a mechanic and saying “my car is broken” but refusing to pop the hood.
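If you want the headline in machine-readable form before opening the full report, a sketch like this pulls each container's waiting reason and maps the three statuses above to a sensible next move. The function names and the wording of the hints are this example's own, not anything kubectl prints.

```shell
# Sketch: read the waiting reason and pick a next step. Names are illustrative.

# Print "<container>\t<waiting reason>" for each container in the pod.
# A pod that has not been scheduled yet has no container statuses at all.
waiting_reasons() {
  kubectl get pod --namespace "$1" "$2" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.state.waiting.reason}{"\n"}{end}'
}

# Map a status to the first thing worth checking.
next_step() {
  case "$1" in
    Pending)
      echo "kubectl describe pod: read the scheduling events" ;;
    ImagePullBackOff|ErrImagePull)
      echo "check the image name, tag, and pull secrets" ;;
    CrashLoopBackOff)
      echo "kubectl logs --previous: read the last crash" ;;
    *)
      echo "kubectl describe pod: start with the events" ;;
  esac
}
```

Either way, the road leads back to kubectl describe; this just tells you which paragraph of the letter to read first.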

42.1 Systematic Troubleshooting Methodology

Right, let’s get this sorted. You’re staring at a CrashLoopBackOff or some other Kubernetes-induced hieroglyphic, and the panic is starting to set in. Don’t. The single biggest mistake you can make is just frantically running kubectl describe on random things, hoping for a clue. That’s like trying to fix a car engine by randomly tapping components with a hammer. You might get lucky, but you’ll probably just make it worse.

...