42.9 Kubernetes Events: Using kubectl get events Effectively

Right, let’s talk about events. If your cluster is a crime scene (and it often feels like one), kubectl get events is your first and best witness. It’s the system’s gossip column, a running log of the “who, what, and when” behind almost every state change. Ignore it, and you’re troubleshooting blind. Rely on it too much, and you’ll drown in a firehose of mostly-irrelevant data. Let’s learn how to drink from that firehose effectively.
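
One way to tame the firehose is to throw away the Normal events and read the Warnings in time order. Here is a minimal sketch, assuming you have already captured the output of kubectl get events -o json and parsed it into a dict. The function name and the sample payload are invented for illustration; only the type, reason, and lastTimestamp fields are taken from the Event object itself.

```python
def warning_events(event_list):
    """Return Warning-type events, oldest first, from a parsed
    `kubectl get events -o json` payload."""
    items = event_list.get("items", [])
    warnings = [e for e in items if e.get("type") == "Warning"]
    # lastTimestamp is a UTC RFC 3339 string, so lexical order matches
    # time order. It can be missing or null on some events, so fall
    # back to an empty string rather than crash.
    return sorted(warnings, key=lambda e: e.get("lastTimestamp") or "")


# Hypothetical sample payload, made up for this example.
SAMPLE = {
    "items": [
        {"type": "Normal", "reason": "Scheduled",
         "lastTimestamp": "2024-01-01T10:00:05Z"},
        {"type": "Warning", "reason": "BackOff",
         "lastTimestamp": "2024-01-01T10:02:00Z"},
        {"type": "Warning", "reason": "FailedScheduling",
         "lastTimestamp": "2024-01-01T10:00:01Z"},
    ]
}

for event in warning_events(SAMPLE):
    print(event["lastTimestamp"], event["reason"])
```

The same filtering can usually be done on the command line with kubectl's own sorting and field selectors; the point here is only to show which fields matter.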

42.8 Networking Debugging: DNS, Service, and Network Policy Issues

Alright, let’s get our hands dirty. Networking in Kubernetes is where the rubber meets the road, and where things often go spectacularly, head-scratchingly wrong. It’s a complex beast, but we can tame it by breaking it down into its core components: DNS, Services, and Network Policies. Forget the marketing fluff; we’re going to talk about what actually happens on the wire.

The First Command: nslookup is Your Best Friend

When a pod can’t talk to another pod via its service name, your very first move shouldn’t be to panic. It should be to drop into a shell on a pod and run nslookup. This humble tool will tell you if CoreDNS (or whatever DNS server you’re running) is even responding and if it can resolve the service name to a ClusterIP.
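
If the debugging image has Python but no nslookup, the same check can be roughed out with the standard library. This is a sketch under assumptions: the helper names are made up, and cluster.local is only the common default cluster domain, so swap in your own if the cluster was built with a different one. The short forms rely on the search domains in the pod's /etc/resolv.conf.

```python
import socket


def service_dns_names(service, namespace, cluster_domain="cluster.local"):
    """Names a pod can try for a Service, shortest to fully qualified.

    cluster_domain is an assumption (the usual default); adjust as needed.
    """
    return [
        service,                      # only resolves from the same namespace
        f"{service}.{namespace}",
        f"{service}.{namespace}.svc",
        f"{service}.{namespace}.svc.{cluster_domain}",
    ]


def try_resolve(name):
    """Return the sorted set of IPs a name resolves to, or None on failure."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return None
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address.
    return sorted({info[4][0] for info in infos})


# Usage from inside a pod (hypothetical service "web" in namespace "shop"):
#   for name in service_dns_names("web", "shop"):
#       print(name, "->", try_resolve(name))
```

If the fully qualified name resolves but the short ones don't, suspect the search path in resolv.conf rather than CoreDNS itself.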

42.7 Control Plane Failures: API Server, etcd, Scheduler

Right, so your cluster has gone sideways. The apps are down, kubectl commands are timing out, and that little voice in your head is whispering, “Was it something I did?” Probably. But more likely, it’s the control plane throwing a tantrum. This isn’t your application code; this is the brain of your entire operation having a stroke. We need to triage the patient. The control plane’s job is to maintain state. Its entire existence is a constant loop of “observe reality, compare to desired state, reconcile.” When it fails, that loop breaks. Your first clue is almost always the kubectl command hanging or spitting out a beautiful, utterly useless The connection to the server <server-name:port> was refused - did you specify the right host or port?. Don’t panic. This just means the API server, the front door to everything, is closed for business.
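
To make that “observe, compare, reconcile” loop concrete, here is a toy model of a single pass. It is not how any real controller is written (those watch the API server and act on live objects); the function name, the replica-count dictionaries, and the action tuples are all invented for illustration.

```python
def reconcile(desired, observed):
    """Compare desired vs observed replica counts and return the actions
    needed to close the gap. Both arguments map a workload name to a count."""
    actions = []
    for name, want in desired.items():
        have = observed.get(name, 0)
        if have < want:
            actions.append(("create", name, want - have))
        elif have > want:
            actions.append(("delete", name, have - want))
    # Anything running that nobody asked for gets cleaned up.
    for name, have in observed.items():
        if name not in desired and have > 0:
            actions.append(("delete", name, have))
    return actions


desired = {"web": 3, "api": 2}
observed = {"web": 1, "api": 2, "old-job": 1}
print(reconcile(desired, observed))
```

When the control plane is down, nothing runs this comparison, which is why existing pods keep running but nothing new gets fixed, scaled, or scheduled.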

42.6 Node NotReady: Common Causes and Remediation

Alright, let’s talk about a Node going into NotReady state. It’s Kubernetes’ way of telling you, “Hey, I’ve got a problem over here and I can’t schedule any more work on this server.” It’s not being lazy; it’s being honest. Your job is to figure out why. Think of the Kubelet on each node as a harried middle manager. Its sole job is to constantly report back to the Control Plane (Head Office) that its node (retail store) is open for business and has shelf space. The Node object is that status report. When the Kubelet stops sending good reports, or any reports at all, the Control Plane waits out a short grace period of radio silence (under a minute by default) and then marks the node as NotReady. It’s a safety mechanism. It’d rather stop sending you customers than send them to a store that might be on fire.
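
The heartbeat logic boils down to a timestamp comparison. The sketch below is a simplification with made-up names: the 40-second grace period is an illustrative value, and the real threshold is whatever your controller manager's node-monitor-grace-period is set to. It also collapses the real condition states into just two.

```python
from datetime import datetime, timedelta, timezone


def node_condition(last_heartbeat, now, grace=timedelta(seconds=40)):
    """Return "Ready" if the last heartbeat is within the grace period,
    otherwise "NotReady". The default grace value is an assumption."""
    return "Ready" if now - last_heartbeat <= grace else "NotReady"


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(node_condition(now - timedelta(seconds=10), now))
print(node_condition(now - timedelta(minutes=5), now))
```

So the first question with a NotReady node is always the same: is the Kubelet alive and can it reach the API server?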

42.5 Debugging with Ephemeral Containers and kubectl debug

Right, so your pod is in a broken state. It’s either crashlooping, stuck in Pending, or just behaving in a way that makes absolutely no sense. Your first instinct is to kubectl exec into it to see what’s going on. But what if the container won’t start? You can’t exec into a container that isn’t running. This is the classic “my car won’t start, and I need to look under the hood but the hood is locked” scenario.

42.4 Exec Into a Running Container for Live Debugging

Right, so your pod is running, but it’s doing something deeply weird. Maybe it’s eating CPU like it’s at an all-you-can-eat buffet, or perhaps it’s just… not responding. The logs (kubectl logs) are useless, showing nothing but the digital equivalent of crickets chirping. This is where you stop looking at the autopsy report and start talking to the patient. You need to exec into the running container. Think of kubectl exec as your all-access backstage pass. It lets you open an interactive shell right inside the container, or run any one-off command you can dream up. It’s the difference between reading a log file and actually being there, poking around the filesystem, checking processes, and seeing what the application actually sees. It’s your primary tool for live debugging, and you should be deeply suspicious of anyone who tells you to debug a container without it.

42.3 Debugging with kubectl describe and kubectl logs

Right, so your pod is stuck in Pending or your application is coughing up an error. You’re not going to just stare at kubectl get pods and hope it magically starts working, are you? Of course not. You’re going to ask the cluster what on earth it’s thinking. Your two best friends for this are kubectl describe and kubectl logs. One tells you what the cluster thinks is happening to your pod, and the other tells you what’s actually happening inside it. Let’s break them down.

42.2 Pod Not Starting: Pending, CrashLoopBackOff, ImagePullBackOff

Alright, let’s get our hands dirty. Your pod isn’t starting. It’s just sitting there, mocking you with a status like Pending, CrashLoopBackOff, or ImagePullBackOff. This isn’t a failure; it’s the cluster’s way of sending you a strongly worded letter explaining exactly what you did wrong. Your job is to learn how to read it. First, the golden rule: always start with kubectl describe. Your kubectl get pods output is the headline; kubectl describe is the full investigative report. If you don’t do this first, I can’t help you. It’s like calling a mechanic and saying “my car is broken” but refusing to pop the hood.
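
As a cheat sheet for reading that strongly worded letter, the three statuses map to three different places to look first. The table below is a rough sketch, not an exhaustive diagnosis; the dictionary and function names are made up, and the descriptions are the usual suspects rather than a guarantee.

```python
# Usual first suspects per pod status (illustrative, not exhaustive).
FIRST_SUSPECTS = {
    "Pending": (
        "scheduling: not enough CPU or memory on any node, node selectors "
        "or taints nothing satisfies, or an unbound PersistentVolumeClaim"
    ),
    "CrashLoopBackOff": (
        "the container starts and then exits, over and over: check the "
        "previous container's logs and its exit code"
    ),
    "ImagePullBackOff": (
        "the image cannot be pulled: check the image name and tag, "
        "registry reachability, and imagePullSecrets"
    ),
}


def first_suspect(status):
    """Return where to look first for a given pod status."""
    return FIRST_SUSPECTS.get(
        status, "run kubectl describe on the pod and read the Events section"
    )


print(first_suspect("ImagePullBackOff"))
```

Whatever the status, the Events section at the bottom of kubectl describe is where the cluster spells out which of these it actually is.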

42.1 Systematic Troubleshooting Methodology

Right, let’s get this sorted. You’re staring at a CrashLoopBackOff or some other Kubernetes-induced hieroglyphic, and the panic is starting to set in. Don’t. The single biggest mistake you can make is just frantically running kubectl describe on random things, hoping for a clue. That’s like trying to fix a car engine by randomly tapping components with a hammer. You might get lucky, but you’ll probably just make it worse.

43.7 Centralizing Logs: rsyslog to a SIEM or Log Aggregation Platform

Right, so you’ve got logs spewing out of every server like a firehose. You could try to read them by SSHing into each box and tailing files until your eyes bleed, but let’s be honest: that’s a special kind of masochism reserved for people who also enjoy assembling IKEA furniture without the instructions. The only sane way to make sense of this chaos is to get all those logs off the individual machines and into a central system—a SIEM, an Elasticsearch cluster, a cloud-based log aggregator, whatever. You need a single pane of glass, even if that glass is sometimes a little dirty.

43.6 journalctl Filters: -u, --since, --until, -p, -f

Right, so journalctl is your new best friend and your worst critic, all wrapped into one. It’s the primary tool for reading the structured, indexed journal that systemd creates, and if you’re just running it naked, you’re doing it wrong. You’ll be buried in a firehose of data, from the very first boot message to the kernel’s latest hiccup. The power, and the point, is in the filters. Let’s talk about the ones you’ll use every single day.
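
To see how the filters stack, here is a small sketch that assembles a journalctl command line from the five flags in this section's title. The function name and the example unit are made up; the flags themselves (-u, --since, --until, -p, -f) are journalctl's own.

```python
import shlex


def journalctl_cmd(unit=None, since=None, until=None, priority=None, follow=False):
    """Build a journalctl argument list from the everyday filters."""
    cmd = ["journalctl"]
    if unit:
        cmd += ["-u", unit]          # limit to one systemd unit
    if since:
        cmd += ["--since", since]    # start of the time window
    if until:
        cmd += ["--until", until]    # end of the time window
    if priority:
        cmd += ["-p", priority]      # this severity and anything worse
    if follow:
        cmd.append("-f")             # keep streaming new entries
    return cmd


# Hypothetical example: errors from nginx over the last hour.
cmd = journalctl_cmd(unit="nginx.service", since="1 hour ago", priority="err")
print(shlex.join(cmd))
```

Passing the list to subprocess.run avoids quoting headaches with values like "1 hour ago"; shlex.join is only there to print something you can paste into a shell.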

43.5 journald: Persistent vs Volatile Journal Storage

Right, let’s talk about journald’s split personality when it comes to storage. This isn’t just some academic distinction; it dictates whether your precious logs vanish into the ether after a reboot or stick around for you to autopsy later. It’s the difference between a detective having a crime scene and just having a vague memory of what might have happened. Out of the box, journald runs with Storage=auto: if /var/log/journal/ exists, logs are written there and survive reboots; if it doesn’t, the journal is kept only in memory (/run/log/journal/), and plenty of systems ship that way. This is the “volatile” storage. It’s fast, it doesn’t wear out your SSD with a million tiny writes, and it’s perfect for… well, for situations where you don’t care about logs after a reboot. I can’t think of many of those situations, but they must exist. The moment you shut down, poof, the journal is gone. It’s like a court reporter with amnesia.
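
The decision journald makes can be written down as a tiny function. This is a simplified model of the Storage= setting in journald.conf, with an invented function name; it ignores edge cases such as a persistent journal falling back to /run when the disk isn't writable yet.

```python
def journal_location(storage="auto", var_log_journal_exists=False):
    """Return the directory journald would write to under a simplified
    model of the Storage= setting, or None if nothing is stored."""
    if storage == "none":
        return None
    if storage == "volatile":
        return "/run/log/journal"
    if storage == "persistent":
        return "/var/log/journal"
    if storage == "auto":
        # auto persists only when the directory is already there.
        return "/var/log/journal" if var_log_journal_exists else "/run/log/journal"
    raise ValueError(f"unknown Storage= value: {storage!r}")


print(journal_location("auto", var_log_journal_exists=False))
print(journal_location("auto", var_log_journal_exists=True))
```

Which explains the classic fix: create /var/log/journal (or set Storage=persistent) and the amnesia goes away.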

43.4 logrotate: Rotating, Compressing, and Pruning Log Files

Right, let’s talk about logrotate. You’re here because your disk space is screaming for mercy, or you’re just smart enough to know it will be soon. Log files are the digital equivalent of hoarding old newspapers; they just keep piling up until you can’t open the front door. logrotate is your friendly, automated cleanup crew. It’s not the flashiest tool, but it’s one of the most reliable workhorses in your sysadmin toolkit. It rotates, compresses, mails, and deletes log files according to rules you set. And it’s probably already installed on your system, silently doing its job for core services.

43.3 /var/log Directory: Common Log Files and Their Contents

Right, let’s talk about /var/log. This is where your system’s diary lives, and like any good diary, it’s full of secrets, drama, and a meticulous record of everything that’s ever gone wrong. If your system starts acting weird, this is your first crime scene. Don’t just glance at it; learn to read it like a detective.

The Lay of the Land

First, a quick tour. The /var/log directory is the designated dumping ground, by convention and by the Filesystem Hierarchy Standard (FHS), for all log files. This is brilliant because it means you always know where to start looking. You’ll find everything from the kernel’s deepest mutterings (kern.log) to a user failing to log in for the tenth time (auth.log). Those two names are the Debian and Ubuntu ones; Red Hat-family systems keep authentication messages in /var/log/secure instead. The structure is mostly flat, which is both a blessing (simplicity) and a curse (a potential mess of hundreds of files). Some applications, being the special snowflakes they are, create their own subdirectories like /var/log/apt or /var/log/nginx, which is actually a decent practice.

43.2 rsyslog: Configuration, Filters, and Forwarding to Remote Hosts

Right, so you’ve got logs. Lots of them. They’re spewing out of your systems like confetti from a cannon, and right now they’re probably all just piling up in /var/log/syslog, which is about as useful as a screen door on a submarine. We need to bring order to this chaos, and rsyslog is our tool of choice. It’s the venerable workhorse of Linux logging, and it’s powerful enough to make you weep with joy or frustration, sometimes simultaneously. Forget the basic syslog; rsyslog is its modern, plugin-driven, über-powered descendant. Let’s bend it to our will.

43.1 syslog and the syslog Protocol: Facilities and Severities

Alright, let’s talk about syslog. You’ve seen those cryptic messages scrolling through your system logs, right? They’re not just random text vomit; they’re actually structured messages following one of the oldest and most widely adopted protocols in computing. It’s the duct tape and baling wire of logging—it’s everywhere, it’s ugly, but it gets the job done. The protocol itself, defined in RFC 5424, is a standard for message logging that allows different devices and software to send event notification messages across an IP network. But we need to start with the classic, original format to understand its soul.
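
Facilities and severities travel together in a single number at the front of every message, the PRI value in angle brackets, computed as facility times 8 plus severity. Here is a short sketch of decoding it. The function names are made up, and the facility table only covers the codes with well-established keyword names; anything else is reported by number.

```python
SEVERITIES = ["emerg", "alert", "crit", "err",
              "warning", "notice", "info", "debug"]

# Common facility keywords; codes 12-15 are left out on purpose.
FACILITIES = {
    0: "kern", 1: "user", 2: "mail", 3: "daemon", 4: "auth", 5: "syslog",
    6: "lpr", 7: "news", 8: "uucp", 9: "cron", 10: "authpriv", 11: "ftp",
}
FACILITIES.update({16 + n: f"local{n}" for n in range(8)})


def encode_pri(facility, severity):
    """PRI = facility * 8 + severity."""
    return facility * 8 + severity


def decode_pri(pri):
    """Split a PRI value into (facility name, severity name)."""
    facility, severity = divmod(pri, 8)
    return FACILITIES.get(facility, f"facility{facility}"), SEVERITIES[severity]


print(decode_pri(34))    # a message starting with <34>
print(decode_pri(165))   # a message starting with <165>
```

So a line that begins with <34> is facility 4 at severity 2: an auth message at critical level, which is exactly the kind you want someone to read.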
