15.5 Debugging DNS: nslookup and dig Inside a Pod
Alright, let’s get our hands dirty. The theory of Kubernetes DNS is all well and good until you’re staring at a misbehaving service and your application is screaming about an UnknownHostException. This is where you stop guessing and start poking the system directly. The most direct way to do this is by running diagnostic tools like nslookup and dig from inside a Pod. Why inside? Because that’s the exact environment your application is running in. The view from your laptop is a lie; the view from the Pod is the truth.
The Quick-and-Dirty Debug Pod
You don’t install dig on your production application pod. That’s like performing engine surgery on a race car mid-lap. Instead, you launch a dedicated throwaway pod into the same namespace. The simplest option is a busybox pod, but a word of warning: stock busybox ships no dig at all, and its built-in nslookup applet is a classic “gotcha.” Releases after 1.28 have a history of mishandling the search domains Kubernetes relies on, which is why the Kubernetes documentation pins busybox:1.28 in its own examples. For dig, you’re better off with a more fully featured image.
Let’s create a pod that actually has the tools we need. This is my go-to:
apiVersion: v1
kind: Pod
metadata:
  name: dns-debugger
  namespace: your-app-namespace # <- CHANGE THIS!
spec:
  containers:
  - name: debugger
    image: radial/busyboxplus:curl
    command: [ "/bin/sh", "-c", "sleep 3600" ] # Just hang around for an hour
  restartPolicy: Never
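Save this as debug-pod.yaml and create it with kubectl apply -f debug-pod.yaml. The radial/busyboxplus:curl image is a handy Swiss Army knife: it is tiny and bundles curl and nslookup alongside the usual busybox applets. Don’t take dig on faith, though. Run which dig once you’re inside, and if it’s missing, swap in an image built for network debugging, such as nicolaka/netshoot. When you’re done, clean up with kubectl delete pod dns-debugger -n your-app-namespace.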
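If you’d rather skip the manifest entirely, the same pod can be started as a one-off. This is a minimal sketch, assuming the same image and namespace placeholder as above; the debug_pod function name and the KUBECTL override are conveniences invented for this example, not part of kubectl itself.

```shell
# Assumption: KUBECTL can be overridden (e.g. KUBECTL=echo) to preview the command.
KUBECTL="${KUBECTL:-kubectl}"

# Start an interactive throwaway pod in the given namespace.
# --rm deletes the pod when the shell exits; --restart=Never creates a bare Pod.
debug_pod() {
  ns="${1:?usage: debug_pod <namespace>}"
  $KUBECTL run dns-debugger --rm -it --restart=Never \
    --image=radial/busyboxplus:curl -n "$ns" -- /bin/sh
}
```

Call it as debug_pod your-app-namespace. Because of --rm there is nothing to clean up afterwards, which makes it a good fit for quick checks; the manifest is the better choice when you want the pod to stick around between sessions.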
Using nslookup: The Blunt Instrument
nslookup is the old-school workhorse. It’s not always perfectly formatted, but it gets the job done and is almost always available. Its primary use here is a basic sanity check: “Can I resolve this name at all?”
First, exec into your shiny new debug pod:
kubectl exec -it dns-debugger -n your-app-namespace -- /bin/sh
Now, let’s try to resolve a Kubernetes service. The fully qualified domain name (FQDN) is the key. You can’t just guess.
nslookup kubernetes.default.svc.cluster.local
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: kubernetes.default.svc.cluster.local
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
Beautiful. This tells us several things instantly:
- It worked. The name resolved.
- Our DNS server is 10.96.0.10. This is the usual ClusterIP for the kube-dns service on kubeadm-style clusters. If you see something else here, it’s a clue that your cluster uses a different service CIDR or a custom DNS setup.
- The name resolved to the ClusterIP of the kubernetes service (10.96.0.1).
Now, let’s break it on purpose to see what failure looks like. Let’s query something that doesn’t exist.
nslookup does-not-exist.default.svc.cluster.local
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
nslookup: can't resolve 'does-not-exist.default.svc.cluster.local'
This is a clear, unambiguous failure. The DNS server responded, but it had no record for that name. If your nslookup command hangs forever or fails to even contact the server, you’ve got bigger problems, like network policy blocking DNS traffic or the CoreDNS pods being down.
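When the lookup hangs rather than failing cleanly, the next stop is the DNS deployment itself. The checks below are a sketch to run from your workstation, not from inside the debug pod. They assume the cluster keeps the conventional kube-dns Service name and k8s-app=kube-dns label in kube-system (some distributions differ); check_coredns and the KUBECTL override are names made up for this example.

```shell
# Assumption: KUBECTL can be overridden (e.g. KUBECTL=echo) to preview the commands.
KUBECTL="${KUBECTL:-kubectl}"

check_coredns() {
  # Are the DNS pods running? (CoreDNS conventionally keeps the kube-dns label.)
  $KUBECTL get pods -n kube-system -l k8s-app=kube-dns
  # Does the Service exist, with the ClusterIP that nslookup reported?
  $KUBECTL get svc kube-dns -n kube-system
  # Are there any endpoints behind the Service?
  $KUBECTL get endpoints kube-dns -n kube-system
  # Anything alarming in the recent logs?
  $KUBECTL logs -n kube-system -l k8s-app=kube-dns --tail=20
  # Is a NetworkPolicy in the app namespace possibly blocking port 53?
  $KUBECTL get networkpolicy -n "${1:-default}"
}
```

Run check_coredns your-app-namespace. Healthy pods with an empty endpoints list, or a policy that does not allow egress to kube-system on port 53, would each explain a lookup that never gets an answer.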
Using dig: The Scalpel
While nslookup tells you the “what,” dig tells you the “how,” “when,” and “why.” It gives you the full DNS response, which is invaluable for debugging trickier issues. The output is more verbose but far more informative.
dig kubernetes.default.svc.cluster.local
; <<>> DiG 9.16.33 <<>> kubernetes.default.svc.cluster.local
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39568
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;kubernetes.default.svc.cluster.local. IN A
;; ANSWER SECTION:
kubernetes.default.svc.cluster.local. 30 IN A 10.96.0.1
;; Query time: 2 msec
;; SERVER: 10.96.0.10#53(10.96.0.10)
;; WHEN: Tue Oct 17 19:12:45 UTC 2023
;; MSG SIZE rcvd: 117
Let’s break down the gold in this output:
- status: NOERROR: This is good. Any other status (like NXDOMAIN or SERVFAIL) indicates a problem.
- flags: aa: The “Authoritative Answer” flag is set. This is CoreDNS telling you, “I am the ultimate authority for this cluster.local domain; this isn’t a cached answer from upstream.”
- ANSWER SECTION: This is the prize. It shows the record (A record) and its IP.
- Query time: 2 msec: Confirms the DNS server is responding quickly. If this number is huge, you have latency issues.
- SERVER: 10.96.0.10#53: Confirms which DNS server we’re talking to.
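Once you know which fields matter, you rarely need the whole dump. As a rough sketch, the helper below pulls just the status out of dig’s header so you can use it in a loop or a script; dig_status is a name invented here, and it assumes the header line looks like the one shown above.

```shell
# Read dig output on stdin and print only the status field
# (NOERROR, NXDOMAIN, SERVFAIL, ...) from the HEADER line.
dig_status() {
  sed -n 's/.*status: \([A-Z]*\),.*/\1/p'
}

# Usage inside the debug pod:
#   dig kubernetes.default.svc.cluster.local | dig_status
#
# dig can also trim its own output:
#   dig +short kubernetes.default.svc.cluster.local          # answer data only
#   dig +noall +answer kubernetes.default.svc.cluster.local  # ANSWER section only
```

The +short form is the quickest sanity check, but it prints nothing for both “no such name” and “server trouble,” so fall back to the status field when an empty result needs explaining.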
The Most Common Pitfall: It’s Not Just ‘my-service’
This is the mistake everyone makes once. You try to resolve my-service from your app and it fails. So you jump into a pod and run nslookup my-service and it also fails. You’re ready to tear your hair out. The reason is almost always that you forgot the namespace.
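Kubernetes DNS is built on FQDNs. The shorthand my-service only works if the client (your app) is in the same namespace as the service, because the search list in the pod’s /etc/resolv.conf only fills in the pod’s own namespace. If your pod is in namespace app and the service is in namespace database, nslookup database-service will fail miserably. You must qualify the name, at minimum as database-service.database and ideally in full: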
# This will likely fail if you're not in the 'database' namespace
nslookup database-service
# This will work from ANY namespace
nslookup database-service.database.svc.cluster.local
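To see exactly which names the resolver tries on your behalf, look at the search line in the pod’s /etc/resolv.conf. The helper below is an illustrative sketch (expand_short_name is a made-up name) that appends each search domain to a short name, which is roughly what the resolver does for names with fewer dots than the ndots option, 5 by default in Kubernetes pods.

```shell
# Print the fully qualified candidates a resolver would try for a short name,
# based on the "search" line of a resolv.conf-style file.
expand_short_name() {
  name="$1"
  conf="${2:-/etc/resolv.conf}"
  # Take the domains from the last "search" line and append each one to the name.
  for domain in $(awk '$1 == "search" { $1 = ""; line = $0 } END { print line }' "$conf"); do
    echo "$name.$domain"
  done
}

# Usage inside the debug pod:
#   cat /etc/resolv.conf
#   expand_short_name database-service
```

In a pod running in the app namespace, the candidates end in app.svc.cluster.local, svc.cluster.local, and cluster.local. None of them is database-service.database.svc.cluster.local, which is why the short name fails. One more trap: dig ignores the search list unless you pass +search, so a short name that works with nslookup can still fail with plain dig.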
Always, always use the FQDN when debugging. It removes a huge variable from the equation. If the FQDN works but the short name doesn’t, you’ve just identified the problem: your application needs to be configured to use the namespace-qualified name, or it needs to be in the same namespace as the service it’s trying to call. It’s not a DNS bug; it’s a configuration error. And now, thanks to nslookup and dig, you know exactly what it is.