44.5 NodeLocal DNSCache: Eliminating DNS Bottlenecks
Right, let’s talk about one of the most common, yet most insidious, performance killers in Kubernetes: DNS latency. You’ve probably seen it. Your application isn’t CPU-bound, it’s not memory-bound, but it just feels… sluggish. A request comes in, and it spends half its life just trying to figure out where to go. That’s DNS for you. It’s the phone book of the internet, and in a dynamic environment like Kubernetes, you’re looking up numbers constantly. Every service discovery call, every database connection string resolution, every call to an external API: it all goes through the cluster’s DNS resolver. And by default, that means a network trip to kube-dns/CoreDNS for every lookup from every single pod. On a busy cluster this turns the DNS service into a bottleneck, a single point of contention for every microservice chatty enough to rival a royal court.
The brilliant (and frankly, long-overdue) solution to this is the NodeLocal DNSCache. The name is a mouthful, but the concept is beautifully simple: instead of every pod on a node sending all its DNS queries over the network to a central CoreDNS pod, we run a small DNS caching daemon on each node. Pods on that node then talk to this local cache. The cache daemon is a good citizen: if it knows the answer (a cache hit), it returns it immediately. If it doesn’t (a cache miss), it forwards the query upstream to the cluster’s CoreDNS service, gets the answer, caches it for next time, and then tells you. The win is monumental. We’ve just taken the bulk of the DNS chatter that was crisscrossing our network and reduced it to a hop that never leaves the node. The network team will send you a fruit basket.
Why This Isn’t Just a “Nice-to-Have”
The default setup is a textbook case of a bad fan-out pattern. Imagine 50 pods on a node, all needing to resolve database-primary.prod.svc.cluster.local. That’s 50 nearly identical UDP packets leaving the node, being routed, hitting the CoreDNS pod, and 50 responses coming back. It’s a ridiculous amount of wasted overhead for the exact same query. The NodeLocal cache turns this into one request to CoreDNS; for as long as the record’s TTL lasts, the other 49 are answered from memory without leaving the node. The reduction in DNS query latency (especially tail latency) and in the load on CoreDNS can be dramatic. It’s the difference between everyone in an office calling the central library for the same fact versus one person calling and then just yelling the answer across the room.
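To put numbers on the 50-pod example, here is a toy model of the cache in plain shell. It is only a sketch: the lookup function and the counters are made up for illustration, and it ignores TTL expiry, negative caching, and search-path expansion.

```shell
#!/bin/sh
# Toy model of a node-local cache: the first lookup of a name goes upstream,
# repeats are answered locally. Names and counts are illustrative only.
upstream_queries=0
cached_names=""

lookup() {
    case " $cached_names " in
        *" $1 "*)
            ;;  # cache hit: nothing leaves the node
        *)
            # cache miss: one query goes upstream, then the name is remembered
            upstream_queries=$((upstream_queries + 1))
            cached_names="$cached_names $1"
            ;;
    esac
}

# 50 pods all asking for the same name
i=0
while [ "$i" -lt 50 ]; do
    lookup database-primary.prod.svc.cluster.local
    i=$((i + 1))
done

echo "lookups=50 upstream_queries=$upstream_queries"
```

Run it and the upstream counter stops at 1, which is the whole argument in one line of output.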
Deploying It: The Nitty-Gritty
The standard manifest from the Kubernetes project is a solid starting point. You’ll apply a DaemonSet, a Service, and a ConfigMap holding the cache’s Corefile. Let’s get our hands dirty.
First, grab the latest manifest. This is one of those times where pulling the official one is wise, as it evolves.
curl -sL https://github.com/kubernetes/kubernetes/raw/master/cluster/addons/dns/nodelocaldns/nodelocaldns.yaml -o nodelocaldns.yaml
Now, open that file. Don’t just blindly kubectl apply it. Be a professional. We need to configure two critical things:
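- The Upstream DNS Server: The cache needs to know where to send misses, so we have to tell it the cluster’s DNS service IP. You can get this easily:
  kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}'
  Which placeholder receives that IP depends on your kube-proxy mode. In iptables mode, replace __PILLAR__DNS__SERVER__ (it appears in the args section for the node-cache container and in the Corefile) and leave __PILLAR__CLUSTER__DNS__ alone; the node-cache pod fills that one in at startup. In IPVS mode, remove ,__PILLAR__DNS__SERVER__ from the args and put the IP into __PILLAR__CLUSTER__DNS__ instead. Either way, also replace __PILLAR__DNS__DOMAIN__ with your cluster domain (usually cluster.local).
- The Local Listen IP: Replace __PILLAR__LOCAL__DNS__ with the address the cache should listen on. The usual choice is 169.254.20.10 (a link-local address). This is fine, but you must ensure this IP doesn’t conflict with anything else in your CNI setup. It almost certainly won’t, but it’s your job to check. In iptables mode the cache also binds the kube-dns service IP on the node, so pods reach it without any kubelet change. In IPVS mode it listens only on this address, and you have to point the kubelet’s --cluster-dns flag at it yourself.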
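The substitutions can be scripted rather than done by hand. The sketch below assumes kube-proxy in iptables mode and the placeholder names used by the upstream manifest as far as I know them (__PILLAR__LOCAL__DNS__, __PILLAR__DNS__DOMAIN__, __PILLAR__DNS__SERVER__); check them against the copy you downloaded. render_manifest is a hypothetical helper, and 10.96.0.10 is just an example service IP.

```shell
#!/bin/sh
# Sketch: fill in the manifest placeholders for kube-proxy in iptables mode.
# render_manifest is a hypothetical helper: manifest on stdin, result on stdout.
render_manifest() {
    kubedns=$1    # ClusterIP of the kube-dns Service
    localdns=$2   # link-local listen address, e.g. 169.254.20.10
    domain=$3     # cluster domain, e.g. cluster.local
    sed -e "s/__PILLAR__LOCAL__DNS__/$localdns/g" \
        -e "s/__PILLAR__DNS__DOMAIN__/$domain/g" \
        -e "s/__PILLAR__DNS__SERVER__/$kubedns/g"
}

# Real usage against a cluster (commented out so the sketch runs anywhere):
#   kubedns=$(kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}')
#   render_manifest "$kubedns" 169.254.20.10 cluster.local \
#       < nodelocaldns.yaml > nodelocaldns.rendered.yaml

# Dry run on a single sample line resembling the node-cache args:
sample='args: [ "-localip", "__PILLAR__LOCAL__DNS__,__PILLAR__DNS__SERVER__" ]'
rendered=$(printf '%s\n' "$sample" | render_manifest 10.96.0.10 169.254.20.10 cluster.local)
echo "$rendered"
```

Writing to a separate rendered file keeps the pristine download around, which makes it easy to diff against the next upstream version.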
Once you’ve made those edits, deploy it:
kubectl apply -f nodelocaldns.yaml
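Before moving on, it’s worth confirming that the cache actually came up on every node. A minimal sketch, assuming the DaemonSet is named node-local-dns in kube-system (as in the upstream manifest, to my knowledge); ds_is_ready is a hypothetical helper.

```shell
#!/bin/sh
# ds_is_ready is a hypothetical helper: given <desired> <ready>, succeed only if
# at least one pod is desired and every desired pod reports ready.
ds_is_ready() {
    desired=$1
    ready=$2
    [ "$desired" -gt 0 ] && [ "$desired" -eq "$ready" ]
}

# Against a real cluster (commented out so the sketch runs anywhere):
#   status=$(kubectl -n kube-system get ds node-local-dns \
#       -o jsonpath='{.status.desiredNumberScheduled} {.status.numberReady}')
#   ds_is_ready $status && echo "cache running on every node"

# Dry run with sample numbers:
if ds_is_ready 3 3; then result_full=ready; else result_full=waiting; fi
if ds_is_ready 3 2; then result_partial=ready; else result_partial=waiting; fi
echo "3 of 3: $result_full, 2 of 3: $result_partial"
```

If you only want a one-off check, kubectl rollout status on the DaemonSet does much the same job interactively.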
The Dark Arts of Pod Configuration
Here’s the clever bit. The DaemonSet deploys a pod on every node, but how do we make your pods use it? The deployment does two things:
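- It runs the node-cache pod on the host network, where it creates a dummy interface and binds the local cache IP (169.254.20.10) to it. With kube-proxy in iptables mode it also binds the kube-dns service IP and installs NOTRACK iptables rules, so any pod that uses the default dnsPolicy: ClusterFirst lands on the local cache without its resolv.conf changing at all. This is the magic. In IPVS mode the cache cannot bind the service IP, so you have to change the kubelet’s --cluster-dns flag to the local cache IP yourself.
- It creates a second Service, kube-dns-upstream, that selects the same CoreDNS pods under a new ClusterIP, so the cache can forward misses upstream without creating a routing loop.
For existing pods, it depends on the mode. In iptables mode there is nothing to do, because the interception happens on the node. In IPVS mode you’re out of luck: pods created before the kubelet change keep the old nameserver and need to be restarted to pick up the new setting.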
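To see which resolver a given pod was actually handed, reading its /etc/resolv.conf is usually enough. If the kubelet’s --cluster-dns points at the cache you’d expect the link-local address; if the cache is intercepting the original service IP on the node, you’ll still see the kube-dns ClusterIP. The sketch below uses a hypothetical helper, first_nameserver, and a placeholder pod name.

```shell
#!/bin/sh
# first_nameserver is a hypothetical helper: print the first nameserver found
# in resolv.conf content supplied on stdin.
first_nameserver() {
    awk '$1 == "nameserver" { print $2; exit }'
}

# Against a real pod (commented out; my-pod is a placeholder name):
#   kubectl exec my-pod -- cat /etc/resolv.conf | first_nameserver

# Dry run on sample content:
sample='search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 169.254.20.10
options ndots:5'
ns=$(printf '%s\n' "$sample" | first_nameserver)
echo "pod resolver: $ns"
```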
Pitfalls and “Oh, Crap” Moments
- The DaemonSet Requires Privileges: The node-cache pods run on the host network, and they create a network interface and iptables rules on the node. That takes elevated access. Look at the securityContext in the manifest; older copies set privileged: true, newer ones ask for the NET_ADMIN capability. This makes some security folks understandably nervous. In principle you can do the node-level setup manually or via your node provisioning tool (Terraform, Ansible, etc.) and then deploy a less privileged DaemonSet, but the automated way is the standard.
- StubDomains and Upstreams: If you have a complex DNS setup with stubDomains or upstreamNameservers, or equivalent custom forwarding rules in your Corefile, make sure the NodeLocal DNSCache knows about them too. The cache becomes the primary DNS for the pods, and by default it forwards names outside the cluster domain to the node’s upstream resolvers rather than through CoreDNS. It picks up stubDomains and upstreamNameservers from the kube-dns ConfigMap on its own, but rules that live only in the CoreDNS Corefile have to be replicated in the node-local-dns ConfigMap by hand. This is a common “it-deployed-but-my-queries-are-failing” gotcha.
- Monitoring: You’re now running a critical DNS service on every node. Monitor it! Scrape metrics from the node-cache container’s port 9253 (the metrics port) just like you would for CoreDNS. Track cache hits/misses, latency, and errors. If the cache hit rate is low, your application’s DNS patterns might be too unique to benefit much, but that’s rare.
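For a quick look at the hit rate without a full Prometheus setup, you can pull the metrics text from a node and do the arithmetic yourself. This is a sketch: cache_hit_ratio is a hypothetical helper, and the counter names (coredns_cache_hits_total, coredns_cache_misses_total) are the ones I’d expect from the CoreDNS cache plugin, so compare them with what your /metrics endpoint really exposes.

```shell
#!/bin/sh
# cache_hit_ratio is a hypothetical helper: sum the hit and miss counters from
# Prometheus text on stdin and print hits / (hits + misses) with two decimals.
cache_hit_ratio() {
    awk '
        /^coredns_cache_hits_total/   { hits += $NF }
        /^coredns_cache_misses_total/ { misses += $NF }
        END {
            total = hits + misses
            if (total > 0) printf "%.2f\n", hits / total
            else print "n/a"
        }'
}

# Against a real node (commented out; NODE_IP is a placeholder):
#   curl -s "http://$NODE_IP:9253/metrics" | cache_hit_ratio

# Dry run on sample metrics text:
sample='# HELP coredns_cache_hits_total The count of cache hits.
coredns_cache_hits_total{server="dns://169.254.20.10:53",type="success"} 900
coredns_cache_hits_total{server="dns://169.254.20.10:53",type="denial"} 50
coredns_cache_misses_total{server="dns://169.254.20.10:53"} 50'
ratio=$(printf '%s\n' "$sample" | cache_hit_ratio)
echo "cache hit ratio: $ratio"
```

These are counters since process start, so for anything beyond a spot check you’d want a rate over a time window in your monitoring system instead.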
The bottom line? NodeLocal DNSCache is one of the highest-value, lowest-risk performance improvements you can make to a cluster of any real size. It’s Kubernetes admitting, “Yeah, the default way is kinda dumb, here’s a fix.” Deploy it, tune it, and watch those pointless network hops vanish. Your applications will thank you by actually spending their time doing work.