15.2 DNS Records for Services: <service>.<namespace>.svc.cluster.local
Right, let’s talk about the magic trick that makes your Pods find each other without you having to play network detective. It’s the <service>.<namespace>.svc.cluster.local incantation. This isn’t just some random string of jargon; it’s the fully qualified domain name (FQDN) for every Service you create, and it’s the linchpin of service discovery inside your cluster.
Think of it like this: a Service is a stable IP address and a logical grouping of Pods. But IP addresses are for computers; we humans (and our applications) prefer names. The Kubernetes DNS system (usually CoreDNS) is the phone book that maps the friendly name of your Service to that stable IP. The .cluster.local part is the default cluster domain, but it’s configurable if you’re feeling fancy and need to change it. The key thing to understand is that this FQDN is always available, for every Service, the moment the Service is created. You don’t have to configure a thing.
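To make the naming scheme concrete, here is a minimal sketch of how the FQDN is assembled from its parts. The helper name service_fqdn and the alternative cluster domain example.internal are made up for illustration; only the <service>.<namespace>.svc.<cluster-domain> layout comes from Kubernetes itself.

```shell
#!/bin/sh
# Hypothetical helper: build the FQDN of a Service.
# Usage: service_fqdn <service> <namespace> [cluster-domain]
service_fqdn() {
    service=$1
    namespace=$2
    # cluster.local is the default cluster domain; pass a third argument
    # if your cluster was set up with a different one.
    cluster_domain=${3:-cluster.local}
    printf '%s.%s.svc.%s\n' "$service" "$namespace" "$cluster_domain"
}

service_fqdn database prod
service_fqdn database qa example.internal
```

The first call prints the default form used throughout this section; the second shows what the same Service name would look like in a cluster whose domain had been changed.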
How CoreDNS Makes It Happen
When you create a Service named database in the namespace prod, the API server allocates it a virtual IP (a “cluster IP”). The CoreDNS Pods, which sit behind a Service that is still called kube-dns for historical reasons, watch the API server for Services and their endpoints, so they learn about the new Service almost instantly and start answering for it from their in-memory view of the cluster. Now, when a Pod tries to resolve database.prod.svc.cluster.local, the request hits CoreDNS, which returns the cluster IP of your database Service.
You can see this in action with a simple nslookup from inside a Pod. Let’s create a quick Pod to test it:
kubectl run dns-tester --image=busybox:1.35 --rm -it --restart=Never -- nslookup database.prod.svc.cluster.local
If that Service exists, you’ll get a beautiful, non-authoritative answer showing its IP. The --rm -it --restart=Never flags are our little way of creating a temporary Pod for a single command and then throwing it away, which is perfect for quick tests.
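If you want to look at the DNS machinery itself rather than just its answers, something like the following sketch may help. It assumes a kubeadm-style cluster where CoreDNS lives in the kube-system namespace, keeps the k8s-app=kube-dns label, and reads its configuration from a ConfigMap named coredns; managed clusters may use different names. The function name show_cluster_dns is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list the pieces of the cluster DNS setup.
# Assumes a kubeadm-style cluster where CoreDNS runs in kube-system.
show_cluster_dns() {
    # The Service whose cluster IP shows up as "nameserver" in each Pod's resolv.conf.
    kubectl get service kube-dns --namespace kube-system
    # The CoreDNS Pods behind that Service.
    kubectl get pods --namespace kube-system --selector k8s-app=kube-dns
    # The Corefile, i.e. the CoreDNS configuration (the ConfigMap name may differ).
    kubectl get configmap coredns --namespace kube-system --output yaml
}
```

Run show_cluster_dns from a shell that has kubectl pointed at your cluster. Nothing here changes any state; all three commands are read-only.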
The Beauty of Namespace-Relative Resolution
Here’s the best part: you almost never have to use the full, verbose FQDN. Your Pod’s DNS resolver is configured to search several domains automatically. The search list includes <your-namespace>.svc.cluster.local, svc.cluster.local, and finally cluster.local. This means from within the same namespace (prod), your application can simply use database and it will resolve correctly. Need to talk to a Service in another namespace? Use database.qa and it will expand to database.qa.svc.cluster.local. This is a godsend for application configuration. You can often just set a config value to database and it just works, whether you’re testing in dev or running in prod.
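The search-list expansion is easier to trust once you have seen it spelled out. The sketch below models, in plain shell, the order in which a glibc-style resolver would try names for a Pod in the prod namespace. The nameserver address and the ndots:5 option in the comment are typical values rather than guarantees, the function name candidate_names is made up, and other resolvers (musl, for instance) differ in the details.

```shell
#!/bin/sh
# Typical /etc/resolv.conf inside a Pod in namespace "prod" (values vary by cluster):
#   nameserver 10.96.0.10
#   search prod.svc.cluster.local svc.cluster.local cluster.local
#   options ndots:5

SEARCH_DOMAINS="prod.svc.cluster.local svc.cluster.local cluster.local"
NDOTS=5

# Print, in order, the names a glibc-style resolver would try for $1.
candidate_names() {
    name=$1
    # A trailing dot marks the name as absolute: the search list is skipped.
    case $name in
        *.) printf '%s\n' "$name"; return ;;
    esac
    # Count the dots in the name.
    only_dots=$(printf '%s' "$name" | tr -cd '.')
    dot_count=${#only_dots}
    # With at least NDOTS dots, the name is tried as-is before the search list.
    if [ "$dot_count" -ge "$NDOTS" ]; then
        printf '%s.\n' "$name"
    fi
    for domain in $SEARCH_DOMAINS; do
        printf '%s.%s.\n' "$name" "$domain"
    done
    # With fewer dots, the name as-is is the last resort.
    if [ "$dot_count" -lt "$NDOTS" ]; then
        printf '%s.\n' "$name"
    fi
}

candidate_names database
candidate_names database.qa
```

In this model, database matches on the first candidate (database.prod.svc.cluster.local.), while database.qa misses once and matches on the second (database.qa.svc.cluster.local.). To compare against a real Pod, print its /etc/resolv.conf and check the search and options lines.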
The Subtle Pitfalls (Because Nothing’s Perfect)
It’s not all rainbows. The most common “it’s not working!” moment is when you forget that the DNS resolution is tied to the Service, not the Pods directly. A Service selector must match Pod labels for the Endpoints to be populated. No matching Pods? Your Service has an IP but it’s a road to nowhere. Check with kubectl get endpoints <service-name>.
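When a name resolves but connections go nowhere, it helps to put the Service’s selector, the Pod labels, and the Endpoints side by side. The following is a minimal sketch of that check; the function name check_service is made up, and it assumes kubectl is already pointed at the right cluster.

```shell
#!/bin/sh
# Hypothetical helper: compare a Service's selector with Pod labels and Endpoints.
# Usage: check_service <service> <namespace>
check_service() {
    service=$1
    namespace=$2
    echo "Selector:"
    # The labels the Service is looking for.
    kubectl get service "$service" --namespace "$namespace" \
        --output jsonpath='{.spec.selector}'
    echo
    echo "Pods and their labels:"
    # The labels the Pods actually carry; these must match the selector.
    kubectl get pods --namespace "$namespace" --show-labels
    echo "Endpoints:"
    # Empty here means no ready Pod matched the selector.
    kubectl get endpoints "$service" --namespace "$namespace"
}
```

Calling check_service database prod prints all three views in one go. If the Endpoints list is empty, look for a typo in the selector or in the Pod labels first, then check whether the matching Pods are actually ready.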
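The other, more insidious issue is stale DNS answers. This one bites everyone eventually. CoreDNS hands out Service records with a tiny TTL (5 seconds by default), but nothing forces your application to honor it. Contrary to popular belief, the culprit is usually not glibc: its stub resolver doesn’t cache at all and asks the DNS server on every lookup. The caching happens higher up, in a daemon like nscd if you run one, or in the application runtime itself. The JVM is the classic offender: it caches successful lookups for a period set by the networkaddress.cache.ttl security property (30 seconds by default in OpenJDK, and forever if a security manager is installed). Long-lived connections and connection pools have the same effect, since they never look the name up again. For an ordinary ClusterIP Service this rarely hurts, because the name resolves to the cluster IP, which stays put while the Pods behind it come and go. It bites with headless Services, where DNS returns the Pod IPs directly: during a rolling update or a failover, your app might stubbornly keep talking to a dead Pod well after the DNS record has changed. The solution? Find out where the caching actually happens, shorten the cache lifetime there, make sure clients re-resolve the name when they reconnect, or just accept the delay. It’s a classic case of the system working perfectly, just not the way you want it to.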
When to Use the Full FQDN
So when should you use the mouthful of a full name? Primarily when you need to be absolutely, positively sure there’s no ambiguity: in complex multi-cluster or hybrid environments where you might have overlapping network domains, or when configuring systems that live outside the Kubernetes-native DNS resolver and therefore never see your Pod’s search list. For 99% of in-cluster communication, the short name is not just acceptable; it’s the preferred, maintainable way to do it.