Right, let’s talk about your cluster’s DNS settings, specifically the two things that cause 90% of “but I can’t resolve that!” headaches: ndots and search domains. You’ve probably run a pod, tried to curl another service, and gotten a timeout, only to realize you needed to use the fully qualified name. I’ve been there. It feels dumb. Let’s demystify why it happens.

The core issue is that we humans are lazy. We want to say curl api instead of curl api.production.svc.cluster.local. Kubernetes, in its infinite wisdom, tries to help with this by providing search domains. Your Pod’s /etc/resolv.conf isn’t just a list of nameservers; it’s a recipe for how to try and find a name.

Here’s what a typical /etc/resolv.conf from a Pod looks like. Go ahead, run cat /etc/resolv.conf in one of yours.

nameserver 10.96.0.10
search production.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

See those last two lines? That’s the whole game. Let’s break them down.
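If you'd rather inspect these settings from code, here's a minimal Python sketch that pulls the nameserver, search, and ndots values out of resolv.conf text. It assumes the simple one-keyword-per-line format shown above and ignores every other option. The function name parse_resolv_conf is hypothetical, not part of any library.

```python
def parse_resolv_conf(text: str) -> dict:
    """Extract nameservers, search domains, and ndots from resolv.conf text.

    Simplified sketch: only handles the keywords found in a typical Pod's file.
    """
    # glibc uses ndots:1 when no "options ndots:N" line is present.
    config = {"nameservers": [], "search": [], "ndots": 1}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line[0] in "#;":  # skip blank lines and comment lines
            continue
        keyword, *values = line.split()
        if keyword == "nameserver" and values:
            config["nameservers"].append(values[0])
        elif keyword == "search":
            config["search"] = values  # the last search line wins
        elif keyword == "options":
            for option in values:
                if option.startswith("ndots:"):
                    # glibc silently caps ndots at 15
                    config["ndots"] = min(int(option.split(":", 1)[1]), 15)
    return config


POD_RESOLV_CONF = """\
nameserver 10.96.0.10
search production.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
"""

cfg = parse_resolv_conf(POD_RESOLV_CONF)
print(cfg["search"])  # the three search domains, in order
print(cfg["ndots"])   # 5
```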

How Search Domains Actually Work

That search line is a list of domains the resolver will append to your query. The process isn't “try the short name, then give up.” It's “try the name with every single domain on this list, in order, until something sticks,” plus one attempt at the bare name as an absolute query. Whether that bare attempt comes first or last depends on how many dots the name contains.

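So, when you run nslookup api from inside your Pod, the resolver can try up to four names behind the scenes, in this order:

  1. api.production.svc.cluster.local.
  2. api.svc.cluster.local.
  3. api.cluster.local.
  4. api. (the bare name as an absolute query, tried last because it contains no dots at all)

It stops at the first answer, so if api lives in the production namespace, the very first query succeeds and the other three never happen.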
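The ordering can be modeled in a few lines of Python. This is a simplified sketch of glibc's search behavior (other resolvers, such as musl in Alpine images, differ in the details), and candidate_names is a hypothetical helper, not a real library function. It lists every name the resolver would be willing to try. In practice only the names up to the first hit are actually sent.

```python
def candidate_names(name: str, search: list[str], ndots: int) -> list[str]:
    """Return the fully qualified names a glibc-style resolver tries, in order.

    Simplified model: a trailing dot means "absolute, no search"; otherwise
    the dot count decides whether the bare name goes first or last.
    """
    if name.endswith("."):
        return [name]  # already absolute: the search list is skipped entirely
    absolute = name + "."
    expanded = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        return [absolute] + expanded  # enough dots: try the name as-is first
    return expanded + [absolute]      # too few dots: search list first, bare name last


SEARCH = ["production.svc.cluster.local", "svc.cluster.local", "cluster.local"]

for attempt in candidate_names("api", SEARCH, ndots=5):
    print(attempt)
```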

Why the search domains come first here, rather than the bare name, is down to the next, more important option: ndots.

The Almighty ndots Threshold

This is the real star of the show, and the cause of most pain. The ndots:5 option sets a rule: “If a query has fewer than 5 dots in it, go through the search domain list first before trying it as an absolute name.”

Read that again. It's counterintuitive. We tend to think a name with a couple of dots in it already looks fully qualified. The resolver, with ndots:5, thinks “fewer than five dots means this is a short name, so I should qualify it with my search paths first.”
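To make the threshold concrete, here's a tiny sketch that counts dots and reports which way a glibc-style resolver would go under ndots:5. The helper name tries_absolute_first is hypothetical; the rule it encodes is the one described above.

```python
def tries_absolute_first(name: str, ndots: int) -> bool:
    """True if a glibc-style resolver would query the bare name before the search list."""
    if name.endswith("."):
        return True  # a trailing dot makes the name absolute regardless of ndots
    return name.count(".") >= ndots


for name in ["api", "api.production", "my-api.azurewebsites.net",
             "api.production.svc.cluster.local"]:
    order = "absolute first" if tries_absolute_first(name, ndots=5) else "search list first"
    print(f"{name!r}: {name.count('.')} dots -> {order}")
```

Note the last one: even a complete internal service name has only four dots, so without a trailing dot it still goes search-list-first under ndots:5.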

This makes perfect sense for intra-cluster communication. The api service in the production namespace can be addressed as api.production, which has only one dot. Since 1 < 5, curl api.production makes the resolver walk the search list first:

  1. api.production.production.svc.cluster.local. (NXDOMAIN)
  2. api.production.svc.cluster.local. (hit!)

It stops right there. One wasted query, and in exchange a short, namespace-qualified name just works. That's exactly the trade the search list was designed for. The trouble starts when the name you typed is already complete.
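Here's a sketch that replays the api.production lookup against a hypothetical set of DNS records and keeps every query sent up to the first hit. KNOWN_RECORDS and the resolve helper are illustrative assumptions; a real resolver also deals with record types, error codes, and retries, which this ignores.

```python
def candidate_names(name: str, search: list[str], ndots: int) -> list[str]:
    """Simplified glibc-style search order."""
    if name.endswith("."):
        return [name]
    expanded = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        return [name + "."] + expanded
    return expanded + [name + "."]


def resolve(name: str, search: list[str], ndots: int, records: set[str]):
    """Return (answer, queries_sent); answer is None if every candidate misses."""
    sent = []
    for candidate in candidate_names(name, search, ndots):
        sent.append(candidate)
        if candidate in records:  # the first hit ends the lookup
            return candidate, sent
    return None, sent


SEARCH = ["production.svc.cluster.local", "svc.cluster.local", "cluster.local"]
# Hypothetical records: the in-cluster service plus one external host.
KNOWN_RECORDS = {"api.production.svc.cluster.local.", "my-api.azurewebsites.net."}

answer, sent = resolve("api.production", SEARCH, 5, KNOWN_RECORDS)
print(answer)     # api.production.svc.cluster.local.
print(len(sent))  # 2: one NXDOMAIN, then the hit
```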

When It All Goes Horribly Wrong

The classic pitfall is trying to call an external service. Imagine you need to call my-api.azurewebsites.net from your Pod.

You, being smart, write curl my-api.azurewebsites.net. That string has two dots. The resolver checks: 2 < 5. So it goes on its magical search domain journey first:

  1. my-api.azurewebsites.net.production.svc.cluster.local. (NXDOMAIN)
  2. my-api.azurewebsites.net.svc.cluster.local. (NXDOMAIN)
  3. my-api.azurewebsites.net.cluster.local. (NXDOMAIN)

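Each of those comes back NXDOMAIN from the cluster DNS server. They don't time out, but every one is a full round trip, and if your client asks for both A and AAAA records (as getaddrinfo does when no address family is specified), that's six wasted queries before the resolver finally tries the absolute name my-api.azurewebsites.net. and gets an answer. That overhead lands on every uncached lookup of every external hostname, and it multiplies the load on your cluster DNS too. Congratulations, you've now inherited a performance problem that's a nightmare to debug.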
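The cost is easy to count. This sketch uses the same simplified glibc-style model to count the misses before an external name resolves, then multiplies by the number of record types requested, assuming the client asks for both A and AAAA. The record set is hypothetical, and wasted_queries is an illustrative helper, not a real API.

```python
def candidate_names(name: str, search: list[str], ndots: int) -> list[str]:
    """Simplified glibc-style search order."""
    if name.endswith("."):
        return [name]
    expanded = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        return [name + "."] + expanded
    return expanded + [name + "."]


def wasted_queries(name: str, search: list[str], ndots: int,
                   records: set[str], record_types: int = 2) -> int:
    """Count queries that miss before the first hit, times the record types asked for."""
    misses = 0
    for candidate in candidate_names(name, search, ndots):
        if candidate in records:
            break
        misses += 1
    return misses * record_types


SEARCH = ["production.svc.cluster.local", "svc.cluster.local", "cluster.local"]
RECORDS = {"my-api.azurewebsites.net."}  # hypothetical: only the external host exists

print(wasted_queries("my-api.azurewebsites.net", SEARCH, 5, RECORDS))  # 6
```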

Taking Control: The Solutions

You have two main weapons here. The first is to just use fully qualified domain names (FQDNs). If you end your name with a dot, it becomes an “absolute” query and bypasses the search list entirely. This is the simplest fix.

# The trailing dot makes it absolute. No search, no fuss.
curl my-api.azurewebsites.net.
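In a simplified model of glibc's ordering (candidate_names is a hypothetical helper, not a library call), the trailing dot collapses the whole search walk into a single query:

```python
def candidate_names(name: str, search: list[str], ndots: int) -> list[str]:
    """Simplified glibc-style search order."""
    if name.endswith("."):
        return [name]  # absolute name: no search list at all
    expanded = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        return [name + "."] + expanded
    return expanded + [name + "."]


SEARCH = ["production.svc.cluster.local", "svc.cluster.local", "cluster.local"]

print(candidate_names("my-api.azurewebsites.net", SEARCH, 5))   # four names, search list first
print(candidate_names("my-api.azurewebsites.net.", SEARCH, 5))  # ['my-api.azurewebsites.net.']
```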

The second, more permanent solution is to tune ndots in your Pod’s dnsConfig. Setting it to a lower value (like 2 or 3) is common. For Pods that primarily call external services, you might even set it to 1, meaning any query with a dot in it will be tried as an absolute name first.

apiVersion: v1
kind: Pod
metadata:
  name: sensible-pod
spec:
  containers:
  - name: app
    image: nginx
  dnsConfig:
    options:
      - name: ndots
        value: "2"

This is a classic trade-off. A lower ndots value makes external lookups faster, but any internal name that now meets the threshold gets tried as an absolute name first. With ndots:1, for example, api.production is first looked up as api.production., which fails (and is typically forwarded to your upstream DNS) before the search list kicks in and finds api.production.svc.cluster.local. Fully qualifying internal names avoids that extra hop. Choose the value that matches your application's traffic pattern. There's no one right answer, but the default of 5 is almost certainly wrong for any Pod that talks to the outside world. It's a default that made sense for a world of only internal services, and the world has moved on. Now you know how to fix it.
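To see the trade-off in numbers, this sketch counts the queries (for a single record type) needed to resolve three names under different ndots values. It uses the same simplified glibc-style model and a hypothetical record set, so treat it as an illustration under those assumptions, not a benchmark.

```python
def candidate_names(name: str, search: list[str], ndots: int) -> list[str]:
    """Simplified glibc-style search order."""
    if name.endswith("."):
        return [name]
    expanded = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        return [name + "."] + expanded
    return expanded + [name + "."]


def queries_until_hit(name: str, search: list[str], ndots: int, records: set[str]):
    """Queries sent up to and including the first hit, or None if nothing resolves."""
    for count, candidate in enumerate(candidate_names(name, search, ndots), start=1):
        if candidate in records:
            return count
    return None


SEARCH = ["production.svc.cluster.local", "svc.cluster.local", "cluster.local"]
# Hypothetical records for one internal service and one external host.
RECORDS = {"api.production.svc.cluster.local.", "my-api.azurewebsites.net."}
NAMES = ["api.production", "api.production.svc.cluster.local", "my-api.azurewebsites.net"]

results = {}
for ndots in (5, 2, 1):
    results[ndots] = {name: queries_until_hit(name, SEARCH, ndots, RECORDS) for name in NAMES}
    print(f"ndots:{ndots} {results[ndots]}")
```

In this model, ndots:2 gets both the external name and the fully qualified internal name down to one query each without penalizing api.production, while ndots:1 adds an extra miss for api.production. Your own numbers depend on which names your application actually uses.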