16.2 CNI: How Kubernetes Delegates Networking
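Right, so you’ve got a Kubernetes cluster humming along. Pods are scheduled, the API server is doing its thing, and then a new Pod lands on a node. It has an IP address, but how did it get one? The kubelet didn’t run a single ip command. Instead, the entire messy business of networking was outsourced to a set of smaller, specialized programs through a standard contract called the Container Network Interface (CNI). (Strictly speaking, in current Kubernetes the kubelet asks the container runtime, such as containerd or CRI-O, to set up the Pod sandbox, and the runtime is what actually invokes the plugins. This section says “the kubelet” as shorthand for that pair.)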
Think of the kubelet as a general contractor. It knows a pod needs a network, but it doesn’t own a single wrench or cable. Instead, it looks at a pre-configured list of subcontractors (CNI plugins) and hires them for the job. This is brilliant because it means Kubernetes itself doesn’t need to care about the specifics of whether you’re using Calico, Flannel, Cilium, or something that hasn’t been invented yet. Its job is to delegate, and it does so with a shockingly simple, plugin-based model.
The CNI Contract: It’s Just JSON and Executables
The contract between the kubelet and a CNI plugin is dead simple, which is why it works so well. When a Pod’s network namespace needs to be set up or torn down, a configured CNI plugin binary is executed on the node. Information reaches the plugin through two channels: environment variables and, crucially, a JSON network configuration on stdin.
The environment variables carry the per-call details: CNI_COMMAND (ADD, DEL, CHECK, or VERSION), CNI_CONTAINERID, CNI_NETNS (the path to the new Pod’s network namespace), and CNI_IFNAME (the interface name to create inside it). The JSON on stdin carries the network-level settings: the name of the network to plug the Pod into, the plugin type, and whatever options that plugin understands. The plugin’s job is to take that info, do whatever black magic it does (adding interfaces, setting up routes, programming ACLs), and print a JSON result on stdout that includes the assigned IP address. That IP is what ends up stamped on the Pod’s status.podIP field. Neat, right?
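To make the contract concrete, here is a minimal sketch of the plugin side, written as a plain Python function rather than a real executable. The function name handle_cni_call, the hard-coded address, and the supported-version list are all made up for illustration; a real plugin would read os.environ and sys.stdin, enter the namespace named by CNI_NETNS, and ask an IPAM plugin for the address. The result shape follows the 0.4.0 result format.

```python
import json


def handle_cni_call(env, stdin_text):
    """Toy CNI plugin logic: return the result for one invocation.

    env        -- mapping of the environment variables set by the caller
    stdin_text -- the network configuration JSON the caller wrote to stdin
    """
    command = env["CNI_COMMAND"]

    if command == "VERSION":
        # Plugins advertise which spec versions they can speak.
        # This list is an assumption for the sketch.
        return {"cniVersion": "0.4.0", "supportedVersions": ["0.3.1", "0.4.0"]}

    conf = json.loads(stdin_text)

    if command == "ADD":
        # A real plugin would create the interface inside env["CNI_NETNS"]
        # here. The address below is hard-coded purely for illustration.
        return {
            "cniVersion": conf["cniVersion"],
            "interfaces": [
                {"name": env["CNI_IFNAME"], "sandbox": env["CNI_NETNS"]}
            ],
            "ips": [
                {
                    "version": "4",
                    "address": "10.244.1.5/24",
                    "gateway": "10.244.1.1",
                    "interface": 0,
                }
            ],
        }

    if command == "DEL":
        # DEL undoes whatever ADD created and produces no result.
        return None

    raise ValueError(f"unsupported CNI_COMMAND: {command}")


# Example call with made-up values for the container ID and namespace path.
example_env = {
    "CNI_COMMAND": "ADD",
    "CNI_CONTAINERID": "abc123",
    "CNI_NETNS": "/var/run/netns/example",
    "CNI_IFNAME": "eth0",
}
example_conf = '{"cniVersion": "0.4.0", "name": "mynet", "type": "toy"}'
add_result = handle_cni_call(example_env, example_conf)
print(json.dumps(add_result, indent=2))
```

Keeping the logic in a function that takes the environment and stdin as arguments is only a convenience for the sketch: it makes the two input channels explicit and easy to exercise without touching a real node.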
Here’s a stripped-down example of what that JSON config might look like on a node using the bridge plugin:
{
  "cniVersion": "0.4.0",
  "name": "mynet",
  "type": "bridge",
  "bridge": "cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.244.1.0/24",
    "routes": [
      { "dst": "0.0.0.0/0" }
    ]
  }
}
This file is the recipe. It says: “For the network named ‘mynet’, use the ‘bridge’ plugin. Please create/use a bridge called cni0, make it the gateway for the container, and handle masquerading. For IPs, use the ‘host-local’ IPAM plugin to hand out addresses from the 10.244.1.0/24 pool.”
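What “hand out addresses from the pool” means can be sketched in a few lines. This is a deliberately simplified stand-in for host-local, not its actual code: it assumes the gateway is the first address in the subnet and that the set of addresses already in use is passed in by the caller, whereas the real plugin keeps its reservations on the node’s disk. The function name first_free_address is hypothetical.

```python
import ipaddress


def first_free_address(subnet, in_use):
    """Return (address_with_prefix, gateway) for the next unused host address.

    subnet -- CIDR string such as "10.244.1.0/24"
    in_use -- set of address strings that are already allocated
    """
    net = ipaddress.ip_network(subnet)
    # Assumption for this sketch: the gateway is the first address in the subnet.
    gateway = net.network_address + 1
    for host in net.hosts():
        if host == gateway:
            continue  # never hand the gateway address to a Pod
        if str(host) not in in_use:
            return f"{host}/{net.prefixlen}", str(gateway)
    raise RuntimeError(f"no free addresses left in {subnet}")


# One address is already taken, so the next Pod gets the one after it.
allocation = first_free_address("10.244.1.0/24", {"10.244.1.2"})
print(allocation)
```

With a /24 per node, this scheme tops out at 253 Pod addresses once the network, broadcast, and gateway addresses are excluded.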
The Plugin Chain: Because One Plugin Is Never Enough
Here’s where the designers got clever. You don’t have to use just one plugin. You can chain them. A CNI “network” is actually a list of plugins that are executed in order. This is how you get meta-plugins that don’t actually configure a network themselves but do something else, like managing bandwidth (bandwidth) or setting up port forwarding (portmap).
Your CNI config directory (/etc/cni/net.d/ on the node) might have a conflist file that defines this chain:
{
  "cniVersion": "0.4.0",
  "name": "mynet",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "cni0",
      "ipam": { ... }
    },
    {
      "type": "portmap",
      "capabilities": {"portMappings": true}
    },
    {
      "type": "bandwidth",
      "capabilities": {"bandwidth": true}
    }
  ]
}
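The bridge plugin runs first to set up the core networking. Then the portmap plugin comes in to handle the hostPort mappings you might define in your Pod spec. Finally, the bandwidth plugin can apply rate limiting. Each plugin is invoked with its own entry from the list, plus the network’s name and cniVersion, and every plugin after the first also receives the previous plugin’s result in a prevResult field. The capabilities entries tell the runtime which per-Pod data (port mappings, bandwidth limits) to pass along in a runtimeConfig field. On teardown, the chain runs in reverse order. It’s a surprisingly elegant and powerful composition model.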
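The way a runtime walks such a chain on ADD can be sketched as follows. This is a simplified model, not the real libcni code: the plugins are Python callables standing in for executables, and run_chain_add, fake_bridge, fake_chained, and the sample port mapping are all hypothetical names and values chosen for the example.

```python
import copy


def run_chain_add(conflist, plugins, runtime_config):
    """Invoke each plugin in order, threading the result through prevResult.

    conflist       -- parsed .conflist content (dict with a "plugins" list)
    plugins        -- maps a plugin type to a callable standing in for its binary
    runtime_config -- per-Pod data the runtime has available, keyed by capability
    """
    prev_result = None
    for entry in conflist["plugins"]:
        conf = copy.deepcopy(entry)
        # Every plugin sees the network-level name and version.
        conf["name"] = conflist["name"]
        conf["cniVersion"] = conflist["cniVersion"]
        # Only capabilities the plugin asked for are passed along.
        wanted = conf.get("capabilities", {})
        runtime = {
            key: runtime_config[key]
            for key, enabled in wanted.items()
            if enabled and key in runtime_config
        }
        if runtime:
            conf["runtimeConfig"] = runtime
        # Plugins after the first receive the previous plugin's result.
        if prev_result is not None:
            conf["prevResult"] = prev_result
        prev_result = plugins[conf["type"]](conf)
    return prev_result


calls = []  # records (type, saw prevResult, runtimeConfig keys) per invocation


def fake_bridge(conf):
    calls.append((conf["type"], "prevResult" in conf, sorted(conf.get("runtimeConfig", {}))))
    return {"cniVersion": conf["cniVersion"],
            "ips": [{"version": "4", "address": "10.244.1.5/24"}]}


def fake_chained(conf):
    calls.append((conf["type"], "prevResult" in conf, sorted(conf.get("runtimeConfig", {}))))
    # A chained plugin does its side work, then hands the result back.
    return conf["prevResult"]


chain = {
    "cniVersion": "0.4.0",
    "name": "mynet",
    "plugins": [
        {"type": "bridge", "bridge": "cni0"},
        {"type": "portmap", "capabilities": {"portMappings": True}},
        {"type": "bandwidth", "capabilities": {"bandwidth": True}},
    ],
}
pod_data = {"portMappings": [{"hostPort": 8080, "containerPort": 80, "protocol": "tcp"}]}
final_result = run_chain_add(
    chain,
    {"bridge": fake_bridge, "portmap": fake_chained, "bandwidth": fake_chained},
    pod_data,
)
```

Note that the bandwidth stand-in receives no runtimeConfig here, because the sample Pod data contains no bandwidth limits. The final result is still the one the bridge stand-in produced, which is how the Pod IP survives the trip through the chained plugins.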
Common Pitfalls and The Horror of CNI Version Skew
Now, let’s talk about the rough edges, because oh boy, are there some. The biggest headache in practice is CNI version skew. The CNI specification has versions (like 0.3.1, 0.4.0, 1.0.0). A plugin built against an older version of the CNI library might not understand the cniVersion your configuration declares, or might return a result in a format the runtime doesn’t expect. The error messages you get from this are famously cryptic. You’ll see something like failed to set up pod "foo" network: invalid version "" and you’ll want to scream. The fix is almost always making sure every configuration file declares a cniVersion, and that the declared version is one that every plugin binary in the chain actually supports.
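A quick compatibility check along those lines can be sketched like this. It relies on one fact from the spec: a plugin invoked with CNI_COMMAND=VERSION prints a JSON object containing a supportedVersions list. The function name check_cni_version and the sample inputs are made up for illustration, and the sketch handles a single-plugin config only; capturing the VERSION output from a real binary is left out.

```python
import json


def check_cni_version(conf_text, version_output_text):
    """Compare a config's declared cniVersion with a plugin's VERSION output.

    conf_text           -- contents of a CNI network configuration file
    version_output_text -- what the plugin printed for CNI_COMMAND=VERSION
    Returns (ok, message).
    """
    declared = json.loads(conf_text).get("cniVersion")
    supported = json.loads(version_output_text).get("supportedVersions", [])
    if not declared:
        return False, "config declares no cniVersion"
    if declared not in supported:
        return False, f"plugin does not support {declared}; it supports {supported}"
    return True, "ok"


# Made-up sample: an older plugin that predates 1.0.0.
old_plugin = '{"cniVersion": "0.4.0", "supportedVersions": ["0.3.1", "0.4.0"]}'
print(check_cni_version('{"cniVersion": "1.0.0", "name": "mynet"}', old_plugin))
print(check_cni_version('{"name": "mynet"}', old_plugin))
```

Running a check like this across every binary named in a chain, before rolling out a config change, is a cheap way to catch skew before a Pod fails to start.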
Another classic issue is forgetting to install a CNI plugin at all. If you kubeadm init a cluster and stop there, your CoreDNS pods will just sit there in Pending, because the node never becomes Ready and nothing that needs Pod networking can be scheduled onto it. The kubelet wants to delegate, but there are no subcontractors to call. It’s like building a house with no plumbers or electricians on speed dial. Start with kubectl get nodes: a node without a working CNI configuration reports NotReady, and kubectl describe node shows the Ready condition with a message mentioning NetworkPluginNotReady. If you see that, look for missing files in /etc/cni/net.d and /opt/cni/bin on the node.
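When debugging on the node itself, the first question is whether any network configuration exists at all. The sketch below lists candidate files the way runtimes generally do: files ending in .conf, .conflist, or .json, sorted by name, with the first one winning. Treat that selection rule as an assumption to verify against your runtime’s documentation; the function name find_cni_configs is hypothetical.

```python
import os

# Extensions commonly recognised as CNI network configuration files.
CONFIG_EXTENSIONS = (".conf", ".conflist", ".json")


def find_cni_configs(conf_dir="/etc/cni/net.d"):
    """Return candidate config file names in the order a runtime would try them."""
    try:
        names = os.listdir(conf_dir)
    except FileNotFoundError:
        return []  # no directory at all: no CNI configuration was ever installed
    return sorted(name for name in names if name.endswith(CONFIG_EXTENSIONS))


configs = find_cni_configs()
if configs:
    print("runtime would use:", configs[0])
else:
    print("no CNI configuration found; expect the node to stay NotReady")
```

The name-sorted order is why provider config files usually carry a numeric prefix: a leftover file that sorts earlier can silently take precedence over the one you meant to use.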
The best practice? Let a DaemonSet manage your CNI plugins. Providers like Calico and Cilium ship a DaemonSet whose pods install the plugin binary into /opt/cni/bin and the network configuration into /etc/cni/net.d on each node, then keep running as the node’s networking agent. The plugin binary itself is still executed on the host, but you never copy anything to a node by hand. That is far better than manually distributing binaries to every node: your networking is declared and version-controlled right alongside the rest of your cluster configuration. You’re letting the system manage the system, which is how it should be.