41.1 Kubernetes Version Skew Policy: What Can Be Mismatched
Right, let’s talk about version skew. This isn’t some theoretical guideline dreamed up by a bored architect; it’s a hard-won set of rules that keeps your cluster from having a full-blown existential crisis during an upgrade. Think of it as the rules of engagement for a multi-version, distributed system. You can have different versions of components talking to each other, but only if they agree on the core rules of the conversation. Break those rules, and you get undefined behavior, which is a fancy term for “panic-induced outage at 2 AM.”
The core principle is this: the API server is the grand central station of your cluster. Everything talks to it. Therefore, it gets to be the reference point. We measure the skew of every other component relative to the API server’s minor version. There is no single window, though: the other control plane components get one minor version of slack (n-1), while kubelets get a longer leash, covered below. Your API server is at version 1.28? Your controller manager and scheduler can be no older than 1.27, and ideally, you should aim for 1.28 everywhere.
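To make the “window” idea tangible, here is a minimal sketch of a helper that lists the minor versions a component may run for a given API server version. The function name supported_minors is made up for illustration, the window size is an argument because it differs per component, and the major version is assumed to be 1.

```shell
# Hypothetical helper: print the minor versions inside a skew window.
# $1: API server version, e.g. "v1.28.4" or "1.28"
# $2: how many minor versions a component may trail the API server
supported_minors() {
  local api="${1#v}" window="$2" minor i
  api="${api#*.}"        # drop the major number: "28.4"
  minor="${api%%.*}"     # keep only the minor:   "28"
  i=0
  while [ "$i" -le "$window" ]; do
    printf '1.%s\n' "$(( minor - i ))"   # assumes major version 1
    i=$(( i + 1 ))
  done
}

# Control plane components (n-1) against a 1.28 API server:
supported_minors v1.28.4 1   # prints 1.28 and 1.27, one per line
```

The helper only does the arithmetic. Which window applies to which component still comes from the policy itself.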
The Almighty API Server and its Minions
Your kube-apiserver is the undisputed boss. It speaks the current, most-correct dialect of the Kubernetes API. The kube-controller-manager, kube-scheduler, and cloud-controller-manager can be one minor version behind the API server (n-1), but never ahead of it. Why? Because they are clients of the API server. They mostly write to it and watch it. The API server is the ultimate source of truth; as long as these components can understand the API responses and formulate valid requests, they’re good. A 1.27 scheduler can perfectly well assign a Pod to a Node by talking to a 1.28 API server.
Let’s get concrete. Here’s how you’d check your current control plane versions using kubectl:
kubectl get nodes -o wide
Look for the control plane node(s), but don’t read too much into the VERSION column: it reports each node’s kubelet version, not the versions of the control plane components running there. A better way is to check the component pods directly in the kube-system namespace:
# For a cluster using kubeadm (a common setup)
kubectl get pods -n kube-system -l tier=control-plane -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}'
This might spit out something like:
kube-apiserver-my-node registry.k8s.io/kube-apiserver:v1.28.4
kube-controller-manager-my-node registry.k8s.io/kube-controller-manager:v1.27.6
kube-scheduler-my-node registry.k8s.io/kube-scheduler:v1.28.4
See that? The controller manager is at v1.27.6 while the API server and scheduler are at v1.28.4. This is a perfectly valid n-1 skew. The system is humming along. The moment that controller manager drops to v1.26.x, you’re officially out of support and begging for trouble.
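If you’d rather not eyeball image tags, the comparison can be scripted. The following is a sketch, not an official tool: check_control_plane_skew is a hypothetical name, it assumes the name-and-image lines produced by the jsonpath query above, image tags of the form :v1.X.Y, and a single API server version in the cluster.

```shell
# Reads "pod-name<TAB>image" lines on stdin and flags control plane pods
# that fall outside the n-1 window relative to the API server.
check_control_plane_skew() {
  local input name image tag minor api_minor skew status
  input="$(cat)"
  api_minor=""
  status=0

  # First pass: take the API server's minor version from its image tag.
  while read -r name image; do
    case "$name" in
      kube-apiserver*)
        tag="${image##*:v}"      # "1.28.4"
        tag="${tag#*.}"          # "28.4"
        api_minor="${tag%%.*}"   # "28"
        ;;
    esac
  done <<EOF
$input
EOF

  if [ -z "$api_minor" ]; then
    echo "no kube-apiserver pod found in input" >&2
    return 2
  fi

  # Second pass: compare the other components. Pods such as etcd follow
  # their own version scheme and are skipped.
  while read -r name image; do
    case "$name" in
      kube-controller-manager*|kube-scheduler*|cloud-controller-manager*) ;;
      *) continue ;;
    esac
    tag="${image##*:v}"; tag="${tag#*.}"; minor="${tag%%.*}"
    skew=$(( api_minor - minor ))
    if [ "$skew" -lt 0 ] || [ "$skew" -gt 1 ]; then
      echo "UNSUPPORTED $name: skew $skew"
      status=1
    else
      echo "ok $name: skew $skew"
    fi
  done <<EOF
$input
EOF
  return "$status"
}
```

Pipe the output of the pod listing into the function. A non-zero exit status means at least one component is newer than the API server or more than one minor version behind it.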
The Kubelet’s Long Leash
The kubelet on each worker node gets a bit more leeway. For a long time it could be two minor versions behind the API server (n-2); starting with Kubernetes 1.28, the window was widened to three (n-3), provided the kubelet itself is at 1.25 or newer. Either way, it must never be a version newer than the API server. Let me say that again: a kubelet must never be newer than its API server. This is non-negotiable. A newer kubelet might try to use API features or fields that the older API server simply doesn’t know about yet, leading to rejected requests and failed pods.
So, with a 1.28 API server, your kubelets can be 1.28, 1.27, 1.26, or 1.25. That’s it. This longer leash is what makes rolling node upgrades possible. You can cordon and drain, upgrade, and uncordon a few nodes at a time while the rest of the cluster, including the API server, continues to operate.
# Check the kubelet version on a specific node
# (This assumes you have ssh access to the node)
ssh my-worker-node-01 "sudo kubelet --version"
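Logging in to every node gets old quickly. The API server already knows each kubelet’s version (it is reported in .status.nodeInfo.kubeletVersion), so the check can run from your workstation. Below is a minimal sketch: check_kubelet_skew is a hypothetical helper, and the allowed window is passed in as an argument so you can match it to the policy for your release.

```shell
# Reads "node-name<TAB>kubelet-version" lines on stdin.
# $1: API server version, e.g. "v1.28.4"
# $2: how many minor versions a kubelet may trail the API server
check_kubelet_skew() {
  local api="${1#v}" window="$2" node version v minor skew status=0
  api="${api#*.}"; api="${api%%.*}"          # API server minor, e.g. "28"
  while read -r node version; do
    [ -n "$node" ] || continue
    v="${version#v}"; v="${v#*.}"; minor="${v%%.*}"
    skew=$(( api - minor ))
    if [ "$skew" -lt 0 ]; then
      echo "NEWER-THAN-APISERVER $node $version"
      status=1
    elif [ "$skew" -gt "$window" ]; then
      echo "TOO-OLD $node $version"
      status=1
    fi
  done
  return "$status"
}

# Usage against a live cluster (requires kubectl access):
# kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.kubeletVersion}{"\n"}{end}' \
#   | check_kubelet_skew v1.28.4 2
```

Nodes inside the window produce no output. Anything printed is a node to upgrade (or, for the newer-than-API-server case, a node that jumped the queue).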
The Kubectl Wildcard
Here’s the fun one. kubectl is a client tool that lives on your laptop, a CI/CD server, or an admin machine. It can be one version newer or older than the API server (n±1). This is because kubectl’s main job is to translate your commands into API requests. It’s pretty good at being backwards compatible. A v1.29 kubectl talking to a v1.28 API server will simply avoid using brand-new v1.29 features in its requests. It knows how to dumb itself down, which is a feature more of us could use.
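For CI/CD machines, where nobody is watching kubectl’s version, a tiny guard can enforce the ±1 rule. This is a sketch under assumptions: kubectl_skew_ok is a made-up name, and it takes the two minor numbers as arguments. It strips a trailing “+” because some providers report the server minor as, for example, “28+”.

```shell
# $1: kubectl (client) minor version, e.g. "29"
# $2: API server minor version, e.g. "28" or "28+"
# Succeeds when the two are at most one minor version apart.
kubectl_skew_ok() {
  local client="${1%+}" server="${2%+}" diff
  diff=$(( client - server ))
  [ "$diff" -ge -1 ] && [ "$diff" -le 1 ]
}

# Usage against a live cluster (assumes jq is installed):
# set -- $(kubectl version -o json | jq -r '.clientVersion.minor, .serverVersion.minor')
# kubectl_skew_ok "$1" "$2" || echo "kubectl is outside the supported skew" >&2
```

Newer kubectl releases also warn about excessive skew themselves when you run kubectl version, so treat the helper as a pipeline guard rather than a replacement for that warning.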
The Real-World Pitfall: API Deprecations
The skew policy isn’t just about “does it run.” It’s about “does it run without silently breaking.” The most common pitfall involves deprecated APIs, and the skew policy says nothing about them. Take the batch/v1beta1 CronJob API: it was deprecated in 1.21, when batch/v1 CronJob went stable, and removed entirely in 1.25.
While an API is merely deprecated, the API server, trying to be helpful, still accepts requests for it… for now. It returns a deprecation warning with every one of them, but nothing actually breaks, so nobody looks. Then you upgrade the API server to the release that removes the API, and every old manifest, CI/CD pipeline, or Helm chart that still submits batch/v1beta1 fails hard with a “no matches for kind” error, even though every component in the cluster sits comfortably inside the skew window. Your deploys are broken.
The lesson? Before you upgrade, find and fix your deprecated API manifests. The kubectl convert plugin (installed separately as kubectl-convert) can rewrite a manifest to a newer API version.
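# Listing objects and filtering on .apiVersion won't work: the API server returns
# objects in the version you request, not the version they were created with.
# Instead, ask the API server which deprecated APIs clients have actually requested
kubectl get --raw /metrics | grep apiserver_requested_deprecated_apis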
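The cluster can only tell you about requests it has already seen. Old manifests sitting in a repository need a scan of the files themselves. Here is a rough sketch: find_api_version is a hypothetical helper, it assumes manifests are stored as .yaml or .yml files, and it relies on the recursive and --include options found in GNU and BSD grep.

```shell
# Print the manifest files under a directory that declare a given apiVersion.
# $1: directory to scan, e.g. a checkout of your deployment repository
# $2: apiVersion to look for, e.g. "batch/v1beta1"
find_api_version() {
  local dir="$1" api_version="$2"
  # Match "apiVersion: <value>" with optional quotes and a trailing comment.
  # Anchoring the end of the value keeps batch/v1 from matching batch/v1beta1.
  grep -rlE --include='*.yaml' --include='*.yml' \
    "^[[:space:]]*apiVersion:[[:space:]]*[\"']?${api_version}[\"']?[[:space:]]*(#.*)?\$" \
    "$dir"
}

# Usage (the path is a placeholder):
# find_api_version ./manifests batch/v1beta1
```

Templated manifests, such as Helm charts that compute apiVersion at render time, will slip past a plain text search, so render them first if you depend on them.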
The skew policy gives you a runway, but it doesn’t absolve you from paying attention to what’s happening on that runway. Upgrade your components in the right order (API server first, then everything else) and always, always check for deprecated APIs before you start the upgrade process. Your pager will thank you.