Right, let’s talk about the part everyone says is important but often tries to skip: making sure your precious applications don’t collectively faceplant when you flip the upgrade switch. Skipping this is like skydiving without checking your parachute because you “used it last week and it was fine.” Don’t be that person. The cluster’s new API server might be shinier, but it may no longer understand the language your manifests speak.

The core of the problem is API deprecation and removal. The lovely folks at Kubernetes are constantly improving the place, which means they occasionally break a few windows and remove the back door. A deprecated API still works (with a warning); a removed one is simply gone. An object definition (apiVersion: batch/v1beta1 for CronJob, I’m looking at you) that worked in 1.20 was ruthlessly culled in 1.25. Your job is to find these landmines before you’re standing in the middle of the field.

The Preliminaries: Know Your Enemy (and Yourself)

First, you need a target. Don’t just upgrade to “the latest.” Pick a specific version. Check the Kubernetes release notes for your target version. They almost always have a dedicated “deprecation” and “API change” section. Read it. It’s dry, but it’s the cheat sheet to this entire exam.
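To make the "pick a target, then read the removals" step concrete, here is a minimal sketch of a lookup you could keep next to your upgrade notes. The table is a small hand-picked sample of well-known removals, not an exhaustive list, and the function name removed_by is made up for this example. Treat the release notes as the source of truth.

```shell
# Sample of well-known API removals (NOT exhaustive; check the release notes).
# Format per line: <apiVersion> <Kind> <minor version N where 1.N stopped serving it>
REMOVED_APIS='extensions/v1beta1 Ingress 22
networking.k8s.io/v1beta1 Ingress 22
apiextensions.k8s.io/v1beta1 CustomResourceDefinition 22
batch/v1beta1 CronJob 25
policy/v1beta1 PodDisruptionBudget 25
policy/v1beta1 PodSecurityPolicy 25
autoscaling/v2beta1 HorizontalPodAutoscaler 25
autoscaling/v2beta2 HorizontalPodAutoscaler 26'

# removed_by TARGET_MINOR
# Prints every entry from the table that is no longer served at 1.TARGET_MINOR.
removed_by() {
  printf '%s\n' "$REMOVED_APIS" |
    awk -v target="$1" '$3 <= target { print $1, $2, "(removed in 1." $3 ")" }'
}
```

Calling removed_by 25 lists everything in the sample table that a jump to 1.25 would take away, including the 1.22 removals you would cross on the way there.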

Next, you need a complete inventory of what you’re actually running. You’d be surprised how many “temporary” manifests from three years ago are now running your most critical service. You can’t test what you don’t know about.

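# Quick first pass. Despite the name, "all" only covers a handful of common types
# (pods, services, deployments, and friends); it skips Ingresses, ConfigMaps, CRDs, and more.
kubectl get all --all-namespaces -o yaml > cluster-dump-before-upgrade.yaml

# More thorough: dump every listable namespaced resource type
# (repeat with --namespaced=false to pick up cluster-scoped types).
# Note: objects come back at the version the API server serves by default, not the apiVersion in your original manifest.
kubectl api-resources --verbs=list --namespaced -o name | xargs -n 1 kubectl get --show-kind --ignore-not-found -A -o yaml > all-resources.yaml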

This isn’t just a backup; it’s your test corpus. Now, let’s get to work.
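If your manifests also live in a repo, a tally of what they declare is a handy companion to the cluster dump, because files on disk still carry the apiVersion they were written with. The sketch below assumes plain YAML files with apiVersion and kind at the top level of each document (no Helm templating). The function name inventory_manifests is invented for this example.

```shell
# inventory_manifests DIR
# Counts top-level apiVersion/kind pairs across all *.yaml and *.yml files under DIR.
inventory_manifests() {
  find "$1" -type f \( -name '*.yaml' -o -name '*.yml' \) -print0 |
    xargs -0 awk '
      # Print the pair collected for the current YAML document, then reset.
      function emit() { if (api != "" && kind != "") print api, kind; api = ""; kind = "" }
      FNR == 1       { emit() }      # new file: close out the previous document
      /^---/         { emit() }      # document separator within a file
      /^apiVersion:/ { api = $2 }    # anchored at column 0, so nested keys are ignored
      /^kind:/       { kind = $2 }
      END            { emit() }
    ' | sort | uniq -c | sort -rn
}
```

Run it as inventory_manifests ./my-app-manifests and skim the output for anything with "beta" or "alpha" in the apiVersion column. Those are the first candidates to check against the release notes.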

Dry-Run Your Manifests Against the New API Server

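This is the single most effective trick in the book. Stand up a throwaway cluster running your target version (kind or minikube will do), point kubectl at it, and push your manifests through a server-side dry run. It’s like asking the new API server, “Hey, would you accept this?” without actually changing anything. Dry-running against your current cluster only tells you what the old API server thinks, which you already know.

# First, find a manifest that might be on the chopping block, like an old Ingress.
# Prefer the file from source control if you have it: an export from the live cluster
# comes back at whatever API version the server currently prefers.
kubectl get ingress my-old-ingress -n my-namespace -o yaml > old-ingress.yaml

# Now ask the target-version API server whether it would accept it. --dry-run=server runs the
# request through validation and admission but persists nothing. ("target-test" is a placeholder context name.)
kubectl --context target-test apply -f old-ingress.yaml --dry-run=server -o yaml

# If the API has been removed (not merely deprecated), you'll get a glorious and terrifying error along these lines:
# error: unable to recognize "old-ingress.yaml": no matches for kind "Ingress" in version "extensions/v1beta1"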

Boom. There’s your first victim. Now you know you need to rewrite that manifest to use networking.k8s.io/v1 before you upgrade. Do this for every resource type you found in your inventory.
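Doing that one file at a time gets old fast, so here is a rough sketch of the same check as a loop. It assumes a kubectl context pointing at a test cluster on the target version and a flat directory of manifests. The context name, directory, and function name dry_run_all are all placeholders. One known wrinkle: a server-side dry run of a namespaced object fails if its namespace does not exist on the test cluster, so create the namespaces first.

```shell
KUBECTL="${KUBECTL:-kubectl}"   # override to use a wrapper or a specific binary

# dry_run_all CONTEXT DIR
# Server-side dry-runs every *.yaml / *.yml file directly under DIR against CONTEXT.
# Prints each rejected file with the API server's error; returns 1 if any file was rejected.
dry_run_all() {
  local context dir failed file err
  context="$1"
  dir="$2"
  failed=0
  for file in "$dir"/*.yaml "$dir"/*.yml; do
    [ -f "$file" ] || continue   # skip glob patterns that matched nothing
    # Keep stderr (the rejection reason) and discard the rendered object on stdout.
    if ! err=$("$KUBECTL" --context "$context" apply --dry-run=server -f "$file" 2>&1 >/dev/null); then
      printf 'REJECTED %s\n%s\n' "$file" "$err"
      failed=1
    fi
  done
  return "$failed"
}
```

Something like dry_run_all target-test ./my-app-manifests then gives you a single pass/fail plus a list of files to fix, which drops nicely into a pre-upgrade checklist.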

Automate This with Plow and Kube-no-Trouble

Doing this manually for every resource is for masochists. Be lazy. Use tools.

Kube-no-Trouble (kubent) is a brilliant little tool that scans your cluster for deprecated APIs. It’s fast, simple, and tells you exactly what’s going to break.

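# Run its Docker container, mounting your kubeconfig so it can scan the live cluster
docker run -it --rm -v "${HOME}/.kube/config:/.kubeconfig" -e KUBECONFIG=/.kubeconfig ghcr.io/doitintl/kube-no-trouble:latest

# Or, have it scan that manifest dump you made earlier
# (--cluster=false and --helm3=false switch off the live-cluster collectors so only the file is checked)
kubent --cluster=false --helm3=false -f cluster-dump-before-upgrade.yaml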

Its output is beautifully clear, telling you the API kind, namespace, name, and, most importantly, the replacement API to use.
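If you want the scan to block an upgrade rather than just inform it, a thin wrapper along these lines can sit in a pipeline. It leans on two kubent flags, --target-version and --exit-error. Confirm both against kubent --help for the version you have installed. The function name gate_on_deprecations is made up for this sketch.

```shell
KUBENT="${KUBENT:-kubent}"   # override if the binary lives elsewhere

# gate_on_deprecations TARGET_VERSION
# Runs kubent against the current kubeconfig context and fails if it reports any findings.
gate_on_deprecations() {
  local target
  target="$1"
  # --exit-error makes kubent return non-zero when it finds deprecated or removed APIs.
  if "$KUBENT" --target-version "$target" --exit-error; then
    echo "OK: no deprecated APIs reported for $target"
  else
    echo "BLOCKED: fix the findings above before upgrading to $target" >&2
    return 1
  fi
}
```

For example, gate_on_deprecations 1.27.0 as a pipeline step turns "someone should really check that" into a red build.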

Plow is a more comprehensive validation tool. You point it at a new cluster’s kubeconfig and your manifests, and it will run a suite of checks.

# Assuming you have a kubeconfig for your v1.27 test cluster
plow validate -k /path/to/test-cluster-kubeconfig -f ./my-app-manifests/

These tools don’t replace thinking, but they do replace hours of tedious, error-prone manual checking.

The Non-API Stuff: The Real Nightmare

Here’s where it gets fun. Even if your APIs are perfect, your apps might still break. Why? Because the environment under your containers changes. The new node image might ship a different kernel, cgroup version, or container runtime (Docker Engine support via dockershim was removed in 1.24, leaving runtimes like containerd and CRI-O), or the cluster might enforce a security policy that rejects your janky old container that’s running as root.

Your best defense here is a rigorous canary testing process in a staging environment that mirrors production. Deploy a copy of your application to the new cluster and beat the living daylights out of it with real-world traffic patterns and load tests. Check logs for weird permission errors, connectivity issues, or crashes. This isn’t about the syntax of your YAML anymore; it’s about the semantics of your running code. There’s no shortcut. You just have to test it.
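Eyeballing logs does not scale past a couple of pods, so a crude filter helps during the canary run. The sketch below reads log lines on stdin and flags a few common failure signatures. The pattern list is a starting assumption, not a complete catalogue, and scan_logs is a made-up name.

```shell
# Case-insensitive signatures that often show up after a runtime or policy change.
FAILURE_PATTERNS='permission denied|operation not permitted|read-only file system|connection refused|no such host|out of memory|oomkilled'

# scan_logs
# Reads log text on stdin, prints matching lines prefixed with their line number,
# and returns 1 if anything matched (0 if the log looks clean).
scan_logs() {
  if grep -inE "$FAILURE_PATTERNS"; then
    return 1   # at least one suspicious line was printed
  fi
  return 0
}
```

A typical use would be kubectl logs deploy/my-app -n my-namespace --since=1h | scan_logs after a load test, where the deployment and namespace names are placeholders. A clean result is not proof of health, but a hit tells you exactly where to start digging.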

The bottom line is this: Compatibility testing is the difference between a smooth upgrade and a 3 a.m. war room call. The tools are good now. Use them. Be the person who confidently says “it’ll work” because you’ve already seen it work.