7.8 Deployment Status Conditions and Health Checks

Right, so your Deployment is up and running. You’ve run kubectl get deployments and it proudly reports 3/3 Pods ready and 3 available. Fantastic. You high-five the intern and call it a day. But what if I told you that “Available” is a filthy, rotten liar? Well, maybe not a liar, but it’s certainly not telling you the whole story. It’s the highlight reel, not the grueling practice session where things actually go wrong.

The real truth serum for your Deployments is buried in a little field called status.conditions. This is where Kubernetes, like a stressed-out project manager, records the current state of your rollout: what’s true right now, why, and when it last changed. (The blow-by-blow history lives in the Deployment’s Events.) Ignoring this is like trying to fix a car with the hood welded shut.

The Conditions That Actually Matter

Among other fields, a Deployment’s status holds a list of conditions. Think of them as status updates from different sub-teams. The ones you need to care about are:

Available: This team reports whether the Deployment has enough Pods that have been Ready for at least minReadySeconds (at least replicas minus maxUnavailable) to be considered… available. It’s the one you usually see first.
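Progressing: This team is in charge of the rollout itself. The condition is True while the Deployment is creating a new ReplicaSet, scaling the new one up or the old ones down, or watching new Pods become ready, and it stays True (reason NewReplicaSetAvailable) once the rollout completes. If the rollout makes no progress for longer than progressDeadlineSeconds (600 seconds by default), this condition flips to False with reason ProgressDeadlineExceeded.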
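ReplicaFailure: This is the short, panicked message you get when the ReplicaSet controller can’t create or delete Pods at all, for example because a ResourceQuota is exhausted or an admission controller rejects the Pod spec. The reason is usually FailedCreate, and the message names the exact rule you tripped. You’ll see this rarely, but when you do, no amount of waiting will fix it.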
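The Progressing condition gives up on a stuck rollout after progressDeadlineSeconds, and that deadline is a field you can tune. The fragment below is a minimal sketch, assuming the my-fancy-app Deployment from this section; the 300-second value is an arbitrary illustration, and the selector and Pod template are left out for brevity.

```yaml
# Fragment of a Deployment spec; selector and template omitted for brevity.
spec:
  replicas: 3
  # If the rollout makes no progress for this many seconds, the Deployment
  # controller sets Progressing=False with reason ProgressDeadlineExceeded.
  # The default is 600; the value must be greater than minReadySeconds.
  progressDeadlineSeconds: 300
```

Kubernetes takes no action on a stalled rollout beyond setting the condition; it won’t roll back for you. kubectl rollout status deployment/my-fancy-app watches this condition and exits non-zero once the deadline passes, which makes it a handy gate in CI pipelines.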

To see the unvarnished truth, you run this:

kubectl describe deployment my-fancy-app

Scroll down past the replica counts and the Pod template until you hit the Conditions section. It looks something like this:

Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable

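“True” is good. “False” is your cue to start investigating. The Reason column is your first clue; the full Message isn’t shown in this table, so grab it with kubectl get deployment my-fancy-app -o yaml and look under status.conditions. Is Progressing False because of ProgressDeadlineExceeded? Or is Available False (reason MinimumReplicasUnavailable) because too few Pods are passing readiness, say, because they’re stuck in CrashLoopBackOff?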
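For comparison, here’s roughly what the raw conditions look like when a rollout is stuck on a ResourceQuota. This is an illustrative excerpt, not captured output: the ReplicaSet hash and the quota name and numbers are made-up placeholders, and timestamps and several other fields are omitted. Note the punchline: Available is still True, because the old Pods are happily serving, while Progressing has given up and ReplicaFailure tells you why.

```yaml
# Illustrative excerpt of `kubectl get deployment my-fancy-app -o yaml` for a
# stalled rollout. Hash, quota name, and numbers are placeholders; timestamps
# and other fields are omitted.
status:
  replicas: 3
  availableReplicas: 3
  conditions:
  - type: Available
    status: "True"
    reason: MinimumReplicasAvailable
    message: Deployment has minimum availability.
  - type: Progressing
    status: "False"
    reason: ProgressDeadlineExceeded
    message: ReplicaSet "my-fancy-app-7c9d8b6f5" has timed out progressing.
  - type: ReplicaFailure
    status: "True"
    reason: FailedCreate
    message: 'Error creating: pods "my-fancy-app-7c9d8b6f5-" is forbidden: exceeded quota: compute-quota, requested: pods=1, used: pods=3, limited: pods=3'
```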

Why Your “Healthy” Pods Might Be Zombies

Here’s the kicker: the Deployment controller’s main job is to count Pods. Without a readiness probe, a Pod is marked Ready the moment its containers start, and the controller goes, “Yep, that’s one for me!” But a running container just means the container runtime has successfully started the process. It says absolutely nothing about whether your application inside the container has finished booting, connected to its database, or is ready to serve traffic.

This is the most common pitfall I see. You push a new image, the rollout seems to succeed, but you get a flood of 503 errors. Why? Because with no readiness probe, the new Pod counted as Ready the second its container started. The Service immediately began routing user traffic to it, and the Deployment, satisfied, terminated the old, healthy Pods. Your new app was still waiting for a JDBC connection pool to initialize.

You fix this by not letting the Deployment controller be so gullible. You use probes.

Liveness, Readiness, and Startup Probes: An Intervention

Probes are health checks that the kubelet runs against your containers. They’re how your application tells Kubernetes the actual truth about its state.

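Readiness Probe: This is the most important one. It answers, “Should this Pod receive traffic?” If it fails, the Pod is yanked out of the Service’s endpoints, so the load balancer stops sending it requests, and it stops counting toward the Deployment’s ready replicas. It does not kill the Pod. Use one everywhere, and lean on it hardest for slow-starting applications. A Pod might be Running but not ready.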
Liveness Probe: This answers, “Is this container alive, or should I kill it and start a fresh one?” If it fails failureThreshold times in a row, the kubelet kills the container with extreme prejudice and restarts it (the Pod itself stays on its node). Be careful with this. If your liveness probe is too sensitive, or points to an endpoint that hangs under load, you can create a murder-suicide loop where Kubernetes kills your app just for being busy.
Startup Probe: This is the newest of the three (stable since Kubernetes 1.20), and it solves a specific problem. It’s designed for really slow-starting containers (e.g., legacy Java apps that take 5 minutes to start). It disables the liveness and readiness probes until it succeeds, which prevents the liveness probe from killing your app before it’s even had a chance to boot. If it never succeeds within its failureThreshold, the container is killed and restarted.
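HTTP isn’t the only way to probe. The fragment below is a sketch, assuming a hypothetical worker container that listens on port 9000 and keeps a /tmp/healthy marker file around while it’s healthy; the container name, image, port, and file path are all placeholders. It uses a tcpSocket readiness check and an exec liveness check, and gives liveness a longer failure window than readiness.

```yaml
# Hypothetical container fragment: "worker", its image, port 9000, and the
# /tmp/healthy marker file are placeholders for your own app.
containers:
- name: worker
  image: my-worker:1.0.0
  ports:
  - containerPort: 9000
  readinessProbe:
    tcpSocket:            # Passes if the kubelet can open a TCP connection
      port: 9000
    periodSeconds: 5
    failureThreshold: 3   # Out of the Service after ~15 s of consecutive failures
  livenessProbe:
    exec:                 # Runs inside the container; exit code 0 means healthy
      command: ["cat", "/tmp/healthy"]
    periodSeconds: 20
    timeoutSeconds: 5     # The default of 1 s is tight for a busy process
    failureThreshold: 6   # Restarted only after ~2 minutes of consecutive failures
```

Roughly, a probe trips after failureThreshold × periodSeconds of consecutive failures, so here readiness gives up after about 15 seconds and liveness after about 2 minutes. successThreshold (not shown) defaults to 1 and must stay 1 for liveness and startup probes.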

Here’s how you define them in your Pod template. The readiness probe is non-negotiable for any serious deployment; whether you need the other two depends on your app.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: server
        image: my-app:1.2.0
        ports:
        - containerPort: 8080
        # The most basic HTTP readiness probe
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 8080
          initialDelaySeconds: 5  # Wait 5 seconds after the container starts before the first probe
          periodSeconds: 10        # Check every 10 seconds
        # A more conservative liveness probe
        livenessProbe:
          httpGet:
            path: /health/live
            port: 8080
          initialDelaySeconds: 60   # Counted from container start; the startupProbe below already holds this probe off until boot succeeds
          periodSeconds: 30
        # For a container that takes forever to start
        startupProbe:
          httpGet:
            path: /health/started
            port: 8080
          failureThreshold: 30  # Allow up to 30 checks...
          periodSeconds: 10      # ...every 10 seconds (300 seconds total)
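With that manifest applied, a freshly started Pod spends its first stretch Running but not Ready. The excerpt below is an illustrative, trimmed sketch of what kubectl get pod <pod-name> -o yaml would show for a my-app Pod before its startup probe passes; the Pod name is a placeholder, and newer clusters may list extra conditions.

```yaml
# Illustrative, trimmed excerpt for a my-app Pod whose startupProbe has not
# passed yet. Timestamps, image fields, and any extra conditions are omitted.
status:
  phase: Running                # The process is up...
  conditions:
  - type: PodScheduled
    status: "True"
  - type: Initialized
    status: "True"
  - type: ContainersReady
    status: "False"
    reason: ContainersNotReady
    message: 'containers with unready status: [server]'
  - type: Ready
    status: "False"             # ...but the Pod gets no Service traffic yet
    reason: ContainersNotReady
    message: 'containers with unready status: [server]'
  containerStatuses:
  - name: server
    started: false              # startupProbe has not succeeded yet
    ready: false                # readinessProbe won't run until started is true
    restartCount: 0
```

That Ready=False is exactly what keeps the Deployment from counting this Pod, and the Service from routing to it, until the app has actually booted.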

Best Practices: Don’t Be a Statistic

Always Use a Readiness Probe: This is not optional. It’s the primary mechanism for ensuring zero-downtime deployments.
Be Conservative with Liveness Probes: Only use them if you are certain your application can get into a stuck state that only a restart can fix. Give them a higher failure threshold than the readiness probe, so a struggling Pod is pulled out of rotation long before it gets restarted. You want to restart it when it’s dead, not just sick.
Use Startup Probes for Slow Containers: They elegantly solve the problem of “my app gets killed while starting up.”
Make Your Health Checks Cheap and Separate: Your health endpoints should not trigger a cascade of database queries or API calls. Check internal state, not downstream dependencies. If your app is alive but the database is down, the Pod should usually stay “live” and “ready”: restarting it won’t fix the database, it’ll just create more failing Pods, and pulling every replica out of the Service at once turns a degraded app into an unreachable one.
Mind the minReadySeconds: This field in the Deployment spec forces Kubernetes to wait a minimum number of seconds after a Pod becomes ready before considering it “available.” It’s a clunky but effective way to add a safety margin to catch flaky startups.
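minReadySeconds earns its keep when paired with a rolling-update strategy that refuses to remove old Pods early. The fragment below is a minimal sketch, assuming the my-app Deployment above; the 15-second margin and the surge settings are illustrative values, not recommendations. With maxUnavailable set to 0, the controller terminates an old Pod only after a replacement has passed its readiness probe and then stayed Ready for the full minReadySeconds window, so a new Pod that flaps during its first few seconds doesn’t cost you serving capacity.

```yaml
# Fragment of the my-app Deployment spec; selector and template omitted.
# The 15-second margin is an illustrative value, not a recommendation.
spec:
  replicas: 3
  minReadySeconds: 15       # A new Pod must stay Ready 15 s before it counts as available
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1           # Start at most one extra Pod at a time
      maxUnavailable: 0     # Never let available Pods drop below 3 during a rollout
```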

The goal is to make your Deployments boringly reliable. By mastering conditions and probes, you stop guessing and start knowing exactly why your software is behaving the way it is. You move from hoping it works to commanding it to work.