11.6 VPA Modes: Off, Initial, Auto

Alright, let’s talk about VPA modes. This is where you decide just how much authority you’re willing to hand over to this particular robot butler. You’ve installed VPA, you’ve defined a VerticalPodAutoscaler resource, and now you have to choose its updateMode. We’ll cover three options: Off, Initial, and Auto. (The API also accepts Recreate, which currently behaves the same as Auto, so it gets no separate treatment here.) Picking the right one is the difference between getting helpful advice and handing over the keys to the kingdom with a blindfold on.

The Three Modes: A Spectrum of Trust

Think of these modes on a spectrum of how much operational risk you’re comfortable with. Off is safe, Auto is… bold, and Initial is the sensible middle ground that most teams should probably start with.

Off Mode: Recommendations Only

This is the safe, read-only mode. The VPA recommender still does its job: it analyzes the actual resource usage of your pods, consults its crystal ball, and calculates recommended resource requests for each container. And then it… does absolutely nothing with that information. It just writes the recommendation into the VPA object’s status for you to look at.

You use this when you want the advice but aren’t ready for automation. It’s like having a brilliant financial advisor who sends you a detailed plan, but you still have to log into your bank website yourself to actually buy the stocks. It’s perfect for:

Initial evaluation of VPA’s recommendations before letting it make changes.
Production environments where any unscheduled pod restart is a non-starter.
Just getting a baseline of what your resources should be.
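
For reference, a VPA in Off mode might be declared roughly like this. It’s a minimal sketch: the name my-app-vpa matches the command used in this section, but the target Deployment name my-app is an assumption, so adjust it to your workload.

```yaml
# Minimal sketch of a recommendation-only VPA.
# Assumes the VPA CRDs are installed and that a Deployment named
# "my-app" (hypothetical name) exists in the same namespace.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"   # compute recommendations, never apply them
```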

You’ll see the recommendations in the status. Here’s what you’d look for:

kubectl describe vpa/my-app-vpa

…and you’d scroll down to see something glorious like:

Status:
  Recommendation:
    Container Recommendations:
      Container Name:  my-app
      Lower Bound:
        Cpu:     25m
        Memory:  262144k
      Target:
        Cpu:     25m
        Memory:  262144k
      Uncapped Target:
        Cpu:     15m
        Memory:  262144k
      Upper Bound:
        Cpu:     250m
        Memory:  262144k
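
If you like what you see, acting on it in Off mode is a manual step: copy the Target values into the container’s resources in your workload manifest. Here’s a sketch of the relevant fragment, assuming the container is named my-app as in the output above. Note that 262144k means 262,144,000 bytes, which is exactly 250Mi.

```yaml
# Fragment of a Deployment's pod template (spec.template.spec), not a full manifest.
# The image and other container fields are omitted.
containers:
  - name: my-app
    resources:
      requests:
        cpu: 25m        # Target CPU from the VPA status
        memory: 250Mi   # Target memory: 262144k = 262,144,000 bytes = 250Mi
```

The recommendation only gives you request values, so whether to set limits as well is still your call.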

Initial Mode: Apply Once, Then Hands Off

This is the “training wheels” mode. VPA acts only at the moment a pod is created: whenever a new pod starts, whether because you changed the Pod template spec in your deployment (e.g., you updated the container image), scaled up, or had a pod rescheduled, the VPA admission controller swoops in and injects its recommended resources into the new pod. Once a pod is running, VPA leaves it alone.

This is fantastic because it automates the resource setting without VPA ever evicting a running pod. The pod gets the right resources from the get-go and keeps them until it is replaced for some other reason (e.g., a new image version). It’s the best of both worlds: automated correctness without the unpredictability. This should be the default starting point for most teams dipping their toes into VPA.
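
Switching to Initial mode is a one-line change to updatePolicy. Since VPA is now writing values into your pods, it may be worth adding guardrails with resourcePolicy. The sketch below assumes a hypothetical Deployment named my-app, and the minAllowed and maxAllowed numbers are illustrative placeholders, not recommendations.

```yaml
# Sketch of an Initial-mode VPA with per-container bounds.
# Deployment name and all bound values are assumptions for illustration.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Initial"   # set requests at pod creation only
  resourcePolicy:
    containerPolicies:
      - containerName: my-app
        minAllowed:         # VPA will not go below these
          cpu: 25m
          memory: 128Mi
        maxAllowed:         # VPA will not go above these
          cpu: "1"
          memory: 1Gi
```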

Auto Mode: Full Autopilot (Brace Yourself)

This is it. The big one. With Auto mode, VPA doesn’t just advise and it doesn’t just act on creation. It continuously monitors your pods. If it decides the resources need to be updated, because usage has trended up or down for a while, it will evict your running pod so the replacement can be created with the new resources.

Yes, you read that right. It will kill your perfectly healthy pod to change its CPU request. This is the part that feels absurd, but it’s a necessary evil because Kubernetes has traditionally treated a running pod’s resource requests as an immutable part of the spec. So the only way to “update” them is to destroy and recreate. (Newer Kubernetes releases are adding in-place pod resizing, but eviction is how Auto mode works as described here.)

This is incredibly powerful and incredibly dangerous. The danger is operational rather than technical: VPA’s recommendation is usually right, but its timing may not be. An unexpected eviction in production might be fine for a stateless web service, but it could be disastrous for a pod in the middle of a long-running batch job or holding critical state. You absolutely must use Auto mode with:

Pod Disruption Budgets (PDBs): To prevent VPA from evicting too many pods at once and taking your service down.
Extreme Care: Only use it on workloads where occasional, unscheduled restarts are acceptable.
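
Here’s a sketch of that pairing: a PodDisruptionBudget alongside a VPA in Auto mode. The PDB name, the app: my-app label, the Deployment name, and the minAvailable value are all assumptions for illustration; they need to match your pods’ actual labels and your replica count.

```yaml
# Sketch: limit how many pods can be evicted at once, then enable Auto mode.
# All names, labels, and numbers are hypothetical.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2          # evictions are refused if they would drop below this
  selector:
    matchLabels:
      app: my-app          # must match the labels on the target pods
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"     # VPA may evict pods to apply new requests
```

With three healthy replicas and minAvailable: 2, at most one pod can be evicted at a time, so a VPA-driven change rolls through the pods instead of hitting them all at once.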

Why You Might Think You Want Auto, But Actually Want Initial

The siren song of “full automation” is strong. But ask yourself this: how often are your resource requirements changing dramatically without a corresponding change to the application code itself? If you’re deploying new code frequently, Initial mode already ensures every new deploy gets the right resources. The continuous adjustment of Auto is often overkill for standard web services and introduces a real, albeit small, risk of a restart at an inopportune time. Use Auto for workloads with highly variable, unpredictable usage patterns that aren’t tied to deploys. For almost everything else, Initial is your wise, cautious friend.