11.7 Combining HPA and VPA: Caveats and Best Practices

Right, so you’ve decided you want both horizontal and vertical autoscaling. Ambitious. A little greedy, even. I like it. It’s the “have your cake and eat it too” of Kubernetes resource management. But let’s be absolutely clear: combining HPA and VPA is like putting two brilliant, highly opinionated chefs in the same kitchen. If you don’t set very strict rules, they will absolutely fight over the stove, and you’ll end up with a culinary disaster (read: a cascading pod eviction nightmare).

The core conflict is simple: HPA adds or removes replicas based on utilization, which it measures as actual usage divided by the pod’s resource request. VPA adjusts those very requests (and scales limits along with them), so it changes the denominator of HPA’s calculation. If VPA raises the CPU request of your pods, the same actual CPU consumption shows up as a lower utilization percentage: a pod burning 350m against a 500m request reads as 70%, but against a 1000m request it reads as 35%. HPA sees “lower load” and scales down, even though nothing about the actual traffic has changed. The reverse is also true. They’re stepping on each other’s toes.

The Golden Rule: Never Let Them Touch the Same Resource

This is non-negotiable. You must configure HPA and VPA to act on different resources. Full stop. The most common and sane pattern is:

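HPA scales on one resource, typically CPU utilization (or on custom or external metrics that don’t depend on resource requests at all).
VPA manages the other resource, typically memory, or just provides CPU/memory recommendations that you apply manually.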

Trying to have them both automatically manage CPU is a recipe for a feedback loop that will make your cluster yo-yo harder than a kid on a sugar rush. Let’s codify this rule.

# HPA for CPU - This is fine.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
---
# VPA for Memory - This is also fine.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto" # Or "Off" / "Initial" - more on this soon.
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2
        memory: 2Gi
      controlledResources: ["memory"] # The Key! VPA only controls memory.

See that controlledResources field? That’s your mediator. Here, VPA is forbidden from touching CPU, so it can’t interfere with the HPA’s decision-making process. It’s only allowed to manage memory, which HPA isn’t even looking at in this configuration. Peaceful coexistence.
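If you’d rather let VPA manage both CPU and memory, another way to keep the two chefs apart is to take HPA off resource metrics altogether and scale on something like request rate. Here’s a minimal sketch of that, with some loud assumptions: the metric name http_requests_per_second is made up for illustration, the target of 100 is arbitrary, and none of it works unless a custom metrics adapter is already serving that metric in your cluster.

```yaml
# HPA on a per-pod custom metric instead of CPU or memory.
# ASSUMPTION: "http_requests_per_second" is a hypothetical metric name;
# it must be exposed through the custom metrics API by an adapter you run.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100" # Illustrative target: an average of 100 per pod.
```

Because this HPA never looks at resource requests, VPA’s adjustments don’t change its math, and the VPA no longer needs to be fenced off from CPU.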

Choosing Your VPA Update Mode Wisely

VPA’s updateMode is your throttle. Using Auto mode means VPA will just evict your pods whenever it feels like it to apply new resource values. This is… bold. In production, it’s often terrifying. You’re essentially giving a robot the keys to your pod lifecycle.

updateMode: Off: My personal favorite for combined setups. VPA becomes a brilliant recommendation engine, not an executor. It will sit there, watch your pods, and tell you what it would do. You can then review those recommendations and apply them manually during a deployment window. It’s the safest option.
updateMode: Initial: This is VPA on training wheels. It only sets the resource requests when the pod is first created. It won’t touch already running pods. This is great for getting the right initial sizing but avoids the chaos of live updates.
updateMode: Auto: Use this if you enjoy living on the edge and have absolute faith in your VPA configuration and pod disruption budgets. For most of us, it’s a staging environment curiosity.
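For reference, a recommendation-only setup might look like the sketch below. It reuses the my-app names from the earlier example. With updateMode set to Off, VPA never evicts anything, so it should be safe to leave controlledResources out and let it recommend for both CPU and memory.

```yaml
# VPA as a recommendation engine only: it computes values but never applies them.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off" # No evictions and no changes to pods; recommendations only.
```

Once it has watched the pods for a while, kubectl describe vpa my-app-vpa shows the recommendations in the object’s status, with a target plus lower and upper bounds per container. Copy the values you trust into the Deployment’s resource requests at your next rollout.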

Pod Disruption Budgets: Your Safety Net

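When VPA in Auto mode decides to update a pod, it does so by evicting it, so that the replacement can be created with the new values. This is not a graceful rolling update. The VPA updater does hold back somewhat on its own (it will only evict a fraction of a workload’s replicas at a time), but if you have 10 replicas and VPA decides they all need new memory requests, losing several of them in one pass can still leave your service badly short of capacity. You must define a PodDisruptionBudget (PDB) to put an explicit cap on the number of concurrent evictions.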

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  maxUnavailable: 1 # Only allow one pod to be evicted at a time.
  selector:
    matchLabels:
      app: my-app

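This PDB tells Kubernetes, “Hey, no matter what VPA asks for, never let an eviction leave more than one pod of this app unavailable at a time.” It only works if the selector matches the labels on your pods (app: my-app here), so double-check that against your Deployment’s pod template. With it in place, evictions happen serially, giving the Deployment time to bring up a replacement before the next pod goes and keeping your service capacity intact. It’s not a nice-to-have; it’s a requirement for any Auto mode usage.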

The combined HPA/VPA approach is powerful, but it’s advanced circuitry. It demands respect, careful configuration, and a healthy distrust of fully automated systems. Start with Off or Initial mode, use controlledResources to silo their responsibilities, and protect yourself with PDBs. Do that, and you’ll have that cake and get to eat it, too, without making a mess of the kitchen.