8.6 Predictive Scaling: ML-Based Proactive Scaling

Right, so you’ve got your ASG set up with dynamic scaling. It works. It reacts. It’s fine. But let’s be honest, watching your scaling policies scramble to add capacity after the CPU has already spiked feels a bit like calling a plumber after your basement is already flooded. Wouldn’t it be nice if the system could just… know? Enter Predictive Scaling. This is where AWS slaps a tiny, bespoke machine learning model on your scaling group to try and predict the future. It’s the closest thing you’ll get to a crystal ball in this business, and when it works, it’s pure magic. When it doesn’t, well, it’s a great story.

The core idea is brilliantly simple: the service analyzes your ASG’s historical load data (it needs at least 24 hours’ worth, and it will use up to 14 days) and identifies recurring patterns. Does your application get hammered every weekday at 9 AM when users log in? Does it quiet down every night and on weekends? Predictive Scaling sees that. It then proactively schedules scaling actions to have the right number of instances ready just before the load hits. This isn’t reacting; it’s anticipating. The goal is to flatten the “spiky” metrics graph we all know and dread, keeping latency low and your users happy.

How It Actually Works: Forecasts and Actions

Predictive Scaling doesn’t replace your dynamic scaling policies; it works alongside them. Think of it as the strategic planner and your dynamic policies as the tactical SWAT team. Here’s the two-phase dance it performs:

The Forecast: It generates an hourly forecast for the next 48 hours and refreshes it every 6 hours using the latest CloudWatch data. It predicts the load (measured by a metric you choose, like CPU or request count) for each hour of that period.
The Scheduled Action: It calculates the capacity needed to meet that forecasted load and, in ForecastAndScale mode, raises the group’s capacity ahead of each forecasted hour. As the forecast is refreshed, the planned capacity is refreshed with it.

The beautiful part is that your good old Target Tracking or Step Scaling policies are still there, active. They handle any unexpected traffic that the ML model didn’t predict. If a forecast says you need 10 instances but a sudden news event drives traffic and you actually need 15, the dynamic policies will still kick in to add the extra 5. It’s a safety net.
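To make the safety-net arithmetic concrete, here is a minimal sketch of how the two recommendations combine. It is a simplified model, not AWS’s implementation: it assumes capacity is the total load divided by the per-instance target, and that the group follows whichever policy asks for more instances. The function names and the load figures are made up for illustration.

```python
import math


def instances_for_load(total_load: float, target_per_instance: float) -> int:
    """Instances needed so average per-instance utilization stays at the target."""
    if target_per_instance <= 0:
        raise ValueError("target_per_instance must be positive")
    return math.ceil(total_load / target_per_instance)


def effective_capacity(predictive: int, dynamic: int, min_size: int, max_size: int) -> int:
    """Take the larger recommendation, then clamp it to the group's bounds."""
    wanted = max(predictive, dynamic)
    return max(min_size, min(max_size, wanted))


# Forecast: 400% total CPU at a 40% target -> 10 instances.
forecast_capacity = instances_for_load(total_load=400.0, target_per_instance=40.0)
# Surprise surge: 600% total CPU actually arrives -> dynamic scaling wants 15.
surge_capacity = instances_for_load(total_load=600.0, target_per_instance=40.0)

final_capacity = effective_capacity(forecast_capacity, surge_capacity, min_size=2, max_size=20)
print(forecast_capacity, surge_capacity, final_capacity)
```

The point of the clamp is that neither policy can push the group past its configured minimum or maximum size.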

Setting It Up: The Code and The Catch

Enough theory. Let’s make one. You can do this via the AWS CLI or, my preferred method, CloudFormation. Here’s a CloudFormation snippet that adds predictive scaling to an existing ASG.

MyScalingPolicy:
  Type: AWS::AutoScaling::ScalingPolicy
  Properties:
    AutoScalingGroupName: !Ref MyAutoScalingGroup
    PolicyType: PredictiveScaling
    PredictiveScalingConfiguration:
      MetricSpecifications:
        # Keep average CPU per instance around 40%
        - TargetValue: 40
          # Pairs the CPU scaling metric with its matching load metric
          PredefinedMetricPairSpecification:
            PredefinedMetricType: ASGCPUUtilization
      # Start in ForecastOnly; switch to ForecastAndScale once the forecasts check out
      Mode: ForecastOnly
      # Seconds, not minutes: 600 = launch 10 minutes ahead of the forecast
      SchedulingBufferTime: 600
      # Never exceed the group's MaxSize, whatever the forecast says
      MaxCapacityBreachBehavior: HonorMaxCapacity

Now, let’s talk about that SchedulingBufferTime: 600. The value is in seconds (the default is 300, and the maximum is 3600), so this tells the system to launch instances 10 minutes before the forecasted load change. The 5-minute default is a classic “smart default that’s often wrong” choice by AWS. For instances that take 5-7 minutes to boot and become healthy, even 10 minutes is barely enough time. If your instances are slow to boot (maybe they have long user-data scripts), they might still be booting when the traffic hits. I almost always increase this to 900 or even 1200 (15 or 20 minutes). Test your instance boot times and set this accordingly. It’s a small detail that can make or break the feature.
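If you want to derive the buffer from measurements rather than gut feeling, something like the following sketch could work. It assumes you have already collected launch-to-healthy times in seconds (the sample values here are invented), and the 2-minute margin is an arbitrary choice of mine. The 3600-second cap reflects the documented maximum for SchedulingBufferTime.

```python
import math

MAX_BUFFER_SECONDS = 3600  # documented upper limit for SchedulingBufferTime


def suggest_buffer_seconds(boot_times_s, margin_s=120):
    """Slowest observed boot plus a margin, rounded up to a whole minute and capped."""
    if not boot_times_s:
        raise ValueError("need at least one boot-time sample")
    raw = max(boot_times_s) + margin_s
    rounded = math.ceil(raw / 60) * 60
    return min(rounded, MAX_BUFFER_SECONDS)


# Hypothetical launch-to-healthy measurements, in seconds.
samples = [310, 355, 402, 431, 388]
buffer_s = suggest_buffer_seconds(samples)
print(buffer_s)
```

Using the slowest sample instead of the average is deliberate: the instance that matters is the last one to become healthy.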

The Modes: ForecastOnly vs. ForecastAndScale

This is the most important choice you’ll make. The Mode property is your safety switch.

ForecastOnly: The ML model runs and generates a forecast, but it does not actually scale your ASG. It just publishes its forecast, which you can inspect in the console or in CloudWatch (look for the PredictiveScalingCapacityForecast and PredictiveScalingLoadForecast metrics in the AWS/AutoScaling namespace). This is your “try before you buy” mode. Run it for a week. Check if the predictions match reality. It’s a fantastic way to build trust in the model (or discover it’s hopelessly wrong for your chaotic workload) without any risk.
ForecastAndScale: This is the real deal. The model both forecasts and executes the scaling actions. You only switch to this once you’ve verified the forecasts in ForecastOnly mode look sane.

Never, ever just jump straight to ForecastAndScale. The ML model needs good, patterned data to work. If your traffic is completely random, it’s going to make terrible, expensive guesses.
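If you drive this from the SDK instead of CloudFormation, one way to keep yourself honest is a small helper that defaults to ForecastOnly, so scaling for real is always an explicit choice. This is a sketch: the helper name, the group name, and the policy name are placeholders. The request keys follow the put_scaling_policy parameters in boto3, and the actual call is left as a comment because it needs boto3 and AWS credentials.

```python
VALID_MODES = ("ForecastOnly", "ForecastAndScale")


def predictive_policy_request(asg_name, policy_name, target_cpu=40.0,
                              mode="ForecastOnly", buffer_seconds=600):
    """Build keyword arguments for a predictive scaling put_scaling_policy call."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}")
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": policy_name,
        "PolicyType": "PredictiveScaling",
        "PredictiveScalingConfiguration": {
            "MetricSpecifications": [
                {
                    "TargetValue": target_cpu,
                    "PredefinedMetricPairSpecification": {
                        "PredefinedMetricType": "ASGCPUUtilization"
                    },
                }
            ],
            "Mode": mode,
            "SchedulingBufferTime": buffer_seconds,  # seconds
        },
    }


# Placeholder names; safe by default because mode is ForecastOnly.
request = predictive_policy_request("my-asg", "predictive-cpu")

# To apply it (requires boto3 and credentials):
#   import boto3
#   boto3.client("autoscaling").put_scaling_policy(**request)
```

Once the forecasts look sane, re-running the same call with mode="ForecastAndScale" updates the existing policy in place.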

When It’s Brilliant and When It’s a Dumpster Fire

Use Predictive Scaling when: Your workload has strong, recurring patterns. Daily cycles, weekly cycles (e.g., quiet weekends), hourly patterns (lunchtime slump). It’s absolutely killer for business applications, SaaS platforms, and anything where human behavior drives the traffic.

Avoid it like the plague when: Your traffic is completely unpredictable. If you’re dealing with random, event-driven spikes (e.g., a DDoS attack, a viral social media post that wasn’t part of your history), the model has nothing to learn from. It’ll be useless. Also, if your historical data is less than 24 hours old, it can’t even start. The model needs a baseline.
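A rough way to tell which camp you are in before you enable anything is to check whether your hourly load correlates with itself one day earlier. The sketch below computes a plain lag-24 autocorrelation on two synthetic series, one with a clean daily cycle and one that is pure noise. This is my own sanity check, not anything the service does internally, and any cutoff you pick for “patterned enough” is a judgment call.

```python
import math
import random


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag (roughly -1 to 1)."""
    n = len(series)
    if lag <= 0 or lag >= n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        return 0.0  # a flat series has no pattern to detect
    covariance = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


HOURS = 24 * 7  # one week of hourly samples

# Synthetic load with a clean daily cycle.
daily_cycle = [50 + 30 * math.sin(2 * math.pi * h / 24) for h in range(HOURS)]

# Synthetic load with no pattern at all (seeded for repeatability).
rng = random.Random(0)
noise = [rng.uniform(20, 80) for _ in range(HOURS)]

daily_score = autocorrelation(daily_cycle, lag=24)
noise_score = autocorrelation(noise, lag=24)
print(round(daily_score, 2), round(noise_score, 2))
```

A score near 1 at lag 24 (or lag 168 for weekly cycles, given enough data) suggests the model has something to learn from. A score near 0 suggests it will be guessing.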

The biggest pitfall? Assuming it’s a “set it and forget it” solution. It’s not. You must monitor the forecast (the PredictiveScalingCapacityForecast metric) against the actual demand. If a fundamental change happens in your application’s usage (you launch a new feature, enter a new market), the old patterns become invalid. You might need to disable predictive scaling temporarily, or drop back to ForecastOnly, until it collects enough new data to retrain itself. It’s a brilliant tool, but it’s not a substitute for your own brain. You still have to be the one who knows why the traffic does what it does.
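The forecast-versus-actual comparison does not need to be fancy. Here is a minimal sketch that takes two aligned hourly lists of instance counts and reports how far off the forecast was. The numbers are invented; in practice you might pull the forecast from the get_predictive_scaling_forecast API and the actual capacity from your group’s CloudWatch metrics, which I leave out here.

```python
def forecast_report(forecast, actual):
    """Summarize forecast error for aligned hourly capacity series."""
    if not forecast or len(forecast) != len(actual):
        raise ValueError("series must be non-empty and the same length")
    errors = [f - a for f, a in zip(forecast, actual)]
    return {
        # Average miss in either direction, in instances.
        "mean_abs_error": sum(abs(e) for e in errors) / len(errors),
        # Hours where the forecast was too low (the dangerous direction).
        "under_forecast_hours": sum(1 for e in errors if e < 0),
        # Largest number of instances the forecast fell short by.
        "worst_shortfall": max(0, -min(errors)),
    }


# Hypothetical hourly instance counts.
forecast = [4, 4, 6, 10, 10, 8]
actual = [4, 5, 6, 9, 14, 8]

report = forecast_report(forecast, actual)
print(report)
```

Under-forecasts are tracked separately because they are the ones your users feel. Over-forecasts only cost money.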