8.5 Scheduled Scaling: Predictable Load Patterns
Right, so you’ve got your Auto Scaling Group (ASG) humming along, dynamically adding and removing instances based on the whims of CPU usage or network traffic. It’s a beautiful thing for unpredictable load. But let’s be honest: a lot of our scaling problems aren’t mysterious. They’re painfully, predictably boring. You know your batch jobs kick off at 2 AM. You know the marketing email blast goes out every Tuesday at 9 AM. You know your e-commerce site turns into a digital ghost town after midnight. For these events, a reactive policy is always a step behind: it waits for the load to arrive, trips an alarm, and only then starts booting instances, while your users sit through the slow minutes in between.
This is where Scheduled Scaling comes in. It’s the ASG’s calendar feature. Instead of waiting for an alarm to go off and then scrambling to add capacity, you simply tell your ASG: “Hey, on this date, at this time, have exactly this many instances ready to go.” It’s the difference between being proactive and reactive. The beauty is in its brutal simplicity.
How Scheduled Scaling Actually Works
Don’t overthink it. A scheduled action is just an instruction stored on your ASG. It is either one-time (a single start time) or recurring (a cron expression), and it carries new values for the group’s minimum, maximum, and/or desired capacity. When the time arrives, the ASG applies those values. It’s not magic; it’s a time-based API call executed on your behalf.
The key thing to remember is that minimum and maximum belong to the group, not to your dynamic scaling policies, and a scheduled action overwrites them. Your policies keep running, but they can only move capacity within whatever bounds the scheduled action set, and those bounds stay put until another action (or you) changes them. There is no built-in “for the next hour”. If the group normally runs with min=2, max=10 and your scheduled action sets min=5, max=5, you will have exactly five instances until something changes those numbers again, regardless of CPU. The dynamic policies are effectively neutered: they can still fire, but they have nowhere to go. This is a common “gotcha”: people forget the scheduled action is in charge and panic when their CPU-based scaling seems broken.
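To make the bounds behaviour concrete, here is a toy model in shell. It is not an AWS API and it ignores cooldowns, warm-up, and instance health; it only captures the rule that whatever capacity a dynamic policy asks for ends up clamped into the group’s current [min, max] range. The function name effective_capacity is hypothetical.

```shell
# Hypothetical toy model, not an AWS API: the capacity a dynamic policy can
# actually reach is its request clamped into the group's [min, max] bounds.
# Usage: effective_capacity REQUESTED MIN MAX
effective_capacity() {
  local requested="$1" min="$2" max="$3"
  if [ "$requested" -lt "$min" ]; then
    echo "$min"          # policy wants fewer than min: min wins
  elif [ "$requested" -gt "$max" ]; then
    echo "$max"          # policy wants more than max: max wins
  else
    echo "$requested"    # request fits inside the bounds
  fi
}
```

With the group’s normal bounds, effective_capacity 9 2 10 gives 9. Once a scheduled action has set min=5, max=5, effective_capacity 9 5 5 gives 5, which is exactly the “my CPU scaling is broken” symptom.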
Crafting the Perfect Cron Expression
This is where most of us mess up, usually because we’ve been writing cron for EventBridge (formerly CloudWatch Events). That service uses AWS’s own slightly… divergent six-field dialect, complete with a ? placeholder and a required field for the year. Yes, the year. Because obviously, I need to plan my instance count for 2045 right now. Scheduled scaling doesn’t use that dialect: the --recurrence value is plain Unix cron.
The format is: Minute Hour Day-of-month Month Day-of-week
You can use a wildcard (*) for any field you don’t care about, but you must include all five fields, and the whole expression goes in quotes on the command line. For example, to scale up every weekday at 9 AM UTC, you’d use:
0 9 * * 1-5
And to scale up for a batch job at 2:30 AM every day, you’d use:
30 2 * * *
Day-of-week runs from 0 (Sunday) to 6 (Saturday), so 1-5 means Monday through Friday. If muscle memory makes you reach for a ? or tack a year onto the end, stop: those belong to the EventBridge dialect, not to Unix cron.
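If EventBridge habits keep sneaking into your recurrence strings, a cheap guard is to check the shape before calling the CLI. The sketch below is a hypothetical helper (check_asg_cron is not part of any AWS tool): it only counts fields and rejects the ? placeholder, so it will not catch out-of-range values such as an hour of 25.

```shell
# Hypothetical pre-flight check for an ASG --recurrence value.
# Returns 0 for a five-field expression without '?', 1 otherwise.
check_asg_cron() {
  local cron="$1"
  # Turn off globbing so the * fields are not expanded against the current
  # directory, then let word splitting count the whitespace-separated fields.
  set -f
  set -- $cron
  set +f
  if [ "$#" -ne 5 ]; then
    echo "check_asg_cron: expected 5 fields, got $#: '$cron'" >&2
    return 1
  fi
  case "$cron" in
    *'?'*)
      echo "check_asg_cron: '?' is EventBridge syntax, not Unix cron: '$cron'" >&2
      return 1
      ;;
  esac
  return 0
}
```

check_asg_cron "0 9 * * 1-5" succeeds, while the six-field "0 9 ? * MON-FRI *" and the five-field "0 5 * * ?" are both rejected with a message on stderr.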
A Real-World Example: The Morning Batch
Let’s say you have a data processing batch that runs every morning. You need extra capacity from 5 AM to 7 AM UTC to handle it. Here’s how you’d create the scale-up action using the AWS CLI. Notice how we’re setting MinSize, MaxSize, and DesiredCapacity to the same value. This is a common pattern for scheduled scaling: it pins the capacity to an exact number, preventing any dynamic funny business during a critical period.
# Scale up to exactly 5 instances at 05:00 UTC, every day.
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name MyWebServerASG \
  --scheduled-action-name "Morning-Batch-Scale-Up" \
  --start-time "2024-01-01T05:00:00Z" \
  --recurrence "0 5 * * *" \
  --min-size 5 \
  --max-size 5 \
  --desired-capacity 5
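Before moving on, it is worth confirming what the ASG actually has on its calendar. describe-scheduled-actions is a real AWS CLI command; the wrapper name list_scheduled_actions and the particular columns picked out by the --query expression are just one illustrative way to slice the output.

```shell
# Hypothetical wrapper: print every scheduled action on a group as a table
# of name, recurrence, min, max, and desired capacity.
# Usage: list_scheduled_actions ASG_NAME
list_scheduled_actions() {
  aws autoscaling describe-scheduled-actions \
    --auto-scaling-group-name "$1" \
    --query 'ScheduledUpdateGroupActions[].[ScheduledActionName,Recurrence,MinSize,MaxSize,DesiredCapacity]' \
    --output table
}
```

Running list_scheduled_actions MyWebServerASG after each change is a quick way to spot a missing scale-down or a stale action left over from last quarter.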
And of course, you need to remember to scale back down. You don’t want to pay for those extra instances all day.
# Restore normal bounds at 07:00 UTC so dynamic policies can take over again.
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name MyWebServerASG \
  --scheduled-action-name "Morning-Batch-Scale-Down" \
  --start-time "2024-01-01T07:00:00Z" \
  --recurrence "0 7 * * *" \
  --min-size 2 \
  --max-size 10 \
  --desired-capacity 2
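Because the scale-up and scale-down only make sense as a pair, one way to make “forgot the second half” harder is to create both from a single function. schedule_window below is a hypothetical wrapper around the same put-scheduled-update-group-action call; it omits --start-time, which is optional when you pass --recurrence, and it assumes both cron expressions are already in the group’s schedule time zone (UTC by default).

```shell
# Hypothetical wrapper: always create the scale-up and scale-down actions
# together so the second half can't be forgotten.
# Usage: schedule_window ASG NAME UP_CRON DOWN_CRON PEAK BASE_MIN BASE_MAX BASE_DESIRED
schedule_window() {
  local asg="$1" name="$2" up_cron="$3" down_cron="$4"
  local peak="$5" base_min="$6" base_max="$7" base_desired="$8"

  # Busy window: pin min, max, and desired to the same value.
  aws autoscaling put-scheduled-update-group-action \
    --auto-scaling-group-name "$asg" \
    --scheduled-action-name "${name}-Scale-Up" \
    --recurrence "$up_cron" \
    --min-size "$peak" --max-size "$peak" --desired-capacity "$peak" || return 1

  # Back to normal: restore the bounds so dynamic policies can work again.
  aws autoscaling put-scheduled-update-group-action \
    --auto-scaling-group-name "$asg" \
    --scheduled-action-name "${name}-Scale-Down" \
    --recurrence "$down_cron" \
    --min-size "$base_min" --max-size "$base_max" \
    --desired-capacity "$base_desired"
}
```

schedule_window MyWebServerASG Morning-Batch "0 5 * * *" "0 7 * * *" 5 2 10 2 recreates the two morning-batch actions. If the second call fails you are still left with an orphaned scale-up, so check the function’s exit status; for anything long-lived, the Infrastructure-as-Code advice below still applies.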
The Pitfalls and How to Avoid Them
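- Time Zones Are a Trap: By default, scheduled actions run on UTC. I don’t care what your watch says. If you set an action for 9 AM and you’re in EST, you’ve just scheduled it for 4 AM your time (5 AM once daylight saving time kicks in and you’re on EDT). You will mess this up. I have messed this up. Either double-check every UTC conversion, or pass --time-zone with an IANA name such as America/New_York so the cron expression runs on local time and follows daylight saving changes for you.
- Overlap Chaos: What happens if you have two scheduled actions that overlap? Maybe a “weekday” action and a “holiday” action? There is no priority system. Each action simply overwrites min, max, and desired capacity when it fires, so whichever one ran most recently is in charge. That’s predictable, but it can lead to unexpected results if you’re not meticulously managing your schedules.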
- Forgot to Scale Back Down: This is the classic, wallet-emptying mistake. You scale up for a holiday sale and forget to remove the action or schedule the scale-down. Set calendar reminders for yourself to review and clean up old scaling actions. Better yet, use Infrastructure-as-Code (like Terraform or CloudFormation) so your schedules are defined in version-controlled code, not clicked into existence in the console and immediately forgotten.
- Ignoring Instance Health: A scheduled action sets the capacity, but it doesn’t care whether the instances it asks for actually come up healthy. If you’re scaling to 10 instances but have a broken AMI, the ASG will just keep trying (and failing) to launch them. With min, max, and desired pinned to the same number, your dynamic policies have no room to move, so they can’t bail you out either. You still need to monitor the group’s activity during these scheduled periods.
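If you stay on UTC cron expressions, the conversion itself is simple arithmetic; the trap is that the offset changes with daylight saving time. The two functions below are a hypothetical sketch that assumes a whole-hour offset (so no UTC+5:30) and ignores date rollover, which means a conversion that crosses midnight would also need the day-of-week field shifted. Letting --time-zone handle it avoids both limitations.

```shell
# Hypothetical helpers. OFFSET is the whole-hour UTC offset in effect on the
# date you care about, e.g. -5 for EST or -4 for EDT.
# local_to_utc_hour LOCAL_HOUR OFFSET -> hour (0-23) to put in a UTC cron
local_to_utc_hour() {
  echo $(( ($1 - ($2) + 24) % 24 ))
}

# utc_to_local_hour UTC_HOUR OFFSET -> hour (0-23) on your wall clock
utc_to_local_hour() {
  echo $(( ($1 + ($2) + 24) % 24 ))
}
```

local_to_utc_hour 9 -5 gives 14, so a 9 AM New York scale-up is "0 14 * * 1-5" in winter, but local_to_utc_hour 9 -4 gives 13 in summer: a fixed UTC cron drifts an hour against your local clock twice a year. Going the other way, utc_to_local_hour 9 -5 gives 4, the 4 AM surprise from the first pitfall.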
Scheduled Scaling is a brilliantly simple tool. Use it for what it’s good for: the predictable, boring, and absolutely critical load patterns. For everything else, let your dynamic policies handle the surprises. Just watch your cron syntax, and for heaven’s sake, remember the time zone. Your CFO will thank you.