Alright, let’s talk about making your ECS service actually scale. You didn’t set this whole thing up just to watch it sit there like a pet rock, did you? You want it to handle traffic. When the load hits, you want more tasks. When it’s quiet, you want it to scale down so you’re not paying for ghosts. This is where Auto Scaling comes in, and AWS gives you two main levers to pull: Target Tracking and Step Scaling. They’re both powerful, but one is your brilliant, intuitive friend, and the other is the meticulous, slightly pedantic friend who needs everything spelled out in triplicate.

The Zen of Target Tracking

This is the one you’ll use 95% of the time because it’s just so damn elegant. You don’t tell AWS how to scale; you tell it what you want the world to look like. You pick a key metric—like CPU utilization or memory usage—and set a target value. AWS’s Application Auto Scaling service then does the calculus, constantly adjusting the number of tasks to keep that metric as close to your target as possible.

Think of it like a cruise control system for your task count. You set it to 70 MPH (70% CPU), and it automatically presses the gas or brake (adds or removes tasks) to maintain that speed, regardless of whether you’re going up a hill (a traffic spike) or down a hill (a lull).
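To make the cruise-control idea concrete, here is a minimal Python sketch of the kind of proportional arithmetic involved. AWS doesn’t publish the exact algorithm, so treat this as an illustration only: the function name desired_task_count and the assumption that load spreads evenly across tasks are mine, not AWS’s.

```python
import math


def desired_task_count(current_tasks, observed_metric, target_value,
                       min_capacity, max_capacity):
    """Estimate the task count that would pull the average metric back to target.

    Hypothetical helper for illustration; assumes the total load is shared
    evenly, so the average metric scales with 1 / task count.
    """
    if current_tasks <= 0:
        return min_capacity
    # Same total work over more (or fewer) tasks: scale by observed / target.
    raw = math.ceil(current_tasks * observed_metric / target_value)
    # Never leave the MinCapacity..MaxCapacity range of the scalable target.
    return max(min_capacity, min(max_capacity, raw))


print(desired_task_count(4, 90.0, 65.0, 2, 10))   # uphill: more tasks
print(desired_task_count(4, 30.0, 65.0, 2, 10))   # downhill: fewer tasks
print(desired_task_count(8, 130.0, 65.0, 2, 10))  # capped at MaxCapacity
```

With 4 tasks running at 90% against a 65% target, the sketch asks for 6 tasks; at 30% it drops to the floor of 2. The real service is more cautious than this, especially when scaling in.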

Here’s a CloudFormation snippet that sets up a service to track CPU at 65%. Notice the policy contains no thresholds or step sizes; it just names a target value for the Application Auto Scaling service to, well, target.

MyScalingTarget:
  Type: AWS::ApplicationAutoScaling::ScalableTarget
  Properties:
    MaxCapacity: 10
    MinCapacity: 2
    ResourceId: !Sub "service/${ClusterName}/${ServiceName}"
    ScalableDimension: ecs:service:DesiredCount
    ServiceNamespace: ecs
    RoleARN: !GetAtt AutoscalingRole.Arn

MyCPUTargetPolicy:
  Type: AWS::ApplicationAutoScaling::ScalingPolicy
  Properties:
    PolicyName: CPU65TargetTracking
    PolicyType: TargetTrackingScaling
    ScalingTargetId: !Ref MyScalingTarget
    TargetTrackingScalingPolicyConfiguration:
      TargetValue: 65.0
      PredefinedMetricSpecification:
        PredefinedMetricType: ECSServiceAverageCPUUtilization
      ScaleOutCooldown: 60
      ScaleInCooldown: 120

Why the different cooldowns? This is a critical best practice. A ScaleInCooldown (when removing tasks) should always be longer than the ScaleOutCooldown. Why? Because scaling in is a destructive act. You’re killing a running task. You want to be absolutely sure the drop in load isn’t just a brief dip before you start terminating your precious workers. Giving it a longer cooldown prevents thrashing—the embarrassing situation where your service looks like it’s doing the wave, constantly scaling out and right back in.
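If you want to see the thrashing argument in numbers, here is a toy Python model. It is not a reproduction of Application Auto Scaling; it models a single rule, that a scale-in is held back until the cooldown since the previous scale-in has elapsed, and it lets scale-out through unconditionally. The function name, the flapping load pattern, and the 60 and 300 second values are all made up for the illustration.

```python
def count_scale_ins(desired_per_minute, scale_in_cooldown_s, start_tasks):
    """Count how many scale-in actions a given cooldown lets through."""
    tasks = start_tasks
    last_scale_in_s = None  # time of the most recent scale-in, in seconds
    scale_ins = 0
    for minute, desired in enumerate(desired_per_minute):
        now_s = minute * 60
        if desired > tasks:
            # Toy simplification: scale-out is never held back here.
            tasks = desired
        elif desired < tasks:
            cooled_down = (last_scale_in_s is None
                           or now_s - last_scale_in_s >= scale_in_cooldown_s)
            if cooled_down:
                tasks = desired
                last_scale_in_s = now_s
                scale_ins += 1
    return scale_ins


# A load that dips for one minute, recovers, dips again, and so on.
flapping = [4, 2, 4, 2, 4, 2, 4, 2]
print(count_scale_ins(flapping, 60, start_tasks=4))   # short cooldown
print(count_scale_ins(flapping, 300, start_tasks=4))  # long cooldown
```

On this pattern the short cooldown lets four rounds of task terminations through, while the long one allows two. Fewer killed tasks for the same noisy signal is the whole point.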

The Pedantry of Step Scaling

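Sometimes, target tracking isn’t the right fit. It can follow a custom CloudWatch metric (through a CustomizedMetricSpecification instead of a predefined one), but it works best with a utilization-style metric that moves in proportion to the task count. What if you want explicit tiers instead, say scaling on the raw number of SQS messages in a queue, with a bigger jump the deeper the backlog gets? This is where Step Scaling comes in. It’s less “cruise control” and more “if-this-then-that” programming.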

You define a CloudWatch alarm and a set of step adjustments. “If metric X is above 80 for 3 minutes, add 2 tasks. If it’s above 90, add 4 tasks.” It’s incredibly flexible but also more manual. You are now the micromanager.

The biggest pitfall here? The “Magic Number” problem. The alarm fires on the metric’s actual value, but each step adjustment carries two fields, MetricIntervalLowerBound and MetricIntervalUpperBound, and those bounds are offsets from the alarm’s threshold, not absolute metric values. This is where everyone’s brain short-circuits.

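Let’s say your alarm threshold is 1000 (for SQS messages visible) and the policy’s AdjustmentType is ChangeInCapacity, so each ScalingAdjustment is a number of tasks to add. Your step adjustments might look like this: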

"StepAdjustments": [
  {
    "MetricIntervalLowerBound": 0,
    "MetricIntervalUpperBound": 500,
    "ScalingAdjustment": 1
  },
  {
    "MetricIntervalLowerBound": 500,
    "MetricIntervalUpperBound": 1000,
    "ScalingAdjustment": 2
  },
  {
    "MetricIntervalLowerBound": 1000,
    "ScalingAdjustment": 4
  }
]

This means:

  • If the metric is from 1000 (the threshold) up to, but not including, 1500 (1000 + 500), add 1 task.
  • If the metric is from 1500 up to, but not including, 2000, add 2 tasks.
  • If the metric is at 2000 or above, add 4 tasks.

See? Pedantic. You have to do the math yourself. One best practice is to use the AWS Management Console wizard to set this up first—it visualizes these steps for you—and then replicate the JSON into your Infrastructure-as-Code templates.
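If you’d rather let a few lines of code do that math, here is a small Python sketch that resolves which step applies for a given metric value. The function name resolve_step is hypothetical, and the sketch only covers the scale-out case shown above (non-negative bounds, lower bound inclusive, upper bound exclusive); scale-in policies with negative bounds flip which end is inclusive.

```python
def resolve_step(metric_value, alarm_threshold, step_adjustments):
    """Return the ScalingAdjustment whose interval contains the metric value.

    Bounds are offsets from the alarm threshold, not absolute values.
    Returns 0 when no step matches (for example, below the threshold).
    """
    offset = metric_value - alarm_threshold
    for step in step_adjustments:
        lower = step.get("MetricIntervalLowerBound")  # missing = unbounded
        upper = step.get("MetricIntervalUpperBound")  # missing = unbounded
        above_lower = lower is None or offset >= lower
        below_upper = upper is None or offset < upper
        if above_lower and below_upper:
            return step["ScalingAdjustment"]
    return 0


steps = [
    {"MetricIntervalLowerBound": 0, "MetricIntervalUpperBound": 500,
     "ScalingAdjustment": 1},
    {"MetricIntervalLowerBound": 500, "MetricIntervalUpperBound": 1000,
     "ScalingAdjustment": 2},
    {"MetricIntervalLowerBound": 1000, "ScalingAdjustment": 4},
]

for value in (1200, 1500, 2400):
    print(value, "->", resolve_step(value, 1000, steps))
```

A quick check like this in a unit test catches the classic mistake of typing absolute values (1000, 1500, 2000) into the bounds, which would silently shift every tier up by the threshold.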

The Fargate Curveball: Tasks Need Somewhere to Land

Here’s the bit AWS doesn’t highlight enough in the marketing docs: the beautiful ECSServiceAverageCPUUtilization metric we used in target tracking works for both launch types, because ECS publishes service-level CPU and memory utilization to CloudWatch whether your tasks run on Fargate or on EC2. The metric isn’t the catch. Capacity is. If you’re running on Fargate, you’re golden: every new task arrives with its own compute.

If you’re using the EC2 launch type, however, raising the desired count only helps if your container instances have room for the new tasks. When the cluster is full, the extra tasks can’t be placed, and your scaling policy is pressing the gas on an empty tank. You have to scale the instances as well, typically with a capacity provider and managed scaling on the Auto Scaling group behind the cluster. It’s a stark reminder that Fargate isn’t just about saving ops work; it removes an entire layer from your scaling automation.

So, the rule of thumb? Use Target Tracking for CPU/Memory, on either launch type. Use Step Scaling when you need explicit, tiered reactions to a metric like queue depth, and always, always give scale-in actions a longer cooldown than scale-out. Your wallet will thank you.