42.5 AWS Compute Optimizer: Right-Sizing EC2, Lambda, and ECS Fargate

Right, let’s talk about AWS Compute Optimizer. You’re probably here because you’ve seen a bill that made you wince and thought, “Surely I’m not using all of this?” You’re likely correct. Most of us aren’t. We over-provision “just to be safe,” which is the cloud equivalent of buying a monster truck for your daily commute to the grocery store. It works, but your wallet is crying. Compute Optimizer is the pragmatic friend who looks at your parking garage and says, “You know, a sedan would do.”

Here’s the core idea: AWS, being the all-seeing panopticon that it is, collects low-level performance metrics on your compute usage: CPU, memory, network, and (for EBS) disk I/O. Compute Optimizer analyzes this data, applies some machine learning voodoo to find patterns, and then compares your current resource configuration to all the other instance types available. It tells you if you’re over-provisioned (wasting money), under-provisioned (risking performance), or (and this is the real miracle) if you’re just right. Spoiler alert: you’re probably not just right.

How It Actually Works (The Magic Isn’t Free)

First, a crucial reality check: it needs data. For EC2, AWS documents a minimum of 30 hours of CloudWatch metrics from the past 14 days before it will say anything at all, and for a recommendation you can actually trust, you really want the full 14-day lookback window of consistent utilization history. It can’t recommend what it can’t see. If you spun up an instance yesterday for a one-off job and terminated it, Compute Optimizer won’t have anything useful to say. It’s not clairvoyant; it’s a historian with a very specific focus.

It also needs the right permissions. Compute Optimizer reads your CloudWatch metrics itself, through a service-linked role that is created when you opt the account in. What you have to grant is access for the IAM users or roles that will read the recommendations, and if you’re using IAM, which you absolutely should be, that means attaching a policy. AWS will try to gently nudge you to do this through the console, but here’s a policy you can attach programmatically to avoid the clicky-clicky nonsense:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "compute-optimizer:GetEnrollmentStatus",
                "compute-optimizer:GetEC2InstanceRecommendations",
                "compute-optimizer:GetLambdaFunctionRecommendations",
                "compute-optimizer:GetECSServiceRecommendations",
                "compute-optimizer:GetAutoScalingGroupRecommendations",
                "compute-optimizer:GetEBSVolumeRecommendations"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:GetMetricData",
                "ce:GetCostAndUsage"
            ],
            "Resource": "*"
        }
    ]
}
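A read policy is no use if the account was never opted in, so it is worth checking that first. Here is a minimal sketch, assuming a Boto3 `compute-optimizer` client is passed in. The helper name `ensure_enrolled` is made up for illustration, and opting in needs `compute-optimizer:UpdateEnrollmentStatus` (plus permission to create the service-linked role), neither of which the read-only policy above grants.

```python
def ensure_enrolled(client, include_member_accounts=False):
    """Opt the account in to Compute Optimizer unless it already is.

    `client` is expected to behave like boto3.client("compute-optimizer").
    Returns the enrollment status string reported by the service.
    """
    status = client.get_enrollment_status()["status"]
    if status in ("Active", "Pending"):
        # Already enrolled, or enrollment is still being processed.
        return status

    # Requires compute-optimizer:UpdateEnrollmentStatus, which a
    # read-only policy will not include.
    response = client.update_enrollment_status(
        status="Active",
        includeMemberAccounts=include_member_accounts,
    )
    return response["status"]
```

You would call it as `ensure_enrolled(boto3.client("compute-optimizer"))`. Even once the status is active, expect to wait for the first recommendations, because the service still needs its history.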

Reading the Recommendations (Beyond the Green Checkmark)

Don’t just blindly apply the first recommendation you see. The console will show you a summary, but the gold is in the details. Click into a recommendation. You’ll see things like:

Current vs. Recommended Instance Type: e.g., m5.2xlarge -> m5.xlarge
Projected CPU Utilization: This is key. It might say your current instance runs at 12% average CPU but the recommended one would run at 45%. That’s a good, healthy jump without hitting the danger zone.
Performance Risk: This is Compute Optimizer’s estimate of how likely the recommended type is to fall short of what your workload needs. “Very Low” means it’s almost certain you won’t see a performance hit. “Medium” or higher is its way of saying, “Look, the data is a bit noisy, maybe test this first during a low-traffic period.”

The most important metric, frankly, is the estimated monthly savings. This is what makes the CFO’s eyes light up. It’s calculated using the public on-demand price. If you have Savings Plans or Reserved Instances, the actual savings might be different, but it’s a fantastic benchmark.
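To see how far a discount can move that number, here is a back-of-the-envelope sketch. It is not Compute Optimizer’s own calculation: the hourly prices are illustrative, the 730-hour month is a common approximation, and applying one flat discount to both instance types is a simplifying assumption.

```python
HOURS_PER_MONTH = 730  # approximation: 8,760 hours in a year / 12


def estimated_monthly_savings(current_hourly, recommended_hourly, discount=0.0):
    """Monthly savings from switching instance types.

    `discount` is the fraction a Savings Plan or Reserved Instance takes
    off the on-demand rate (0.0 means full on-demand pricing).
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in the range [0, 1)")
    hourly_delta = (current_hourly - recommended_hourly) * (1.0 - discount)
    return hourly_delta * HOURS_PER_MONTH


# Illustrative prices only; look up the current rates for your region.
M5_2XLARGE_HOURLY = 0.384
M5_XLARGE_HOURLY = 0.192

on_demand = estimated_monthly_savings(M5_2XLARGE_HOURLY, M5_XLARGE_HOURLY)
discounted = estimated_monthly_savings(M5_2XLARGE_HOURLY, M5_XLARGE_HOURLY, discount=0.30)
print(f"On-demand: ${on_demand:.2f}/month, with a 30% discount: ${discounted:.2f}/month")
```

With these made-up inputs the on-demand figure is about $140 a month and the discounted one about $98, which is why the headline number is a benchmark and not a promise.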

The Gotchas and Grey Areas (Where the Magic Fades)

This tool is brilliant, but it’s not omniscient. Here’s where you need to apply your own brain.

The Bursty Workload Trap: Is your application idle 23 hours a day and then spikes to 100% CPU for 10 minutes? Its average utilization will look tiny, and a recommendation that leans on that history might point you at a downsized instance that would utterly choke during the burst. You have to know your own workload patterns. The recommendations are a starting point for your investigation, not a command.
Memory is the Silent Killer: CPU is easy to measure. Memory pressure is trickier. An application might use very little average memory but have a huge working set that it needs right now. If you downsize based on average memory and your app starts swapping, performance falls off a cliff. Compute Optimizer can recommend based on memory, but for EC2 it only sees memory at all if the CloudWatch agent is publishing it, so be cautious about what the recommendation was actually built on.
It Doesn’t Know Your Roadmap: The tool analyzes the past. It has no idea you’re planning to launch a massive new feature next month that will double your traffic. You have to factor that in.
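The bursty trap is easy to check for yourself before you trust a downsize. The sketch below takes CPU samples you have already pulled from CloudWatch and projects them onto a smaller instance. The function name and the linear-scaling assumption (half the vCPUs means double the utilization) are mine, not Compute Optimizer’s model.

```python
from statistics import mean


def burst_profile(cpu_samples, size_ratio):
    """Summarize CPU samples and project them onto a resized instance.

    cpu_samples: CPU utilization percentages, e.g. 5-minute datapoints.
    size_ratio: recommended vCPUs / current vCPUs (0.5 for a halving).
    Assumes demand scales linearly with vCPU count, a simplification.
    """
    if not cpu_samples:
        raise ValueError("need at least one sample")
    if size_ratio <= 0:
        raise ValueError("size_ratio must be positive")
    average = mean(cpu_samples)
    peak = max(cpu_samples)
    return {
        "average": average,
        "peak": peak,
        "projected_average": average / size_ratio,
        "projected_peak": peak / size_ratio,
        # More than 100% means the smaller instance cannot serve the burst.
        "saturates": peak / size_ratio > 100.0,
    }


# One day of 5-minute samples: idle at 3%, plus a 10-minute spike to 100%.
samples = [3.0] * 286 + [100.0] * 2
profile = burst_profile(samples, size_ratio=0.5)
print(profile)
```

The average here is under 4%, which screams “downsize me,” while the projected peak says the smaller box would be pinned for the ten minutes that matter.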

Putting It Into Practice: An API Example

Sure, the console is pretty, but you’re not going to check it manually every day. Let’s automate this. The following Python snippet (using Boto3) fetches your EC2 recommendations, takes the top-ranked option for each instance, and keeps only those with low performance risk and savings over $50 a month. This is the kind of thing you’d run in a Lambda function to send to a Slack channel.

import boto3

client = boto3.client('compute-optimizer')

MAX_PERFORMANCE_RISK = 1.0  # API scale runs from 0 (lowest risk) to 4 (highest)
MIN_MONTHLY_SAVINGS = 50.0  # in the currency the API reports, typically USD


def fetch_recommendations():
    """Yield every EC2 recommendation, following nextToken pagination."""
    kwargs = {}
    while True:
        response = client.get_ec2_instance_recommendations(**kwargs)
        yield from response.get('instanceRecommendations', [])
        if not response.get('nextToken'):
            break
        kwargs['nextToken'] = response['nextToken']


def lambda_handler(event, context):
    worthwhile_recommendations = []

    for recommendation in fetch_recommendations():
        options = recommendation.get('recommendationOptions', [])
        if not options:
            continue
        # Options are ranked; rank 1 is Compute Optimizer's top pick
        best = min(options, key=lambda option: option.get('rank', float('inf')))

        # Savings are nested under savingsOpportunity, which can be missing
        savings = best.get('savingsOpportunity') or {}
        monthly_savings = (savings.get('estimatedMonthlySavings') or {}).get('value', 0.0)
        risk = best.get('performanceRisk', float('inf'))

        if risk <= MAX_PERFORMANCE_RISK and monthly_savings > MIN_MONTHLY_SAVINGS:
            worthwhile_recommendations.append({
                'Instance': recommendation['instanceArn'].split('/')[-1],
                'Current': recommendation['currentInstanceType'],
                'Recommended': best['instanceType'],
                'MonthlySavings': monthly_savings
            })

    # Now do something with this list! Send to SNS, Slack, etc.
    if worthwhile_recommendations:
        print("Found worthwhile downsizing opportunities:")
        for rec in worthwhile_recommendations:
            print(f"Instance {rec['Instance']}: {rec['Current']} -> {rec['Recommended']} (Saves ${rec['MonthlySavings']:.2f}/month)")
    else:
        print("No strong recommendations found right now.")

    return worthwhile_recommendations
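The handler stops at `print`, so here is one way the “do something with this list” step might look. It is a sketch that assumes you already have an SNS topic (which can in turn feed Slack or email) and that the list holds dicts with the same four keys the handler builds. The function names and the subject line are placeholders.

```python
def format_report(recommendations):
    """Turn a list of recommendation dicts into a plain-text message."""
    total = sum(rec["MonthlySavings"] for rec in recommendations)
    lines = [
        f"{len(recommendations)} right-sizing candidates, ${total:.2f}/month in total:"
    ]
    # Biggest savings first, so the top of the message is the most useful part.
    for rec in sorted(recommendations, key=lambda r: r["MonthlySavings"], reverse=True):
        lines.append(
            f"- {rec['Instance']}: {rec['Current']} -> {rec['Recommended']} "
            f"(${rec['MonthlySavings']:.2f}/month)"
        )
    return "\n".join(lines)


def publish_report(sns_client, topic_arn, recommendations):
    """Publish the report to SNS; returns the MessageId, or None if nothing to send."""
    if not recommendations:
        return None
    response = sns_client.publish(
        TopicArn=topic_arn,
        Subject="Compute Optimizer right-sizing report",
        Message=format_report(recommendations),
    )
    return response["MessageId"]
```

Passing the client in, as in `publish_report(boto3.client("sns"), topic_arn, worthwhile_recommendations)`, keeps the formatting logic testable without touching AWS.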

Beyond EC2: Lambda and Fargate

Yes, it does these too, and it’s just as useful.

For Lambda, it looks at your function’s memory configuration and its actual utilization. It’s famously common to see functions configured with 1024 MB of RAM that only ever use 150 MB. Compute Optimizer will point this out and recommend a lower memory setting, which lowers your compute cost (since Lambda cost is a product of allocated memory and execution time) as long as the function doesn’t slow down too much in exchange: Lambda allocates CPU in proportion to memory, so a smaller setting can mean longer runs. This is often the lowest-hanging fruit in your entire AWS account.
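The arithmetic behind that is worth seeing once. The sketch below covers only the duration charge (allocated GB times seconds) and ignores the per-request fee. The price constant is illustrative, and the durations are invented to make the point.

```python
PRICE_PER_GB_SECOND = 0.0000166667  # illustrative rate; check current pricing


def lambda_compute_cost(memory_mb, avg_duration_ms, invocations,
                        price_per_gb_second=PRICE_PER_GB_SECOND):
    """Duration charge for a function: allocated GB x seconds x price."""
    gb_seconds = (memory_mb / 1024) * (avg_duration_ms / 1000) * invocations
    return gb_seconds * price_per_gb_second


invocations = 10_000_000

current = lambda_compute_cost(1024, 200, invocations)
# Best case: a quarter of the memory and the function runs just as fast.
same_speed = lambda_compute_cost(256, 200, invocations)
# Worst case: less memory means less CPU, and the run takes four times as long.
four_times_slower = lambda_compute_cost(256, 800, invocations)

print(f"1024 MB @ 200 ms: ${current:.2f}")
print(f" 256 MB @ 200 ms: ${same_speed:.2f}")
print(f" 256 MB @ 800 ms: ${four_times_slower:.2f}")
```

If the duration holds steady, the bill drops to a quarter. If the function gets four times slower, you have saved nothing and made your users wait, which is why the new setting deserves a quick measurement before it ships.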

For ECS Fargate, the principle is identical to EC2: it analyzes the CPU and memory utilization of your tasks and recommends a more appropriately sized CPU/memory combination. The same pitfalls apply—beware of bursty tasks.
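Fargate only lets you pick from a fixed menu of CPU/memory pairs, so “a more appropriately sized combination” always means the nearest valid one. The sketch below picks the smallest pair that covers an observed peak plus some headroom. It illustrates the constraint and is not Compute Optimizer’s algorithm; the 20% headroom default is an arbitrary assumption, and the size table reflects the published Fargate task sizes as I understand them, so verify it against current documentation.

```python
# Fargate task sizes: CPU units -> allowed memory values in MiB.
FARGATE_SIZES = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
    8192: list(range(16384, 61440 + 1, 4096)),
    16384: list(range(32768, 122880 + 1, 8192)),
}


def smallest_fit(peak_cpu_units, peak_memory_mib, headroom=0.2):
    """Return the smallest (cpu, memory) pair covering the peak plus headroom.

    Returns None when no Fargate size is large enough.
    """
    need_cpu = peak_cpu_units * (1 + headroom)
    need_memory = peak_memory_mib * (1 + headroom)
    for cpu in sorted(FARGATE_SIZES):
        if cpu < need_cpu:
            continue
        # Memory options are listed in ascending order per CPU tier.
        for memory in FARGATE_SIZES[cpu]:
            if memory >= need_memory:
                return cpu, memory
    return None


# A task sized at 2048 CPU units / 8192 MiB that peaks at 400 units and 1500 MiB.
print(smallest_fit(peak_cpu_units=400, peak_memory_mib=1500))
```

Feed it peaks and not averages; the bursty-task warning applies here just as it does on EC2.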

The bottom line? Use Compute Optimizer as your highly informed, data-driven first opinion. But you are still the doctor who knows the patient’s full history. Trust its diagnosis, but always, always verify with a stress test before you commit to the new prescription.