43.6 Cost Optimization: Cloud Financial Management, Expenditure Awareness, Optimizing Resources

Right, let’s talk about money. Because if you’re not paying attention to this, you’re not just building on AWS, you’re donating to it. The cloud’s biggest trick is making cost an abstract, after-the-fact concept. You spin up a monster instance for a two-hour task, forget about it, and get a bill that looks like a phone number. Cost Optimization is the pillar where we grow up, put on our big-kid pants, and start treating the cloud like the powerful, pay-as-you-go tool it is, not an infinite magic money pit.

It boils down to three simple, brutally honest truths: 1) You can’t optimize what you can’t see. 2) The cheapest resource is the one you never use. 3) The most expensive engineer is the one who ignores 1 and 2.

Expenditure and Usage Awareness (The “Oh *&$%” Moment)

Before you optimize a single byte, you need to know where the money is going. Guessing is for amateurs. AWS provides the tools; your job is to look at them.

First, enable Cost and Usage Reports (CUR). This is non-negotiable. It’s the most granular dataset you can get, dumped into an S3 bucket for you to slice, dice, and cry over. This isn’t pretty dashboard stuff—this is “give me every line item for every service in every region” data. You need this.

Next, set up Cost Allocation Tags. You must tag your resources. Name is not enough. You need environment=prod, project=phoenix, team=data-engineering. Without tags, your bill is a mysterious black hole. With them, you can tell exactly which team’s pet project is responsible for that $5,000/month ElastiCache bill.
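
To make the "which team owns this bill" point concrete, here is a minimal sketch that rolls spend up by tag and measures how much of the bill is still untagged. The line items, their keys, and the function names are all hypothetical; real CUR columns are named differently, so treat this as the shape of the analysis, not a parser for the actual report.

```python
from collections import defaultdict

# Hypothetical line items, loosely shaped like rows from a cost report.
# The keys ("service", "cost", "tags") are illustrative, not real CUR columns.
LINE_ITEMS = [
    {"service": "ElastiCache", "cost": 5000.0,
     "tags": {"team": "data-engineering", "environment": "prod"}},
    {"service": "EC2", "cost": 1200.0,
     "tags": {"team": "web", "environment": "prod"}},
    {"service": "EC2", "cost": 300.0, "tags": {}},
    {"service": "S3", "cost": 150.0, "tags": {"environment": "dev"}},
]


def cost_by_tag(items, tag_key):
    """Sum cost per value of tag_key; spend without the tag lands in 'UNTAGGED'."""
    totals = defaultdict(float)
    for item in items:
        totals[item["tags"].get(tag_key, "UNTAGGED")] += item["cost"]
    return dict(totals)


def untagged_share(items, tag_key):
    """Fraction of total spend that carries no value for tag_key."""
    total = sum(item["cost"] for item in items)
    if total == 0:
        return 0.0
    untagged = sum(item["cost"] for item in items if tag_key not in item["tags"])
    return untagged / total


if __name__ == "__main__":
    print(cost_by_tag(LINE_ITEMS, "team"))
    print(f"untagged share: {untagged_share(LINE_ITEMS, 'team'):.1%}")
```

The untagged share is the number to watch: until it is close to zero, any per-team breakdown is only telling part of the story.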

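Here’s a quick Terraform snippet to make tagging the default, because discipline starts at deployment. The provider’s default_tags block applies these tags to every taggable resource this particular Terraform config creates through the AWS provider. It is a default, not a policy: a resource can still override a value with its own tags.

# Apply default tags to everything this Terraform configuration creates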
terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"

  default_tags {
    tags = {
      Environment   = "dev"
      Project       = "cost-optimization-demo"
      Terraform     = "true"
      CostCenter    = "12345"
      # You get the idea. Add your own.
    }
  }
}

Now, the pièce de résistance: Budgets. This is your alarm system. Setting a budget without alerts is like buying a smoke detector and taking out the batteries. Make it scream at you via email and SNS when you hit 80%, 100%, and 200% of your expected spend.

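# Create an SNS topic for budget alarms first.
# Replace the account ID, region, and email with your own.
aws sns create-topic --name budget-alarms
aws sns subscribe --topic-arn arn:aws:sns:us-east-1:123456789012:budget-alarms --protocol email --notification-endpoint your-email@company.com
# The topic's access policy must also allow budgets.amazonaws.com to publish
# (sns:Publish), otherwise the alerts never arrive.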

# Then create the budget using the AWS CLI (a JSON input file is easier).
# The top-level keys mirror the CreateBudget request, so the whole file
# can be passed with --cli-input-json.
# Save this as `budget-config.json`
{
  "AccountId": "123456789012",
  "Budget": {
    "BudgetName": "monthly-ec2-budget",
    "BudgetLimit": {
      "Amount": "1000",
      "Unit": "USD"
    },
    "CostFilters": {
      "Service": ["Amazon Elastic Compute Cloud - Compute"]
    },
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST",
    "CostTypes": {
      "IncludeTax": true,
      "IncludeSubscription": true,
      "UseBlended": false
    }
  },
  "NotificationsWithSubscribers": [
    {
      "Notification": {
        "ComparisonOperator": "GREATER_THAN",
        "NotificationType": "ACTUAL",
        "Threshold": 80,
        "ThresholdType": "PERCENTAGE"
      },
      "Subscribers": [
        {
          "Address": "arn:aws:sns:us-east-1:123456789012:budget-alarms",
          "SubscriptionType": "SNS"
        }
      ]
    }
  ]
}

# Create the budget
aws budgets create-budget --cli-input-json file://budget-config.json
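
The config above wires up only the 80% alert. As a rough sketch of how the 80%, 100%, and 200% alarms from the text could be generated rather than hand-copied, the Python below builds one notification entry per threshold in the shape the CreateBudget request expects under NotificationsWithSubscribers. The function names are hypothetical, the ARN is the same placeholder used above, and crossed_thresholds is only a local illustration of the comparison, not how AWS evaluates budgets internally.

```python
import json

# Placeholder ARN, matching the example account used in this section.
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:budget-alarms"


def build_notifications(sns_topic_arn, thresholds=(80, 100, 200)):
    """Return one notification-with-subscribers entry per percentage threshold."""
    return [
        {
            "Notification": {
                "ComparisonOperator": "GREATER_THAN",
                "NotificationType": "ACTUAL",
                "Threshold": threshold,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [
                {"Address": sns_topic_arn, "SubscriptionType": "SNS"}
            ],
        }
        for threshold in thresholds
    ]


def crossed_thresholds(actual_spend, budget_limit, thresholds=(80, 100, 200)):
    """List the thresholds that actual spend has exceeded (strictly greater than)."""
    # Compare without dividing so whole-dollar inputs stay exact.
    return [t for t in thresholds if actual_spend * 100 > t * budget_limit]


if __name__ == "__main__":
    # Paste the printed list into the budget config in place of the single entry.
    print(json.dumps(build_notifications(TOPIC_ARN), indent=2))
    print(crossed_thresholds(1050, 1000))
```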

Optimizing Compute: The Low-Hanging Fruit

Compute is usually your biggest bill, and right-sizing is the easiest win. AWS would happily let you run an m5.24xlarge for your company’s static WordPress blog. Don’t be that person.

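Start with Trusted Advisor and Cost Explorer’s Rightsizing Recommendations. They’ll literally tell you “this instance is underutilized, downsize it to this cheaper type.” The Cost Explorer recommendations cost nothing extra; Trusted Advisor’s cost optimization checks require a Business support plan or higher. Either way, it’s advice you’re already entitled to. Take it.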
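
If you want a feel for what a rightsizing pass looks like, here is a toy heuristic in Python. It is not the algorithm behind AWS’s recommendations: the 40% peak-CPU threshold, the sample data, and the function names are all assumptions for illustration, and it looks at CPU only, ignoring memory, network, and disk.

```python
# m5 sizes, smallest to largest (bare metal omitted).
M5_SIZES = ["large", "xlarge", "2xlarge", "4xlarge",
            "8xlarge", "12xlarge", "16xlarge", "24xlarge"]


def is_underutilized(cpu_samples, peak_threshold=40.0):
    """Flag an instance whose peak CPU never reached the (assumed) threshold."""
    return bool(cpu_samples) and max(cpu_samples) < peak_threshold


def one_size_down(instance_type, sizes=M5_SIZES):
    """Return the next smaller size in the same family, or None if already smallest
    or if the size is not in the known ladder."""
    family, size = instance_type.split(".", 1)
    if size not in sizes:
        return None
    index = sizes.index(size)
    return None if index == 0 else f"{family}.{sizes[index - 1]}"


def recommend(instance_type, cpu_samples):
    """Suggest a one-step downsize for underutilized instances, else keep as is."""
    if is_underutilized(cpu_samples):
        return one_size_down(instance_type) or instance_type
    return instance_type


if __name__ == "__main__":
    # Hypothetical CPU percentages for the static WordPress blog.
    print(recommend("m5.24xlarge", [2.0, 3.5, 1.2, 4.8]))
```

Stepping down one size at a time and re-measuring is a deliberately conservative choice; the real recommendations may jump several sizes or change families.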

Embrace Spot Instances for anything stateless, batch-oriented, or fault-tolerant. The discounts are insane (often 60-90%) because you’re using spare capacity that AWS can yank back with a two-minute warning. For a data processing job or a CI/CD worker? A no-brainer. The trick is to use a diversified allocation strategy across instance types and pools to minimize interruptions.

// Example Spot Fleet request configuration (for `aws ec2 request-spot-fleet`)
// The key is to be flexible. Don't ask for one instance type; ask for many.
// The AMI and subnet IDs are placeholders; substitute your own.
{
  "SpotPrice": "0.10",
  "TargetCapacity": 10,
  "IamFleetRole": "arn:aws:iam::123456789012:role/aws-ec2-spot-fleet-tagging-role",
  "LaunchSpecifications": [
    {
      "ImageId": "ami-12345678",
      "InstanceType": "m5.large",
      "SubnetId": "subnet-123456",
      "WeightedCapacity": 1
    },
    {
      "ImageId": "ami-12345678",
      "InstanceType": "m4.large",
      "SubnetId": "subnet-234567",
      "WeightedCapacity": 1
    },
    {
      "ImageId": "ami-12345678",
      "InstanceType": "m5a.large",
      "SubnetId": "subnet-345678",
      "WeightedCapacity": 1
    }
  ],
  "AllocationStrategy": "diversified"
}

Schedule everything. Do your development environments really need to run at 3 AM on a Sunday? Of course not. Use the Instance Scheduler on AWS solution or simple Lambda functions triggered by EventBridge rules (formerly CloudWatch Events) to turn non-production resources off when no one’s using them. This isn’t just optimization; it’s basic hygiene.
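
The decision a scheduler has to make is tiny. As a minimal sketch, assuming office hours of 08:00 to 19:00 on weekdays (my assumption, not a recommendation), this is the kind of check a scheduled Lambda function might run before starting or stopping instances. The function names are hypothetical, and the actual start and stop API calls are left out.

```python
from datetime import datetime

HOURS_PER_WEEK = 24 * 7


def should_be_running(now, start_hour=8, stop_hour=19, workdays=(0, 1, 2, 3, 4)):
    """True if `now` falls inside the assumed working window (Monday is weekday 0)."""
    return now.weekday() in workdays and start_hour <= now.hour < stop_hour


def weekly_hours_off(start_hour=8, stop_hour=19, workdays=(0, 1, 2, 3, 4)):
    """Hours per week the resource is stopped under this schedule."""
    hours_on = (stop_hour - start_hour) * len(workdays)
    return HOURS_PER_WEEK - hours_on


if __name__ == "__main__":
    sunday_3am = datetime(2024, 1, 7, 3, 0)     # a Sunday
    tuesday_10am = datetime(2024, 1, 9, 10, 0)  # a Tuesday
    print(should_be_running(sunday_3am), should_be_running(tuesday_10am))
    share_off = weekly_hours_off() / HOURS_PER_WEEK
    print(f"stopped for {share_off:.0%} of the week")
```

Under these assumed hours the instance is off for roughly two thirds of the week, which is about what you stop paying in instance-hours (attached storage still bills while the instance is stopped).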

Storage: Where Data Goes to Get Expensive

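S3 is cheap until it very much isn’t. Lifecycle Policies are your best friend. Note that lifecycle rules act on object age (days since creation), not on when the object was last read. Move objects to Standard-IA once they are 30 days old. At 90 days, ship them off to Glacier Deep Archive for the long-term stuff, like logs you’re legally required to keep but will hopefully never need. If access patterns are unpredictable, S3 Intelligent-Tiering does the access tracking for you. The cost difference is staggering.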

# A CloudFormation snippet for an S3 bucket lifecycle configuration
# (goes under the Properties of an AWS::S3::Bucket resource)
LifecycleConfiguration:
  Rules:
    - Id: TierDownByAge
      Status: Enabled
      Transitions:
        # 30 days after creation: Standard-IA
        - TransitionInDays: 30
          StorageClass: STANDARD_IA
        # 90 days after creation: Glacier Deep Archive
        - TransitionInDays: 90
          StorageClass: DEEP_ARCHIVE
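
To put a rough number on "staggering", here is a back-of-the-envelope sketch of the 30-day and 90-day tiering described above, with Glacier Deep Archive as the final tier. The per-GB prices are assumptions for illustration only (check the current S3 pricing page for your region), the data set is made up, and the model ignores request, retrieval, and minimum-storage-duration charges.

```python
# Assumed storage prices in USD per GB-month; illustrative, not authoritative.
ASSUMED_PRICE_PER_GB_MONTH = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "DEEP_ARCHIVE": 0.00099,
}


def storage_class_for_age(age_days):
    """Mirror the lifecycle tiers: IA at 30 days, Deep Archive at 90 days."""
    if age_days >= 90:
        return "DEEP_ARCHIVE"
    if age_days >= 30:
        return "STANDARD_IA"
    return "STANDARD"


def tiered_monthly_cost(gb_by_age):
    """gb_by_age is a list of (age_in_days, gigabytes) pairs."""
    return sum(
        gigabytes * ASSUMED_PRICE_PER_GB_MONTH[storage_class_for_age(age)]
        for age, gigabytes in gb_by_age
    )


def all_standard_monthly_cost(gb_by_age):
    """Cost of the same data if everything stayed in S3 Standard."""
    return sum(gb for _, gb in gb_by_age) * ASSUMED_PRICE_PER_GB_MONTH["STANDARD"]


if __name__ == "__main__":
    # Hypothetical bucket: 1 TB fresh, 2 TB a few weeks old, 7 TB of old logs.
    bucket = [(10, 1000), (45, 2000), (400, 7000)]
    print(f"tiered:       ${tiered_monthly_cost(bucket):.2f}/month")
    print(f"all Standard: ${all_standard_monthly_cost(bucket):.2f}/month")
```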

And for the love of all that is holy, delete data you don’t need. I’ve seen petabytes of obsolete AMIs, old EBS snapshots, and abandoned S3 buckets costing thousands a month. Set up rules to automatically delete these. It’s digital janitorial work, and it’s critically important.

The Commitment: Savings Plans

This is where AWS rewards you for acting like a sane adult and making a commitment. Savings Plans are the modern, flexible version of Reserved Instances. You commit to a certain amount of compute spend (e.g., $10/hour) for a 1- or 3-year term, and you get a massive discount on that usage: up to 66% with a Compute Savings Plan, which applies across EC2, Fargate, and Lambda, or up to 72% with an EC2 Instance Savings Plan, which covers EC2 only.

The key insight here is that it’s flexible. Unlike the old Standard RIs, which were tied to a specific instance family and Region, a Compute Savings Plan follows your usage across instance families, sizes, operating systems, and Regions. The EC2 Instance Savings Plan trades some of that freedom for the bigger discount: you have to stay within one instance family in one Region, though you can still change size within it. Buy this after you’ve right-sized your environment and know your steady-state baseline. It’s the final step, not the first.
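
Why "know your steady-state baseline first" matters is easier to see with arithmetic. The sketch below is a deliberately simplified model, assuming a single blended discount rate across all usage (real Savings Plan rates vary by instance type, Region, and term) and a made-up hourly usage profile. The function names are hypothetical.

```python
def hourly_cost_with_plan(on_demand_usage, commitment, discount):
    """Cost for one hour under a simplified Savings Plan model.

    on_demand_usage: what the hour's usage would cost at on-demand rates ($)
    commitment:      committed spend per hour at discounted rates ($), paid regardless
    discount:        assumed blended discount as a fraction, e.g. 0.5 for 50%
    """
    # On-demand-equivalent usage that the commitment absorbs.
    covered = commitment / (1.0 - discount)
    # Anything above that is billed at plain on-demand rates.
    overflow = max(0.0, on_demand_usage - covered)
    return commitment + overflow


def total_cost(usage_profile, commitment, discount):
    """Sum the hourly cost over a list of on-demand-equivalent usage values."""
    return sum(hourly_cost_with_plan(u, commitment, discount) for u in usage_profile)


if __name__ == "__main__":
    # Hypothetical day: a steady baseline of $20/hour with a daytime peak of $30/hour.
    profile = [20.0] * 16 + [30.0] * 8
    on_demand = sum(profile)
    for commitment in (0.0, 10.0, 15.0, 25.0):
        cost = total_cost(profile, commitment, discount=0.5)
        print(f"commit ${commitment:>5.2f}/h -> ${cost:.2f} vs ${on_demand:.2f} on-demand")
```

In this toy model, committing well beyond what the baseline needs costs more than committing nothing, because the commitment is paid whether or not the usage shows up. That is the argument for right-sizing first.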

The goal isn’t to be cheap. It’s to be smart. It’s about spending money on the things that matter (innovation, performance, resilience) instead of wasting it on things that don’t, like an over-provisioned, forgotten-about instance humming away in us-west-2, doing absolutely nothing but padding AWS’s quarterly earnings.