42.8 AWS Cost and Usage Report (CUR): Granular Billing Data in S3

Right, let’s talk about the AWS Cost and Usage Report, or CUR. This isn’t the friendly, slightly dumbed-down dashboard of Cost Explorer. This is the raw, unfiltered firehose of data. If Cost Explorer is a carefully curated cocktail, the CUR is the entire distillery dumped into your lap. You get every last line item, every resource ID, every tag (or lack thereof), delivered as a set of gargantuan CSV or Parquet files to an S3 bucket of your choice. It’s the ultimate source of truth for your AWS spend, and if you’re serious about cost optimization, you will learn to be friends with it.

Why You Need the CUR (Yes, Really)

You might think, “My monthly bill and Cost Explorer are enough.” Bless your heart. They’re not. The monthly bill is a summary, and Cost Explorer, while useful, has limits on how far you can drill down and how much data you can export. The CUR gives you the atomic units of your bill. Want to know exactly which EC2 instance (by its i-1234567890abcdef0 ID) in a specific account ran for 14.7 hours and cost you $3.42 because someone forgot to apply a shutdown:never tag? The CUR is your only way to find that needle in the haystack. It’s essential for showback/chargeback, custom reporting, forensic cost investigation, and building your own, more intelligent, cost dashboards.

Setting Up the Firehose: Creating a CUR Definition

Setting this up feels a bit like configuring a mainframe, but stick with me. You do this via the AWS Billing console. The key is to be meticulous. A misconfigured CUR is worse than no CUR at all—it’ll just drown you in useless files.

First, the S3 bucket needs a bucket policy that lets the AWS billing service write to it. This is a resource policy attached to the bucket itself, not an IAM role, which is why it names a Principal. The console can apply a policy for you. Do not just blindly accept or copy-paste it. Look at the Resource section. If the policy you are handed uses a wildcard (*) in place of the bucket name, remember that you are a professional and replace it with the exact ARN of your dedicated billing bucket. Security first, people.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "billingreports.amazonaws.com"
            },
            "Action": [
                "s3:GetBucketAcl",
                "s3:GetBucketPolicy"
            ],
            "Resource": "arn:aws:s3:::your-dedicated-billing-bucket"
        },
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "billingreports.amazonaws.com"
            },
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::your-dedicated-billing-bucket/*"
        }
    ]
}
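
If you would rather generate that policy than hand-edit it, here is a minimal Python sketch. It assumes the policy is attached as an S3 bucket policy for the billingreports.amazonaws.com service principal; the function name build_cur_bucket_policy is made up for illustration, and you should still compare the output with whatever the Billing console proposes for your account.

```python
import json

# Assumption: the service principal that delivers CUR files to the bucket.
BILLING_PRINCIPAL = "billingreports.amazonaws.com"


def build_cur_bucket_policy(bucket_name: str) -> dict:
    """Return a bucket policy scoped to exactly one billing bucket."""
    if not bucket_name or "*" in bucket_name:
        raise ValueError("use one exact bucket name, not a wildcard")

    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    principal = {"Service": BILLING_PRINCIPAL}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level reads apply to the bucket ARN itself.
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:GetBucketAcl", "s3:GetBucketPolicy"],
                "Resource": bucket_arn,
            },
            {
                # Object writes apply to keys inside the bucket.
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:PutObject",
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    policy = build_cur_bucket_policy("your-dedicated-billing-bucket")
    print(json.dumps(policy, indent=4))
```

Refusing a wildcard in the bucket name up front is the whole point: the mistake gets caught before the policy ever reaches S3.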

Now, create the report definition itself. Choose “Daily” for the time granularity and “GZIP” for compression (your wallet will thank you for the reduced S3 storage costs), or pick Parquet if you plan to query the report with Athena, which is what the table definition later in this section assumes. The critical choice is the versioning. Always. Select. Overwrite existing report. The other option, Create new report version, sounds logical but is a trap. It will write a new set of files on every update, leading to a nightmarish duplication of data that will make querying it an absolute horror show. Trust me, you want one set of files per billing period, overwritten on each update with the latest month-to-date data.
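
The same choices can be written down as code instead of console clicks. The sketch below builds the dictionary that boto3's put_report_definition call for the cur client expects. The helper names and the for_athena switch are mine, the bucket, prefix and report names are placeholders, and the boto3 call itself needs credentials with billing permissions, so treat it as a starting point and not as a finished deployment script.

```python
def build_report_definition(report_name, bucket, prefix, bucket_region,
                            for_athena=True):
    """Build a CUR report definition with overwrite versioning.

    for_athena=True picks Parquet, which the Athena integration requires;
    otherwise the report is delivered as GZIP-compressed CSV.
    """
    return {
        "ReportName": report_name,
        "TimeUnit": "DAILY",
        "Format": "Parquet" if for_athena else "textORcsv",
        "Compression": "Parquet" if for_athena else "GZIP",
        # RESOURCES adds the resource ID column to each line item.
        "AdditionalSchemaElements": ["RESOURCES"],
        "S3Bucket": bucket,
        "S3Prefix": prefix,
        "S3Region": bucket_region,
        "AdditionalArtifacts": ["ATHENA"] if for_athena else [],
        "RefreshClosedReports": True,
        # The important one: overwrite, never pile up report versions.
        "ReportVersioning": "OVERWRITE_REPORT",
    }


def create_report(definition):
    """Submit the definition. Requires boto3 and AWS credentials."""
    import boto3  # third-party; imported here so the builder runs without it

    # The CUR API is served from us-east-1.
    client = boto3.client("cur", region_name="us-east-1")
    client.put_report_definition(ReportDefinition=definition)


if __name__ == "__main__":
    definition = build_report_definition(
        "your-report-name", "your-dedicated-billing-bucket",
        "your-report-prefix", "us-east-1",
    )
    print(definition)
```

Keeping the builder separate from the API call means you can review or diff the definition before anything is created.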

The Anatomy of the Beast: What’s Actually In the File?

Crack open a CUR file (I recommend using Amazon Athena for this, as we’ll discuss) and prepare for sensory overload. There are hundreds of columns. The important ones become your best friends:

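identity_line_item_id: The identifier for each line item. It is unique within a report, but don’t count on it staying the same from one report version to the next.
line_item_usage_account_id: The account that accrued the cost. Crucial for multi-account setups.
line_item_usage_start_date / line_item_usage_end_date: When the resource was used, in UTC.
product_region: The region code (us-east-1), because ‘US East (N. Virginia)’ is too much to type, apparently.
line_item_resource_id: The actual ID of the thing costing you money (e.g., vol-0abcd1234efgh5678). It is only populated if you chose to include resource IDs when you created the report.
line_item_unblended_cost: The actual cost. This is the number you care about.
And the big one: tags. Every activated cost allocation tag gets its own column, named after the tag key (resource_tags_user_application, resource_tags_user_environment, etc.), with the tag value in the cell. This is why you tag your resources, remember?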
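
One thing that trips people up: the raw CSV headers look like lineItem/UnblendedCost, while Athena shows snake_case names like the ones above. The Python sketch below approximates that renaming so you can map between the two. The function athena_column_name is hypothetical and the rule is my reading of the naming pattern, not an official AWS algorithm, so check it against your own table for unusual tag keys.

```python
import re


def athena_column_name(csv_header: str) -> str:
    """Approximate the snake_case column name for a raw CUR CSV header."""
    # Break camelCase: put "_" between a lowercase letter or digit
    # and the capital letter that follows it.
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", csv_header)
    # Collapse separators such as "/" and ":" into underscores.
    name = re.sub(r"[^A-Za-z0-9]+", "_", name)
    return name.strip("_").lower()


if __name__ == "__main__":
    for header in (
        "identity/LineItemId",
        "lineItem/UsageAccountId",
        "lineItem/UnblendedCost",
        "resourceTags/user:application",
    ):
        print(header, "->", athena_column_name(header))
```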

Querying Without Losing Your Mind: Athena is Your New Best Friend

Trying to open these multi-gigabyte CSVs in Excel is a one-way ticket to watching your laptop burst into flames. The sane way to work with the CUR is by using Amazon Athena. It’s a serverless query engine that lets you run SQL on data in S3. You point it at the S3 prefix where the CUR files land, define a schema, and suddenly you have superpowers.

Here’s a basic Athena setup to get you started. First, create the database and table. The magic is letting an AWS Glue crawler discover the schema of your CUR and register it in the Glue Data Catalog, because no one is manually defining 200+ columns. If you ever do need to define the table by hand, it looks roughly like this.

-- Create a database to keep things tidy
CREATE DATABASE IF NOT EXISTS aws_cost_data;

-- Letting a Glue crawler create the table is the smart way. This is the manual version.
-- LOCATION is the key: it tells Athena which S3 prefix holds the Parquet data files.
-- Replace 'your-dedicated-billing-bucket' and 'your-report-prefix' with your details.
-- The column list is a placeholder; take the real one from your own report's schema.
-- Integer partition projection produces month values 1 through 12 with no leading zero.

CREATE EXTERNAL TABLE aws_cost_data.cur_table (
  ...all_those_columns...
)
PARTITIONED BY (year string, month string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'
) LOCATION 's3://your-dedicated-billing-bucket/your-report-prefix/'
TBLPROPERTIES (
  'projection.enabled' = 'true',
  'projection.year.type' = 'integer',
  'projection.year.range' = '2023,2027',
  'projection.month.type' = 'integer',
  'projection.month.range' = '1,12',
  'storage.location.template' = 's3://your-dedicated-billing-bucket/your-report-prefix/year=${year}/month=${month}'
);

Now for the fun part: asking questions. Forget wading through menus. Just write SQL.

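-- Find your top 10 most expensive EC2 instances last month, including their tags.
-- Each instance has many line items, so sum them per resource before ranking.
-- The two tag columns only exist if those tags are activated for cost allocation.
SELECT
  line_item_usage_account_id,
  line_item_resource_id,
  product_region,
  resource_tags_user_application AS application,
  resource_tags_user_environment AS environment,
  SUM(line_item_unblended_cost) AS total_unblended_cost
FROM aws_cost_data.cur_table
WHERE product_product_name = 'Amazon Elastic Compute Cloud'
  AND line_item_resource_id LIKE 'i-%'
  AND line_item_unblended_cost > 0
  AND year = '2023'
  AND month = '10'
GROUP BY 1, 2, 3, 4, 5
ORDER BY total_unblended_cost DESC
LIMIT 10;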
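
A word on why "most expensive instance" questions need a SUM: the CUR holds many line items per resource, so ranking raw rows tells you the priciest single line item, not the priciest instance. The Python sketch below shows the difference on made-up rows, using an in-memory SQLite table as a local stand-in for Athena. The account IDs, instance IDs and costs are invented, and the column names are assumed to follow the Athena-style CUR naming.

```python
import sqlite3

# Invented sample line items: (account, resource ID, unblended cost).
ROWS = [
    ("111111111111", "i-aaa", 1.50),
    ("111111111111", "i-aaa", 2.25),
    ("222222222222", "i-bbb", 3.00),
    ("222222222222", "i-bbb", 0.25),
    ("111111111111", "i-ccc", 3.50),
]


def top_resources(rows, limit=10):
    """Rank resources by total cost, summing all of their line items."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cur ("
        "line_item_usage_account_id TEXT, "
        "line_item_resource_id TEXT, "
        "line_item_unblended_cost REAL)"
    )
    conn.executemany("INSERT INTO cur VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT line_item_resource_id, "
        "SUM(line_item_unblended_cost) AS total "
        "FROM cur "
        "GROUP BY line_item_resource_id "
        "ORDER BY total DESC "
        "LIMIT ?",
        (limit,),
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for resource_id, total in top_resources(ROWS):
        print(resource_id, total)
```

In this sample the largest single line item belongs to i-ccc, but i-aaa costs more once its two line items are added together, which is exactly what a row-by-row ranking would miss.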

The Gotchas: Where AWS Makes It Unnecessarily Annoying

The Lag: The CUR is not real-time. It updates several times a day, but data can be 24-48 hours behind. Don’t panic if you don’t see this morning’s Lambda invocation in there yet.
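Tag Propagation Lag: This is the big one. You tag an EC2 instance, but the cost data from before the tag was applied does not get retroactively updated by default. The tag also has to propagate through AWS’s billing systems: a new tag key can take up to 24 hours to show up in the Billing console, and once you activate it as a cost allocation tag it can take up to another 24 hours to appear in the report, so budget up to 48 hours in total. A tag that was never activated will not appear in the CUR at all. So your reports might be messy for a day or two after a tagging spree.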
The Schema Can Change: AWS occasionally adds new columns or services. If you’ve hardcoded a schema in a script, it might break. Using Athena with Glue helps mitigate this, but it’s something to be aware of.
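
For the schema gotcha in particular, a cheap defence is to check for the columns you depend on before a script does anything else. Here is a minimal Python sketch, assuming you process the raw CSV flavour of the report. The REQUIRED_COLUMNS set and both function names are illustrative choices of mine, not anything AWS ships.

```python
import csv
import io

# Illustrative: the raw CSV headers this hypothetical script relies on.
REQUIRED_COLUMNS = {
    "lineItem/UsageAccountId",
    "lineItem/ResourceId",
    "lineItem/UnblendedCost",
}


def read_header(csv_text: str) -> list:
    """Return the first row of a CSV document as a list of column names."""
    return next(csv.reader(io.StringIO(csv_text)))


def missing_columns(header, required=REQUIRED_COLUMNS) -> list:
    """List required columns that are absent; extra columns are fine."""
    return sorted(set(required) - set(header))


if __name__ == "__main__":
    sample = (
        "lineItem/UsageAccountId,lineItem/UnblendedCost,product/region\n"
        "111111111111,3.42,us-east-1\n"
    )
    missing = missing_columns(read_header(sample))
    if missing:
        print("CUR schema check failed, missing:", missing)
```

Checking only for what you need, and ignoring anything new, means an added column never breaks the script while a removed or renamed one fails loudly.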

Master the CUR, and you stop guessing about your AWS bill. You know. And that, my friend, is the foundation of true cost control.