Right, let’s talk about CloudTrail. This is the service that saves your bacon. It’s the security camera in the hallway of your AWS account, meticulously recording who came in, what door they used, and what they tried to do. Nearly every API call made by a user, role, or service can be logged here, although some categories of event are only captured once you switch them on. If you ever need to answer the questions “What happened?” or “Who did it?”, this is your first and last stop.

Now, before you get too excited, let’s be clear: CloudTrail is brilliant, but it’s also a firehose of JSON. It logs everything, including the ten thousand pointless calls that some badly configured script made at 3 AM. Your job is to figure out how to drink from that firehose without drowning.

The Two Flavors of CloudTrail: Event History and Trails

First, know that you’re dealing with two distinct things here, and AWS does a terrible job of making this obvious upfront.

The Event History is the free, read-only view of the last 90 days of management events. It’s like the basic DVR that came with your security system. You can’t configure it and you can’t stream its logs anywhere, but you can quickly search it for “Hey, what did that IAM user do yesterday?” It’s useful for ad-hoc troubleshooting, but it’s not for serious audit work.
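
The same ad-hoc search works from the command line. Here is a minimal sketch, assuming the AWS CLI is configured; the function name recent_events_for_user and the user name alice are placeholders of mine, not anything AWS ships.

```shell
# recent_events_for_user USERNAME [MAX]
# Print time, action, and service for the newest Event History entries
# attributed to one user. MAX defaults to 20 items.
recent_events_for_user() {
    local user="$1" max="${2:-20}"
    aws cloudtrail lookup-events \
        --lookup-attributes "AttributeKey=Username,AttributeValue=${user}" \
        --max-items "${max}" \
        --query 'Events[].[EventTime,EventName,EventSource]' \
        --output text
}

# Example (hypothetical user name):
#   recent_events_for_user alice 10
```

Like the console view, this only reaches back 90 days and only covers management events.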

Then you have Trails. This is the fully featured, “I need to keep this data forever for compliance” feature. The first copy of management events a Trail delivers carries no CloudTrail charge, but you pay for the S3 storage, and additional copies and data events are billed. A Trail is a configuration you create that says, “Take that firehose of API events and do something useful with it.” The two most useful things are: 1) dump it into an S3 bucket for long-term storage, and 2) ship it to CloudWatch Logs for real-time alerting. You should absolutely have at least one Trail running that logs all regions and sends everything to an S3 bucket. Let’s set that up properly.

# Create an S3 bucket for your logs (make the name unique)
aws s3api create-bucket --bucket my-cloudtrail-logs-unique-name --region us-east-1

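# Create a policy for the bucket. Save this as 'bucket-policy.json'
# CloudTrail must be able to read the bucket ACL and write objects under your
# account's prefix. Replace 123456789012 with your own account ID.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "cloudtrail.amazonaws.com"},
            "Action": "s3:GetBucketAcl",
            "Resource": "arn:aws:s3:::my-cloudtrail-logs-unique-name"
        },
        {
            "Effect": "Allow",
            "Principal": {"Service": "cloudtrail.amazonaws.com"},
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::my-cloudtrail-logs-unique-name/AWSLogs/123456789012/*",
            "Condition": {"StringEquals": {"s3:x-amz-acl": "bucket-owner-full-control"}}
        }
    ]
}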

# Apply the policy
aws s3api put-bucket-policy --bucket my-cloudtrail-logs-unique-name --policy file://bucket-policy.json

# Now create the trail itself. This is the crucial part.
# Include global service events so calls to services like IAM are captured too.
aws cloudtrail create-trail \
    --name My-All-Regions-Trail \
    --s3-bucket-name my-cloudtrail-logs-unique-name \
    --is-multi-region-trail \
    --include-global-service-events

# Finally, start logging
aws cloudtrail start-logging --name My-All-Regions-Trail
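
Before walking away, it is worth confirming that the trail really is recording. A small sketch, assuming the trail name used above; the helper name trail_is_logging is my own invention.

```shell
# trail_is_logging TRAIL_NAME
# Succeeds (exit status 0) only if CloudTrail reports the trail as logging.
trail_is_logging() {
    local status
    status="$(aws cloudtrail get-trail-status \
        --name "$1" \
        --query 'IsLogging' \
        --output json)" || return 1
    [ "${status}" = "true" ]
}

# Example:
#   trail_is_logging My-All-Regions-Trail && echo "logging" || echo "NOT logging"
```

The function returns a status rather than printing, so it can be dropped into a cron job or a CI check that fails loudly when someone stops the trail.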

The Anatomy of a CloudTrail Event

A CloudTrail event is a JSON blob. A big one. The log files in S3 are gzip-compressed JSON; open one and you’ll see a single object holding an array called Records. Each record follows a standard schema. The parts you’ll care about most are:

  • eventTime: When it happened. Duh.
  • eventSource: Which AWS service was called (e.g., s3.amazonaws.com).
  • eventName: The specific API action (e.g., CreateBucket, DeleteObject).
  • userIdentity: This tells you who did it. Was it a root user? An IAM user? An assumed role? This is your culprit.
  • requestParameters & responseElements: The gold mine. This is the detailed input to and output from the API call. Want to know which S3 object was deleted? It’s in requestParameters. Want to know the ARN of the brand-new Lambda function they created? It’s in responseElements.
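
To make those fields concrete, here is a rough sketch that flattens a log file into one line per record. It assumes jq is installed; the function name summarize_records and the object key in the usage line are placeholders of mine.

```shell
# summarize_records
# Read one decompressed CloudTrail log file on stdin and print a
# tab-separated line per record: time, service, action, caller.
# The caller column falls back to userIdentity.type when no ARN is present.
summarize_records() {
    jq -r '.Records[]
        | [.eventTime, .eventSource, .eventName,
           (.userIdentity.arn // .userIdentity.type)]
        | @tsv'
}

# Example (the object key is hypothetical):
#   aws s3 cp "s3://my-cloudtrail-logs-unique-name/AWSLogs/123456789012/CloudTrail/us-east-1/2024/01/01/example.json.gz" - \
#       | gunzip | summarize_records
```

This is fine for eyeballing a single file. It does not scale to a whole bucket.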

The Gotchas and Best Practices

Here’s where my “been in the trenches” advice earns its keep.

  1. The 15-Minute Lag: CloudTrail is not real-time. It can take around 15 minutes, occasionally longer, for events to be delivered to your S3 bucket after the API call is made; AWS publishes typical delivery times, not a guaranteed upper bound. Don’t panic when you don’t see your test event immediately. For near-real-time alerting, use the CloudWatch Logs integration.

  2. Data Events are OFF by Default: This is the biggest “oh crap” moment for people. By default a Trail logs only management events, the control plane operations such as creating an S3 bucket or launching an EC2 instance. It does not log data plane operations such as reading or writing an object in that S3 bucket (GetObject, PutObject). If you need to know who accessed what data, you must explicitly enable Data Events. Don’t confuse these with S3 server access logging, which is a separate S3 feature. Be warned: Data Events get expensive very quickly on high-traffic buckets.

    # Enable data events for a specific, sensitive S3 bucket
    aws cloudtrail put-event-selectors \
        --trail-name My-All-Regions-Trail \
        --event-selectors '[{
            "ReadWriteType": "All",
            "IncludeManagementEvents": true,
            "DataResources": [{
                "Type": "AWS::S3::Object",
                "Values": ["arn:aws:s3:::my-sensitive-bucket/"]
            }]
        }]'
    
  3. Lock Down Your Logs S3 Bucket: The single worst thing you can do is let an attacker delete your evidence of their attack. Enable S3 Object Lock, use a strict bucket policy, and absolutely forbid any IAM principal from deleting logs. Treat this bucket as the crown jewels, because it is.

  4. Use Athena to Query the Logs: You are not meant to grep through terabytes of JSON files in S3. It’s a fool’s errand. Use Amazon Athena to run SQL queries directly against the data. CloudTrail even provides a pre-built table definition for you. This is the only sane way to find a needle in this particular haystack.
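
As an illustration of the Athena approach, the sketch below submits a “who called this API action?” query from the CLI. The database, table, and results bucket names are assumptions; substitute whatever you created from the CloudTrail-provided table definition. The helper names build_query and run_query are also mine.

```shell
# Hypothetical names: adjust all three to your own setup.
ATHENA_DB="default"                      # database holding the CloudTrail table
ATHENA_TABLE="cloudtrail_logs"           # table built from the CloudTrail-provided DDL
ATHENA_OUTPUT="s3://my-athena-results/"  # where Athena writes query results

# build_query ACTION
# Print SQL listing the most recent callers of one API action.
# ACTION is pasted into the SQL as-is, so only pass trusted values.
build_query() {
    cat <<SQL
SELECT eventtime, useridentity.arn, sourceipaddress, eventsource
FROM ${ATHENA_TABLE}
WHERE eventname = '$1'
ORDER BY eventtime DESC
LIMIT 50
SQL
}

# run_query ACTION
# Submit the query to Athena and print the resulting QueryExecutionId.
run_query() {
    aws athena start-query-execution \
        --query-string "$(build_query "$1")" \
        --query-execution-context "Database=${ATHENA_DB}" \
        --result-configuration "OutputLocation=${ATHENA_OUTPUT}" \
        --query 'QueryExecutionId' \
        --output text
}

# Example:
#   id="$(run_query DeleteBucket)"
#   aws athena get-query-results --query-execution-id "$id"
```

Athena runs queries asynchronously, so in practice you would wait for the execution to finish before fetching results.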

CloudTrail is the foundation of security and accountability in AWS. It’s verbose, occasionally slow, and a bit clunky, but it is utterly indispensable. Set it up correctly once, lock it down, and then forget about it until the day you desperately need it. And trust me, that day will come.