25.2 Origins: S3, ALB, EC2, API Gateway, and Custom Origins

Right, so you’ve told CloudFront where to send your users (the distribution), and how to handle their requests (behaviors). Now we get to the heart of the matter: the Origin. This is the actual, honest-to-goodness source of your content. It’s the server CloudFront goes to, hat in hand, when its own cache is empty. Think of it as CloudFront’s supplier. And just like in the real world, your choice of supplier dictates everything about quality, price, and how much of a headache you’re in for.

The most important thing to internalize is this: CloudFront doesn’t store your stuff. It’s a caching layer. Its entire job is to be a massively distributed, incredibly fast intermediary between your users and your origin. Get the origin configuration wrong, and you’ve built a spectacularly expensive and complex bottleneck.

S3: The Usual Suspect

This is the classic, bread-and-butter origin. You have static assets—images, CSS, JS, videos—and you dump them in an S3 bucket. CloudFront fronts it. Simple, right? Well, mostly.

First, the huge “gotcha” you must avoid: permissions. An S3 bucket is private by default. If you just create a bucket and point CloudFront at it, you’ll get vicious 403 Access Denied errors. You have two sane choices to fix this:

Use an Origin Access Control (OAC): This is the modern, recommended way. You create an OAC in CloudFront, attach it to your origin, and then add a statement to the S3 bucket policy that lets the CloudFront service principal (cloudfront.amazonaws.com) call s3:GetObject, but only when the request comes from your distribution. CloudFront signs its requests to the bucket, and S3 checks them against that policy. It’s secure and clean.

Example S3 bucket policy for OAC (replace BUCKET_NAME, ACCOUNT_ID, and DISTRIBUTION_ID):
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "cloudfront.amazonaws.com"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::BUCKET_NAME/*",
            "Condition": {
                "StringEquals": {
                    "AWS:SourceArn": "arn:aws:cloudfront::ACCOUNT_ID:distribution/DISTRIBUTION_ID"
                }
            }
        }
    ]
}
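If you manage many buckets and distributions, you may prefer to generate this policy rather than hand-edit it. The following is a minimal sketch, not an official AWS helper: the function name oac_bucket_policy and the sample values are made up for illustration, and the output simply mirrors the policy shown above.

```python
import json


def oac_bucket_policy(bucket_name: str, account_id: str, distribution_id: str) -> dict:
    """Build a bucket policy that lets one CloudFront distribution read objects.

    Hypothetical helper: it only assembles the JSON document, it does not call AWS.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The CloudFront service principal, not an OAC-specific identity.
                "Principal": {"Service": "cloudfront.amazonaws.com"},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
                # Restrict access to requests made on behalf of one distribution.
                "Condition": {
                    "StringEquals": {
                        "AWS:SourceArn": (
                            f"arn:aws:cloudfront::{account_id}"
                            f":distribution/{distribution_id}"
                        )
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values; substitute your own bucket, account, and distribution.
    policy = oac_bucket_policy("my-assets-bucket", "111122223333", "E123456789ABCDE")
    print(json.dumps(policy, indent=4))
```

The printed JSON can be pasted into the bucket's policy editor or fed to whatever tooling you already use to apply bucket policies.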

Use an Origin Access Identity (OAI): The older, now-legacy method. It works similarly but is less flexible: it doesn’t support buckets encrypted with SSE-KMS, for one. Prefer OAC for new distributions.

Why use S3? It’s dirt-cheap for storage, scales without any capacity planning on your part, and is brutally reliable (it’s designed for eleven nines of durability). It’s the perfect origin for anything that doesn’t need to be generated on the fly.

Load Balancers (ALB & NLB) and EC2: The Dynamic Duo (and their messy cousin)

When your content is dynamic, you point CloudFront at a load balancer (Application or Network Load Balancer), which then routes to your EC2 instances, ECS tasks, or whatever compute you’re running.

This is where the magic happens: you get the global low-latency cache for your static assets and the optimized, keep-alive connection from the CloudFront edge to your AWS region for your dynamic content. It’s a performance win-win.

The critical pitfall here is security groups. Your ALB’s security group must allow inbound traffic from… well, from CloudFront. But CloudFront’s IP ranges are vast and change. You do not want to manage that manually. The correct, robust solution is the AWS-managed prefix list for CloudFront’s origin-facing servers (com.amazonaws.global.cloudfront.origin-facing): reference it as the source in the ALB’s inbound HTTPS rule, and AWS keeps the ranges current for you. Don’t open the ALB to the world, or viewers can bypass CloudFront entirely. Keep in mind that the prefix list admits traffic from any CloudFront distribution, not just yours, so if that matters, also have CloudFront send a secret origin custom header and have the ALB reject requests that lack it. The security group on your backend instances, however, should only allow traffic from the security group of the ALB. This is fundamental AWS networking hygiene.
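The two-tier rule set can be sketched as data. The helpers below are hypothetical and only build rule dictionaries in the IpPermissions shape that boto3's EC2 client accepts. They assume you restrict the ALB to CloudFront using the AWS-managed origin-facing prefix list, whose ID differs per region, so you have to look it up yourself. The IDs in the usage note are placeholders.

```python
def alb_ingress_from_cloudfront(prefix_list_id: str, port: int = 443) -> dict:
    """Inbound rule for the ALB: allow only CloudFront's origin-facing ranges.

    prefix_list_id is the region-specific ID of the managed prefix list
    com.amazonaws.global.cloudfront.origin-facing (an assumption of this sketch).
    """
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "PrefixListIds": [
            {"PrefixListId": prefix_list_id, "Description": "CloudFront origin-facing"}
        ],
    }


def backend_ingress_from_alb(alb_security_group_id: str, port: int = 80) -> dict:
    """Inbound rule for backend instances: allow only the ALB's security group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "UserIdGroupPairs": [
            {"GroupId": alb_security_group_id, "Description": "Traffic from the ALB only"}
        ],
    }
```

Each dictionary would be passed as one element of IpPermissions to ec2.authorize_security_group_ingress, for example alb_ingress_from_cloudfront("pl-0123456789abcdef0") on the ALB's group and backend_ingress_from_alb("sg-0123456789abcdef0") on the instances' group. Neither rule contains a CIDR range, which is the point: nothing is open to 0.0.0.0/0.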

Pointing directly to an EC2 instance (CloudFront treats it as a custom origin, the same as it does an ALB) is… usually a bad idea. You lose the scalability, health checks, and SSL termination of a load balancer. Just use an ALB. Seriously.

API Gateway: The Overthinker’s Origin

Pointing a CloudFront distribution at an API Gateway REST or HTTP API is a fantastic pattern. Why? Because sometimes you want the authentication, throttling, and transformation logic of API Gateway, but you also want the global reach and caching of CloudFront. You can, for instance, cache the responses from your API endpoints right at the edge, dramatically reducing latency and load on your backend.

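The “questionable choice” you need to call out: the Host header. By default, when CloudFront forwards a request to API Gateway, it sets the Host header to the origin’s domain name (e.g., api-id.execute-api.us-east-1.amazonaws.com). This is fine. In fact, the default execute-api endpoint depends on it: forward the viewer’s Host header instead (say, with an origin request policy that passes every viewer header) and you will typically get a 403. What you can’t do is override Host with a CloudFront origin custom header, because Host is on the list of headers CloudFront refuses to add. If logic on your end needs to know the public hostname, send it in a header of your own instead. The name X-Original-Host below is only an illustration:

# Fetch the current config first; the ETag in the response is required for the update
aws cloudfront get-distribution-config --id E123456789ABCDE > current.json

# Copy the DistributionConfig object into dist-config.json, edit it, then update
aws cloudfront update-distribution \
    --id E123456789ABCDE \
    --if-match ETAG_FROM_GET_DISTRIBUTION_CONFIG \
    --distribution-config file://dist-config.json

And in your dist-config.json file, you’d ensure your origin config has:

"CustomHeaders": {
    "Quantity": 1,
    "Items": [
        {
            "HeaderName": "X-Original-Host",
            "HeaderValue": "my-custom-api.example.com"
        }
    ]
}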
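If you would rather script the edit to dist-config.json than make it by hand, something like the following may help. It is a sketch under stated assumptions: add_origin_custom_header is a hypothetical helper, the header name X-Original-Host is illustrative, and the disallowed set is a partial list. It writes the Quantity/Items shape the CloudFront API expects and refuses names such as Host that CloudFront does not accept as origin custom headers.

```python
# Partial list of header names CloudFront rejects as origin custom headers.
DISALLOWED_NAMES = {"host", "connection", "content-length", "cookie", "via"}
DISALLOWED_PREFIXES = ("x-amz-", "x-edge-")


def add_origin_custom_header(origin: dict, name: str, value: str) -> dict:
    """Add or replace one custom header on an origin entry, in place."""
    lowered = name.lower()
    if lowered in DISALLOWED_NAMES or lowered.startswith(DISALLOWED_PREFIXES):
        raise ValueError(f"CloudFront does not allow {name!r} as an origin custom header")

    headers = origin.setdefault("CustomHeaders", {"Quantity": 0, "Items": []})
    # Drop any existing header with the same name, then append the new value.
    items = [h for h in (headers.get("Items") or []) if h["HeaderName"].lower() != lowered]
    items.append({"HeaderName": name, "HeaderValue": value})
    headers["Items"] = items
    headers["Quantity"] = len(items)  # Quantity must always match len(Items)
    return origin


if __name__ == "__main__":
    # Placeholder origin entry, as it might appear under Origins.Items.
    origin = {"Id": "api", "DomainName": "api-id.execute-api.us-east-1.amazonaws.com"}
    add_origin_custom_header(origin, "X-Original-Host", "my-custom-api.example.com")
    print(origin["CustomHeaders"])
```

Keeping Quantity in sync with Items is the detail that most often trips up hand edits, which is why the helper recomputes it on every call.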

Custom Origins: The “Get Out of Jail Free” Card

This is your escape hatch. Any HTTP server on the internet that isn’t an S3 bucket or another AWS service integrated specifically? That’s a custom origin. This could be your own server in a colocation facility, a DigitalOcean droplet, or even a legacy API running in your company’s data center.

The best practice here is to use a custom origin for any non-AWS endpoint or for an AWS service that doesn’t have a dedicated origin type. You get the same fine-grained control over connection timeouts, keep-alive settings, and SSL protocols. The major benefit is that viewers still connect to a nearby edge, and CloudFront reuses persistent connections to your origin, so cache misses skip much of the connection setup cost. Just don’t expect miracles on the last leg: once traffic leaves AWS’s network for a server hosted elsewhere, it crosses the public internet like anything else.
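To make those knobs concrete, here is a rough sketch of an origin entry for a non-AWS server. The field names follow CloudFront's CustomOriginConfig structure. The helper custom_origin, the origin ID, and the domain are invented for illustration, and the defaults (30-second read timeout, 5-second keep-alive, TLSv1.2 only, HTTPS only) are choices made for this sketch that you should adjust for your own origin.

```python
def custom_origin(
    origin_id: str,
    domain_name: str,
    read_timeout: int = 30,
    keepalive_timeout: int = 5,
    tls_protocol: str = "TLSv1.2",
) -> dict:
    """Build one origin entry for a custom (non-S3) origin.

    Hypothetical helper: it returns a fragment of a distribution config,
    not a complete DistributionConfig.
    """
    if read_timeout < 1 or keepalive_timeout < 1:
        raise ValueError("timeouts must be positive numbers of seconds")
    return {
        "Id": origin_id,
        "DomainName": domain_name,
        "CustomOriginConfig": {
            "HTTPPort": 80,
            "HTTPSPort": 443,
            # Always speak HTTPS to the origin, whatever the viewer used.
            "OriginProtocolPolicy": "https-only",
            "OriginSslProtocols": {"Quantity": 1, "Items": [tls_protocol]},
            # Seconds CloudFront waits for a response from the origin.
            "OriginReadTimeout": read_timeout,
            # Seconds CloudFront keeps an idle connection to the origin open.
            "OriginKeepaliveTimeout": keepalive_timeout,
        },
    }
```

For example, custom_origin("legacy-api", "origin.example.com", read_timeout=20) yields an entry you could place in the Origins list of a distribution config before running update-distribution.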