25.3 Cache Behaviors: Path Patterns, Cache Policies, and Origin Request Policies
Right, let’s talk about the part of CloudFront where you actually get to think: Cache Behaviors. This is where you move from just slapping a CDN in front of your stuff to actually architecting how it behaves. It’s the difference between a bouncer who just checks IDs and one who knows the regulars, the VIPs, and the troublemakers who need a different door.
The core idea is simple but powerful: you can tell CloudFront to handle different types of requests differently based on the path pattern. A request for /api/graphql should probably behave very differently than one for /images/cat_picture_1024.jpg. Behaviors let you do that. You create a list of these behaviors, ordered from most specific to least specific (that default * catch-all we talked about last time), and CloudFront walks down that list until it finds a match.
Path Patterns: Your First Line of Defense
Think of path patterns as your rule-setting filter. They’re not full regex, thank goodness, because we’re not animals. They’re simpler: * is a wildcard that matches zero or more characters. So /images/* matches /images/logo.png and /images/2023/photo.jpg, and /api/* matches all your API routes.
The ordering is crucial. CloudFront is lazy in the best way possible; it stops at the first matching pattern. So if you have a behavior for /api/secret/* and another for /api/*, you’d better put the /api/secret/* one first in the list. If the broader /api/* comes first, it will hog all the traffic, including your secret paths, and your specific settings for those secrets will never get a chance to run. This is a classic “why isn’t my behavior working?” moment. I’ve done it. You’ll do it. We’ll laugh about it later.
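To make the first-match rule concrete, here is a minimal sketch in Python. It is a toy model, not CloudFront’s implementation: the helper names (pattern_to_regex, first_match) are made up for illustration, and only the * wildcard described above is modeled.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a path pattern into a regex: '*' becomes '.*', everything else is literal."""
    parts = [".*" if ch == "*" else re.escape(ch) for ch in pattern]
    return re.compile("".join(parts), re.DOTALL)

def first_match(path: str, behaviors: list[str]) -> str:
    """Return the first pattern in list order that matches, else the default '*'."""
    for pattern in behaviors:
        if pattern_to_regex(pattern).fullmatch(path):
            return pattern
    return "*"  # the default behavior catches whatever is left

good_order = ["/api/secret/*", "/api/*", "/images/*"]
bad_order = ["/api/*", "/api/secret/*", "/images/*"]

print(first_match("/api/secret/keys", good_order))  # /api/secret/*
print(first_match("/api/secret/keys", bad_order))   # /api/*
print(first_match("/index.html", good_order))       # *
```

With bad_order, the /api/secret/* entry is dead configuration: nothing can ever reach it.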
Cache Policies: The “Should I Even Bother?” Check
Once a behavior matches a path, the Cache Policy is the first big decision point. It defines the cache key (which parts of the request make two requests “the same”) and the TTLs, and together those answer: “Should I go all the way back to the origin for this, or can I just serve a cached copy?”
This is where CloudFront got a massive upgrade. We used to have to manually futz with forwarded headers, min/max TTLs, and query strings in a bizarrely complex settings screen. Now, we have managed policies. Use them. CachingOptimized is your best friend for static content. It keeps query strings, cookies, and headers out of the cache key (apart from a normalized Accept-Encoding for compressed objects), so every viewer asking for the same path shares one cached copy. If some query strings do change the response, put them in the cache key with a custom Cache Policy; the Origin Request Policy (more on that next) only controls what else gets forwarded. CachingDisabled is for when you absolutely, positively must hit the origin every single time, like for those highly dynamic API responses.
But the real power is in the custom policies. You can get surgical. Let’s say every request to a page carries a sessionid cookie, but the page content is the same for everyone. A dumb cache would see the session cookie and create a unique cache key for every user, obliterating your cache hit ratio. A smart cache policy lets you say, “Leave the sessionid cookie out of the cache key,” while the Origin Request Policy still forwards it to the origin for things like logging or session refresh. One warning: this only works if the response body really is identical for everyone. If the origin uses that cookie to personalize the HTML, CloudFront will happily serve one user’s cached page to the next visitor, so anything that changes the response has to be part of the cache key.
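A quick way to see why the cookie matters is to model the cache key yourself. The sketch below is a simplified illustration in Python, not CloudFront’s actual key format; cache_key and its parameters are hypothetical names. It only shows how an allowlist decides which request values split the cache.

```python
from urllib.parse import urlencode

def cache_key(path, query, cookies, key_query=(), key_cookies=()):
    """Build a toy cache key from the path plus only the allowlisted query strings and cookies."""
    kept_query = sorted((k, v) for k, v in query.items() if k in key_query)
    kept_cookies = sorted((k, v) for k, v in cookies.items() if k in key_cookies)
    return "|".join([path, urlencode(kept_query), urlencode(kept_cookies)])

alice = {"sessionid": "a1"}
bob = {"sessionid": "b2"}

# Cookie in the key: each user gets a private cache entry.
naive = {cache_key("/home", {}, c, key_cookies=("sessionid",)) for c in (alice, bob)}
# Cookie left out of the key: both users share one entry.
tuned = {cache_key("/home", {}, c) for c in (alice, bob)}

print(len(naive), len(tuned))  # 2 1
```

Sharing one entry is only correct when the origin returns the same body for both users; leaving a cookie out of the key does not make a personalized response safe to share.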
Here’s how you’d attach a managed policy using the AWS CLI when creating a new distribution. Doing this in the console is a point-and-click adventure, but this shows you the moving parts.
aws cloudfront create-distribution \
  --distribution-config file://dist-config.json
And your dist-config.json would include a cache behavior section like this. The CachePolicyId is the ID of the managed CachingOptimized policy; JSON has no comment syntax, so that note lives here rather than in the file.
"CacheBehaviors": {
  "Quantity": 1,
  "Items": [
    {
      "PathPattern": "/images/*",
      "TargetOriginId": "my-s3-origin",
      "ViewerProtocolPolicy": "redirect-to-https",
      "CachePolicyId": "658327ea-f89d-4fab-a63d-7e88639e58f6"
    }
  ]
},
Origin Request Policies: The “What Should I Ask For?” Filter
Okay, so the request missed the cache and CloudFront has to go to the origin. The Origin Request Policy now dictates what request actually gets sent. This is your chance to decide which query strings, headers, and cookies ride along before the request even leaves the CloudFront edge.
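The classic use case is query strings. Your origin might need ?language=en or ?version=2 to render the correct content. Because those change the response, they belong in the cache key, and anything in the cache key is forwarded to the origin automatically. The Origin Request Policy covers the extras: values the origin needs to see but that shouldn’t split the cache. It works as an allowlist, so only what you name gets sent. Why waste bandwidth sending ?utm_source=twitter to your origin if your app doesn’t use it? Leave it off the list and it never leaves the edge.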
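Here is a small sketch of that filtering in Python, under the assumption that the origin receives the cache-key query strings plus whatever the Origin Request Policy allowlists. The function name origin_query and the request_id parameter are hypothetical, chosen only for illustration.

```python
def origin_query(viewer_query, cache_key_params, origin_request_params):
    """Return the query strings that would reach the origin in this toy model."""
    allowed = set(cache_key_params) | set(origin_request_params)
    return {k: v for k, v in viewer_query.items() if k in allowed}

viewer = {
    "language": "en",
    "version": "2",
    "utm_source": "twitter",
    "request_id": "abc123",  # hypothetical value the origin logs but never renders
}

forwarded = origin_query(
    viewer,
    cache_key_params=["language", "version"],  # these change the response
    origin_request_params=["request_id"],      # forwarded, but not part of the key
)
print(forwarded)  # {'language': 'en', 'version': '2', 'request_id': 'abc123'}
```

The utm_source parameter appears in neither list, so it is dropped before the request leaves the edge.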
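It also handles headers. You choose which viewer headers are forwarded, and you can have CloudFront add its own generated headers, such as CloudFront-Viewer-Country, to the origin request. One thing it does not do is inject arbitrary static values. If your origin application expects a custom X-API-Key header that the viewer (the browser) should never see or know about, configure that as a custom header on the origin itself, and CloudFront will add it to every request it sends there. This keeps your secrets secret and your origin happy.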
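The policy is attached in the same cache behavior object, next to CachePolicyId. The ID below is the one for the managed AllViewerExceptHostHeader policy; confirm it with aws cloudfront list-origin-request-policies --type managed before relying on it.
"OriginRequestPolicyId": "b689b0a8-53d0-40ab-baf2-68738e2966ac",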
The Deadly Embrace of Policies and Patterns
Here’s the trap: these settings are locked in a deadly embrace. If your Cache Policy is CachingDisabled, every request goes to your origin, and the Origin Request Policy shapes every one of them. But if you have a Cache Policy that does cache, the Origin Request Policy only affects requests that miss the cache and are sent to the origin. Wrapping your head around this interplay is key. It means a bug in your Origin Request Policy might hide for days until the cached objects expire, making it a nightmare to debug. Test with CachingDisabled first to make sure your request shaping works, then turn caching on.
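The hiding effect is easy to reproduce in a toy simulation. The Python sketch below is an assumption-laden model, not CloudFront: EdgeCache, forwarded_params, and the fake origin function are all invented for illustration, and the cache is keyed by path only to keep it short.

```python
class EdgeCache:
    """A toy edge cache: a TTL store in front of an origin, with an allowlist for forwarded params."""

    def __init__(self, ttl, origin):
        self.ttl = ttl
        self.origin = origin
        self.store = {}                # path -> (expires_at, response)
        self.forwarded_params = set()  # stand-in for the Origin Request Policy

    def get(self, path, query, now):
        entry = self.store.get(path)
        if entry and entry[0] > now:
            return entry[1]  # cache hit: the origin request policy is never consulted
        shaped = {k: v for k, v in query.items() if k in self.forwarded_params}
        response = self.origin(path, shaped)
        if self.ttl > 0:
            self.store[path] = (now + self.ttl, response)
        return response

def origin(path, query):
    """Fake origin that echoes back what it received."""
    return f"{path} rendered with {sorted(query.items())}"

edge = EdgeCache(ttl=300, origin=origin)
edge.forwarded_params = {"version"}
first = edge.get("/page", {"version": "2"}, now=0)

edge.forwarded_params = set()  # the buggy change: 'version' is no longer forwarded
during_ttl = edge.get("/page", {"version": "2"}, now=100)  # still the old cached response
after_ttl = edge.get("/page", {"version": "2"}, now=400)   # the bug finally shows up

uncached = EdgeCache(ttl=0, origin=origin)  # the "test with caching disabled" setup
immediate = uncached.get("/page", {"version": "2"}, now=0)

print(during_ttl == first, after_ttl == first, immediate == after_ttl)  # True False True
```

With a TTL of zero the broken request shaping is visible on the very first request, which is the reason to validate with caching off before turning it on.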