36.4 X-Ray Sampling Rules: Controlling Trace Volume
Right, let’s talk about sampling. You’ve enabled X-Ray, and suddenly your trace data is… a lot. Like, “could-fund-a-small-nation’s-coffee-supply” a lot. That’s because, by default, the X-Ray SDK records the first request each second plus five percent of any additional requests. It’s a decent starting point, but it’s about as subtle as a sledgehammer. For high-throughput services, this default can generate a staggering, expensive, and frankly useless volume of traces. You don’t need a trace for every single health check or load balancer ping. This is where sampling rules come in: they’re your finely tuned control panel for this firehose of data.
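To put a rough number on that, here is a back-of-the-envelope sketch in Python. It assumes steady traffic and the default behavior described above (one request per second, then 5% of the rest) applied to a single rule; the function name and the traffic figure are made up for illustration.

```python
def sampled_per_second(requests_per_second: float,
                       reservoir: float = 1.0,
                       rate: float = 0.05) -> float:
    """Expected traces per second under a reservoir-plus-rate scheme."""
    # Requests covered by the per-second reservoir are always sampled.
    from_reservoir = min(requests_per_second, reservoir)
    # Everything beyond the reservoir is sampled at the fixed rate.
    beyond_reservoir = max(requests_per_second - reservoir, 0.0)
    return from_reservoir + beyond_reservoir * rate


if __name__ == "__main__":
    rps = 2_000  # hypothetical steady load on one service
    per_second = sampled_per_second(rps)
    per_day = per_second * 60 * 60 * 24
    print(f"{per_second:.2f} traces/s, about {per_day:,.0f} traces/day")
```

At that hypothetical load, the default works out to roughly a hundred traces every second, which is millions per day from a single service.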
The core idea is brilliantly simple: you define a set of rules, each with a priority, a rate (between 0% and 100%), and a set of conditions to match requests. When a request comes in, X-Ray evaluates these rules in ascending priority order (lowest number first) and applies the first rule that matches. It’s a classic priority-based rule engine, and you’re in charge.
The Anatomy of a Sampling Rule
Let’s break down what you’re actually defining. A sampling rule has three main components:
- Priority: An integer from 1 to 9999 (e.g., 10, 90, 100). Lower numbers are evaluated first, so a lower number means a higher priority. This is crucial because if a request matches two rules, the one with priority 10 wins over the one with priority 90. You use this to ensure your most important rules aren’t accidentally overridden by broader, catch-all ones.
- Rate: The percentage of matching requests you want to sample. A rate of 0.05 means 5%, 1.0 means 100%. This is where you control the volume.
- Conditions: Attributes of the incoming request you can filter on. The big ones are:
  - ServiceName: The logical name of your service (e.g., MyAwesomePaymentService).
  - ServiceType: The type of resource (e.g., AWS::ECS::Container, AWS::Lambda::Function).
  - Host: The hostname from the HTTP request.
  - HTTPMethod: GET, POST, PUT, etc.
  - URLPath: The path of the request (e.g., /api/users/*). Yes, you can use wildcards.
  - Fixed Target: This one is a bit weird, because it isn’t really a match condition. It’s the reservoir (ReservoirSize in the API): a fixed number of matching requests to sample each second before the rate applies. I’ll be honest, I almost never set it above 1. The rate-based control is almost always what you want.
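The evaluation model can be sketched in a few lines of Python. This is a simplified illustration, not the X-Ray implementation: the Rule class, the function names, and the two sample rules are hypothetical, only three of the conditions are modeled, and the case-sensitive matching is an assumption.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int          # lower number = evaluated first
    fixed_rate: float
    service_name: str = "*"
    http_method: str = "*"
    url_path: str = "*"


def matches(rule: Rule, service: str, method: str, path: str) -> bool:
    # fnmatchcase treats * and ? as wildcards. It also accepts [..] sets,
    # so keep brackets out of the patterns in this sketch.
    return (fnmatchcase(service, rule.service_name)
            and fnmatchcase(method, rule.http_method)
            and fnmatchcase(path, rule.url_path))


def first_match(rules: list[Rule], service: str, method: str,
                path: str) -> Optional[Rule]:
    # Walk the rules in ascending priority and stop at the first hit.
    for rule in sorted(rules, key=lambda r: r.priority):
        if matches(rule, service, method, path):
            return rule
    return None


RULES = [
    Rule("CatchAll", priority=9000, fixed_rate=0.01),
    Rule("Payments", priority=1, fixed_rate=1.0,
         http_method="POST", url_path="/api/payment"),
]

if __name__ == "__main__":
    print(first_match(RULES, "MyAppBackend", "POST", "/api/payment").name)
    print(first_match(RULES, "MyAppBackend", "GET", "/health").name)
```

The order of the list doesn't matter; sorting by priority is what puts the narrow payment rule ahead of the catch-all.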
Writing Your First Rule (The Code Part)
Enough theory. Let’s say you have a payment API endpoint that’s critical. You want to trace 100% of POST requests to /api/payment, but for everything else, you’re happy with a meager 1% sample rate. Here’s how you’d define that using the AWS CLI.
First, create a file named high-priority-rule.json. This is your crucial payment rule.
{
  "SamplingRule": {
    "RuleName": "PaymentApiRule",
    "Priority": 1,
    "FixedRate": 1.0,
    "ReservoirSize": 0,
    "HTTPMethod": "POST",
    "URLPath": "/api/payment",
    "ServiceName": "MyAppBackend",
    "ResourceARN": "*",
    "ServiceType": "*",
    "Host": "*",
    "Version": 1,
    "Attributes": {}
  }
}
Next, create a low-priority-rule.json for the “catch-everything-else” scenario.
{
  "SamplingRule": {
    "RuleName": "DefaultLowRateRule",
    "Priority": 9000,
    "FixedRate": 0.01,
    "ReservoirSize": 0,
    "HTTPMethod": "*",
    "URLPath": "*",
    "ServiceName": "MyAppBackend",
    "ResourceARN": "*",
    "ServiceType": "*",
    "Host": "*",
    "Version": 1,
    "Attributes": {}
  }
}
Now, create them using the AWS CLI. Notice the order doesn’t matter; the Priority field is what dictates evaluation order.
aws xray create-sampling-rule --cli-input-json file://high-priority-rule.json
aws xray create-sampling-rule --cli-input-json file://low-priority-rule.json
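If you end up with more than a couple of these files, generating them is less error-prone than hand-editing. The following is a minimal sketch, assuming the request shape that create-sampling-rule expects (a top-level SamplingRule object that includes ReservoirSize and Version); the helper name and its defaults are hypothetical.

```python
import json


def make_rule_payload(name: str, priority: int, fixed_rate: float, *,
                      reservoir_size: int = 0, service_name: str = "*",
                      http_method: str = "*", url_path: str = "*") -> dict:
    """Build a request body for `aws xray create-sampling-rule`."""
    # Custom rules live in 1..9999; 10000 is reserved for the Default rule.
    if not 1 <= priority <= 9999:
        raise ValueError("custom rule priority must be between 1 and 9999")
    if not 0.0 <= fixed_rate <= 1.0:
        raise ValueError("fixed_rate must be between 0.0 and 1.0")
    return {
        "SamplingRule": {
            "RuleName": name,
            "Priority": priority,
            "FixedRate": fixed_rate,
            "ReservoirSize": reservoir_size,
            "ServiceName": service_name,
            "ServiceType": "*",
            "Host": "*",
            "HTTPMethod": http_method,
            "URLPath": url_path,
            "ResourceARN": "*",
            "Version": 1,
        }
    }


if __name__ == "__main__":
    payload = make_rule_payload("PaymentApiRule", 1, 1.0,
                                service_name="MyAppBackend",
                                http_method="POST",
                                url_path="/api/payment")
    print(json.dumps(payload, indent=2))
```

Redirect the output to a file and pass it to the CLI with --cli-input-json file://..., exactly as in the commands above.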
And just like that, you’ve gone from chaotic default sampling to a targeted, intelligent strategy. You can view your active rules with:
aws xray get-sampling-rules
The “Reservoir” Pitfall and the Default Rule
Here’s the first “gotcha.” Remember that default sampling behavior I mentioned? Well, AWS creates a default rule named Default for you with a priority of 10000. Yes, you read that right. A priority of ten thousand. Since lower numbers are evaluated first and custom rules top out at 9999, Default is always evaluated last: it’s the fallback that catches every request none of your own rules matched.
This default rule ships with a reservoir of one request per second and a fixed rate of 5%. The reservoir is designed to ensure you always get some data, but it can completely mess with your carefully planned rates for low-traffic services, where one request per second may be most of the traffic. The best practice? You must manage the default rule. You can’t delete it, and the only things you can change on it are its reservoir and fixed rate, so update it to a very low fixed rate (like 0.01) once you have your own rules in place. Don’t let this silent priority-10000 fallback undermine your plans.
# Lower the default rule's fixed rate (Default cannot be deleted)
aws xray update-sampling-rule --region us-east-1 --sampling-rule-update '{"RuleName": "Default", "FixedRate": 0.01, "ReservoirSize": 1}'
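To see why a reservoir distorts the picture for quiet services, here is a small worked example in Python. It assumes steady traffic and a single rule; the function name and the traffic levels are hypothetical, and the zero-reservoir column is only there for comparison.

```python
def effective_sample_fraction(rps: float, reservoir: float,
                              rate: float) -> float:
    """Fraction of requests actually sampled at a given steady load."""
    if rps <= 0:
        return 0.0
    # The reservoir is filled first; the fixed rate applies to the rest.
    sampled = min(rps, reservoir) + max(rps - reservoir, 0.0) * rate
    return sampled / rps


if __name__ == "__main__":
    for rps in (0.5, 2, 20, 2000):
        with_reservoir = effective_sample_fraction(rps, reservoir=1, rate=0.05)
        without = effective_sample_fraction(rps, reservoir=0, rate=0.05)
        print(f"{rps:>6} req/s: {with_reservoir:.1%} with reservoir, "
              f"{without:.1%} without")
```

At half a request per second, a reservoir of one means every request is traced, no matter how small the configured rate is. At thousands of requests per second the reservoir barely registers and the rate dominates.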
Best Practices from the Trenches
- Start Conservative: Begin with a very low global rate (1-5%). It’s much easier to increase the rate because you need more data than it is to explain to your finance department why you spent thousands on traces last month.
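- Give Failure-Prone Paths High Priority: You can’t write a rule that matches HTTP 5xx responses, because the sampling decision is made when the request arrives, before any status code exists. The next best thing is a high-priority rule that samples a much higher rate (or even 100%) of the endpoints where errors hurt the most. Errors are often the most important thing to debug, and you can’t debug what you don’t trace.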
- Rule Limits: Be aware that the number of custom sampling rules is capped per Region by a service quota (25 by default at the time of writing; check Service Quotas for the current value). This sounds like a lot until you have a microservices architecture. You might need to use broader wildcards or manage rules more centrally.
- It’s Not Just for Lambdas: While serverless folks think of this first, these rules are incredibly powerful for controlling trace volume from EC2, ECS, or EKS workloads where the X-Ray daemon is running on every host, potentially spamming your trace intake.
The goal isn’t to trace everything. The goal is to trace smartly. Sampling rules are how you stop paying for noise and start paying for signal.