42.7 Trusted Advisor: Cost, Security, Fault Tolerance, and Performance Checks

Right, let’s talk about Trusted Advisor. This is the part where I get to be the nagging, slightly paranoid friend in your ear, but the one who’s almost always right. AWS has a million services, and it’s trivial to leave a metaphorical door unlocked, a storage bucket wide open, or—the real killer—a massive instance running for a project you finished six months ago. Trusted Advisor is the system that automatically checks for these “oh crap” moments on your behalf.

Think of it as a team of over-caffeinated, pedantic robots constantly auditing your account against a huge list of AWS best practices. They check five categories: cost optimization, performance, security, fault tolerance, and service limits. The catch? The really good, specific checks (the ones that actually save you from a headline-making security incident) are locked behind a Business or Enterprise support plan. On Basic or Developer support you get the service limit checks and a handful of core security checks, which is better than nothing, but just barely. It’s AWS’s way of giving you a free sample of the good stuff.

The Core Checks: What It Actually Looks For

Log into the console, go to Trusted Advisor, and you’ll see a dashboard color-coded like a traffic light: green for “no problem,” red for “action needed,” and yellow for “investigate this.” Let’s break down what it’s actually hunting for.
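The same traffic-light states come back from the API as status strings. Here is a minimal sketch of tallying them, assuming you have already fetched a list of check summaries (for example via boto3's describe_trusted_advisor_check_summaries) and that the statuses are ok, warning, error, and not_available. The function name and the sample data are made up for illustration.

```python
from collections import Counter

# Status strings from the AWS Support API, mapped to the dashboard colors.
STATUS_TO_COLOR = {"ok": "green", "warning": "yellow", "error": "red"}


def count_by_color(summaries):
    """Count check summaries per dashboard color; anything else counts as 'grey'."""
    return Counter(STATUS_TO_COLOR.get(s.get("status"), "grey") for s in summaries)


# Hypothetical sample shaped like the 'summaries' list in the API response.
sample = [
    {"checkId": "check-a", "status": "ok"},
    {"checkId": "check-b", "status": "error"},
    {"checkId": "check-c", "status": "warning"},
    {"checkId": "check-d", "status": "not_available"},
    {"checkId": "check-e", "status": "ok"},
]
print(dict(count_by_color(sample)))
```

A single red count is usually the only number worth putting on a wall monitor.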

Cost Optimization: This is where it pays for itself (literally). It flags underutilized EC2 instances that are begging to be downsized or terminated, idle load balancers that are just burning money, and unattached EBS volumes, the digital equivalent of paying for a storage unit you never visit. It also hunts down Reserved Instances that you’re not fully utilizing. The savings here are often shockingly large.

Performance: This checks for overutilized instances (time to scale up!) and CloudFront distributions configured in ways that hurt caching or delivery. It’s less about immediate cash savings and more about making sure your users aren’t staring at a spinning wheel.

Security: This is the big one. It will find S3 buckets with world-read or world-write permissions, which is a spectacularly bad idea 99.9% of the time. It checks for IAM access keys that haven’t been rotated in over 90 days, exposed access keys, and security groups with overly permissive rules (like allowing SSH from 0.0.0.0/0, which opens port 22 to the entire internet).

Fault Tolerance: This identifies things like EC2 instances running in a single Availability Zone, RDS instances without Multi-AZ, or low backup retention periods. It’s asking, “If an AWS data center gets swallowed by a sinkhole, how hosed are you?”

Service Limits: AWS has soft limits on everything from the number of VPCs you can have to how many EC2 instances you can run. This check warns you when you’re approaching 80% of a limit, so you can file a request to increase it before your auto-scaling group fails to launch a new instance and takes your entire application down.
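The 80% rule is simple enough to sketch. This is an illustration of the arithmetic, not AWS's implementation: the function names are hypothetical, and treating 100% of the limit as "red" is an assumption on top of the 80% warning described above.

```python
def limit_status(usage, limit, warn_ratio=0.8):
    """Classify usage: 'red' at or over the limit, 'yellow' at or over warn_ratio."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    ratio = usage / limit
    if ratio >= 1.0:
        return "red"  # assumption: hitting the limit itself is the red state
    if ratio >= warn_ratio:
        return "yellow"
    return "green"


def headroom(usage, limit):
    """How many more resources can be created before the limit is hit."""
    return max(limit - usage, 0)


# Hypothetical numbers: a limit of 20 instances with 17 currently running.
print(limit_status(17, 20), headroom(17, 20))
```

The headroom number is the one to compare against your auto-scaling group's maximum size: if the group can grow by more than the headroom, file the increase request now.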

Automating the Paranoia: Using the AWS CLI

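You’re not going to sit there refreshing the console all day. The real power is automating this. You can use the AWS CLI to programmatically fetch check results and hook them into your monitoring or ticketing system. Two things to know first: the Support API behind these commands requires a Business or Enterprise support plan, and it is served from us-east-1. Here’s how you’d fetch the results of a single check, in this case low-utilization EC2 instances: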

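# Qch7DwouX1 should be the 'Low Utilization Amazon EC2 Instances' check;
# confirm the ID with describe-trusted-advisor-checks before relying on it.
aws support describe-trusted-advisor-check-result \
    --check-id Qch7DwouX1 \
    --language en \
    --region us-east-1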

The output is a glorious, terrifying JSON blob detailing every single one of your offending resources. The tricky part is that every individual check (not just each category) has its own unique, opaque check ID. You have to list the checks first to get these IDs.

# Get a list of all available checks and their IDs
aws support describe-trusted-advisor-checks \
    --language en \
    --region us-east-1
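Scrolling through that list by eye gets old fast. Here is a rough sketch of turning the response into a lookup table, assuming the response has a top-level "checks" list whose entries carry "id", "name", and "category". The helper names are hypothetical, the sample IDs are placeholders, and fetch_checks needs boto3, credentials, and a qualifying support plan to actually run.

```python
def index_checks(response):
    """Map category -> {check name: check ID} from a describe-trusted-advisor-checks response."""
    index = {}
    for check in response.get("checks", []):
        index.setdefault(check["category"], {})[check["name"]] = check["id"]
    return index


def fetch_checks():
    """Fetch the live check list from the Support API (not called in this sketch)."""
    import boto3  # imported lazily so the parsing code above runs without it

    client = boto3.client("support", region_name="us-east-1")
    return client.describe_trusted_advisor_checks(language="en")


# Hypothetical response fragment; real IDs are opaque strings, so look them up rather than guess.
sample = {
    "checks": [
        {"id": "id-001", "name": "Example Cost Check", "category": "cost_optimizing"},
        {"id": "id-002", "name": "Example Security Check", "category": "security"},
        {"id": "id-003", "name": "Another Security Check", "category": "security"},
    ]
}
for category, checks in sorted(index_checks(sample).items()):
    print(category, checks)
```

Looking IDs up by name at runtime, instead of hard-coding them, also protects you when AWS retires or replaces a check.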

Once you have the data, you can parse it with jq and send alerts to a Slack channel or create Jira tickets automatically. For example, to find all underutilized EC2 instances:

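# Qch7DwouX1 should be 'Low Utilization Amazon EC2 Instances'; verify with describe-trusted-advisor-checks.
# Every resource it flags is already underutilized, so no extra filter is needed.
# metadata[1] is assumed to be the instance ID, per the check's "metadata" column headings.
aws support describe-trusted-advisor-check-result \
    --check-id Qch7DwouX1 \
    --language en \
    --region us-east-1 | \
jq -r '.result.flaggedResources[] | .metadata[1]'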
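Indexing metadata by position is fragile, because each check has its own column layout. A slightly sturdier sketch pairs each value with the column headings that the check list reports for that check. It assumes you pass in the inner "result" object of the check-result response; the function names, headings, and sample data below are illustrative only.

```python
def flagged_rows(headers, result):
    """Pair each flagged resource's metadata values with the check's column headings."""
    rows = []
    for resource in result.get("flaggedResources", []):
        row = dict(zip(headers, resource.get("metadata", [])))
        row["_status"] = resource.get("status")  # ok / warning / error
        rows.append(row)
    return rows


def format_alert(row, id_column="Instance ID"):
    """Build a one-line message suitable for a chat post or a ticket title."""
    resource = row.get(id_column) or "unknown resource"
    return f"[{row['_status']}] {resource} flagged by Trusted Advisor"


# Hypothetical headings and result; real headings come from the check's "metadata" field.
headers = ["Region/AZ", "Instance ID", "Instance Name"]
result = {
    "flaggedResources": [
        {"status": "warning", "metadata": ["us-east-1a", "i-0123456789abcdef0", None]},
        {"status": "warning", "metadata": ["us-east-1b", "i-0fedcba9876543210", "batch-worker"]},
    ]
}
for row in flagged_rows(headers, result):
    print(format_alert(row))
```

From here, posting to Slack or opening a Jira ticket is a matter of handing the formatted string to whichever client you already use.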

The Rough Edges and Pitfalls

Trusted Advisor is brilliant, but it’s not clairvoyant. Its biggest weakness is context. It knows an EBS volume is unattached, but it doesn’t know if you’re deliberately keeping it around for a forensic analysis next week. It will still flag it. You have to be the one to apply judgment. Never blindly act on its recommendations without understanding the “why” behind the resource.
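One way to encode that judgment is a short list of resources you are keeping on purpose. The sketch below drops findings that are already suppressed (the isSuppressed flag on each flagged resource) or that mention an allowlisted identifier anywhere in their metadata. The allowlist, the function name, and the sample volume names are all hypothetical.

```python
def actionable(flagged_resources, keep_on_purpose):
    """Return findings that are neither suppressed nor about deliberately kept resources."""
    kept = set(keep_on_purpose)
    return [
        r for r in flagged_resources
        if not r.get("isSuppressed", False)
        and not any(value in kept for value in r.get("metadata", []))
    ]


# Hypothetical findings for unattached volumes; vol-forensics is being held for an investigation.
findings = [
    {"isSuppressed": False, "metadata": ["us-east-1", "vol-forensics", "100"]},
    {"isSuppressed": False, "metadata": ["us-east-1", "vol-forgotten", "500"]},
    {"isSuppressed": True, "metadata": ["us-west-2", "vol-dismissed", "50"]},
]
for finding in actionable(findings, keep_on_purpose={"vol-forensics"}):
    print(finding["metadata"][1])
```

Keep the allowlist in version control with a reason and an expiry date next to each entry, or "temporary" exceptions will quietly become permanent.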

Also, the refresh times can be… leisurely. Checks don’t run in real time. It can take several hours for the dashboard to catch up after you fix an issue, so don’t panic if it doesn’t turn green immediately.
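If you would rather not wait, the Support API lets you ask for a refresh, subject to a per-check cooldown. The following is a sketch under assumptions: it relies on the refresh-status entries carrying checkId, status, and millisUntilNextRefreshable; the helper names are hypothetical; and request_refresh needs boto3, credentials, and a qualifying support plan. Some checks refresh automatically and may reject manual requests.

```python
def refreshable_now(statuses):
    """Return IDs of checks whose cooldown has expired and that are not already refreshing."""
    return [
        s["checkId"]
        for s in statuses
        if s.get("millisUntilNextRefreshable", 0) == 0
        and s.get("status") not in ("enqueued", "processing")
    ]


def request_refresh(check_ids):
    """Ask the Support API to refresh every eligible check (not called in this sketch)."""
    import boto3  # imported lazily so the helper above runs without it

    client = boto3.client("support", region_name="us-east-1")
    current = client.describe_trusted_advisor_check_refresh_statuses(checkIds=check_ids)
    for check_id in refreshable_now(current["statuses"]):
        client.refresh_trusted_advisor_check(checkId=check_id)


# Hypothetical refresh statuses shaped like the API's 'statuses' list.
statuses = [
    {"checkId": "check-a", "status": "success", "millisUntilNextRefreshable": 0},
    {"checkId": "check-b", "status": "success", "millisUntilNextRefreshable": 1800000},
    {"checkId": "check-c", "status": "processing", "millisUntilNextRefreshable": 0},
]
print(refreshable_now(statuses))
```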

The most important best practice? Review it regularly. Make it part of a weekly or monthly ritual. Automate the fetching of critical security findings—treat those like a five-alarm fire. For cost optimization, schedule a monthly finance review. This tool is only as good as the process you build around it. Ignore it, and it’s just a colorful dashboard. Use it, and it becomes your first line of defense against stupidity (both human and automated).
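That process can start as a tiny routing rule. The sketch below assumes the API's category names (such as security and cost_optimizing) and status strings; the destination labels are invented placeholders for whatever your team actually uses.

```python
def route(category, status):
    """Pick a destination for a finding based on its category and alert status."""
    if status not in ("warning", "error"):
        return "ignore"  # green or unavailable checks need no action
    if category == "security":
        return "page-on-call"  # treat security findings like a five-alarm fire
    if category == "cost_optimizing":
        return "monthly-finance-review"
    return "weekly-review"


# Hypothetical findings as (category, status) pairs.
for finding in [("security", "error"), ("cost_optimizing", "warning"), ("performance", "ok")]:
    print(finding, "->", route(*finding))
```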