37.8 CloudFormation Guard: Policy Validation for Templates

Right, so you’ve written a CloudFormation template. It’s a thing of beauty. It deploys an entire fleet of microservices, a couple databases, and probably a sentient AI for all I know. You’re feeling pretty good about yourself. But let me ask you a question: are you sure that EC2 instance isn’t wide open to the entire internet? Did you remember to enforce encryption on that S3 bucket? Or did you just build a beautifully orchestrated, automated, multi-tier security vulnerability?

This, my friend, is where CloudFormation Guard (cfn-guard) comes in. Think of it as your extremely pedantic, rule-obsessed best friend who reads over your infrastructure code before you deploy it and says things like, “Um, actually, company policy forbids launching t2.nano instances because they’re objectively silly.” It’s a policy-as-code tool that lets you define a set of rules (in a surprisingly readable language) that your templates must pass. This isn’t a suggestion; it’s a hard stop. And it’s one of the best things you can add to your deployment pipeline.

How It Works: Rules Are Not Suggestions

At its core, cfn-guard is a command-line tool that takes two inputs: your CloudFormation template (JSON or YAML) and a set of rules you define in a .guard file. It then parses your template, checks it against every single rule, and gives you a simple pass/fail report. If it fails, it tells you exactly which resource and which rule broke the deal.

The magic is in the rule language. It’s not some arcane JSON schema; it’s a DSL built for this specific job. Let’s say you have a company-wide mandate that all S3 buckets must have encryption enabled. The rule for that is almost English:

# rules/s3-encryption.guard
# Select every S3 bucket in the template
let s3_buckets = Resources.*[ Type == 'AWS::S3::Bucket' ]

rule s3_bucket_encryption_at_rest when %s3_buckets !empty {
    # BucketEncryption must be present and non-empty
    %s3_buckets.Properties.BucketEncryption !empty

    # ServerSideEncryptionConfiguration must be a non-empty list
    %s3_buckets.Properties.BucketEncryption.ServerSideEncryptionConfiguration !empty

    # Every entry in that list must default to AES256 or aws:kms
    %s3_buckets.Properties.BucketEncryption.ServerSideEncryptionConfiguration[*] {
        ServerSideEncryptionByDefault.SSEAlgorithm in ["AES256", "aws:kms"]
    }
}

Now, let’s run it against a template that, ahem, forgot this crucial detail.

# bad-template.yaml
Resources:
  MyDataLake:
    Type: AWS::S3::Bucket
    Properties:
      # Look, no encryption! A classic mistake.
      AccessControl: Private

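You’d run cfn-guard validate --data bad-template.yaml --rules rules/s3-encryption.guard (those are the Guard 2.x flags; -d and -r are the short forms) and get a direct failure message along these lines, though the exact layout varies between versions: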

FAILED rules
s3_bucket_encryption_at_rest: DEFAULT for Resource [MyDataLake] and Property [Properties] is [VIOLATION] because value at [BucketEncryption] is [null] which is not empty and required. The value is required to be not empty.

See? It’s not mysterious. It points directly at the resource MyDataLake and the missing BucketEncryption property. No more digging through CloudFormation events after a failed stack creation; you catch this nonsense on your local machine before it even gets near an AWS API.

Writing Effective Rules: Beyond the Basics

The real power comes from combining checks. You can write rules that are terrifyingly specific. Want to ensure that any security group attached to an RDS instance only allows traffic from specific IP ranges on port 5432? You can do that.

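let db_instances = Resources.*[ Type == 'AWS::RDS::DBInstance' ]
let security_groups = Resources.*[ Type == 'AWS::EC2::SecurityGroup' ]

rule rds_security_groups_are_restrictive when %db_instances !empty {
    # Every DB instance must name at least one VPC security group
    %db_instances.Properties.VPCSecurityGroups !empty

    # Check that each security group's ingress rules are tight
    %security_groups.Properties.SecurityGroupIngress !empty
    %security_groups.Properties.SecurityGroupIngress[*] {
        # Only allow from our office IP range on port 5432
        IpProtocol == "tcp"
        FromPort == 5432
        ToPort == 5432
        CidrIp == "192.0.2.0/24"
    }
}

Notice the when? That makes this a conditional rule. The body in braces is only evaluated if the template contains at least one RDS instance; otherwise the rule is skipped. One caveat: as written, the rule inspects every security group declared in the template, not only the ones the database actually references. Guard doesn’t follow Ref links for you, so scoping the check to attached groups takes an extra filter or a naming convention. Still, this is how you move from simple property checks to cross-resource governance.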

Common Pitfalls and Sharp Edges

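First, the syntax is powerful but can be finicky. Checking for the absence of something is a common trip-up. Guard does let you say “this shouldn’t exist” with a clause like some_property !exists, but exists and empty are different tests: exists asks whether the property is there at all, while empty asks whether a query or collection came back with nothing in it. Mix them up and a rule passes or fails for reasons you didn’t intend. The reliable pattern is a rule that triggers when the resource type is present and then fails if the forbidden property exists.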
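For the absence case, here’s a minimal sketch using the !exists operator from Guard 2.x. The rule name and the choice of AccessControl as the forbidden property are mine, picked purely for illustration; swap in whatever your policy actually bans.

```guard
# Hypothetical policy: buckets may not set the AccessControl property
let s3_buckets = Resources.*[ Type == 'AWS::S3::Bucket' ]

# Skipped entirely when the template declares no buckets
rule s3_no_access_control when %s3_buckets !empty {
    # Fails for any bucket where the property is present at all
    %s3_buckets.Properties.AccessControl !exists
}
```

Run against bad-template.yaml from earlier, this should fail on MyDataLake, since that bucket sets AccessControl: Private.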

Second, remember that cfn-guard is a static checker. It can’t validate things that are determined at runtime, like whether a specific IAM action exists or if a resulting ARN is correct. Intrinsic functions such as Ref and Fn::GetAtt aren’t resolved either; Guard sees them as plain data. It checks what’s written in the template, period.

The biggest “gotcha” is in how it handles lists and loops. The [*] notation is a wildcard that means “for every element in this list,” and by default every matched element has to pass the clause. If you write SecurityGroupIngress[*].CidrIp, you’re checking the CidrIp field for every single ingress rule. This is what you want for a blanket ban on 0.0.0.0/0. When you only need at least one element to match, prefix the clause with the some keyword instead.
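A blanket ban of that kind might look like the sketch below. The rule name is hypothetical, and it assumes every ingress entry sets CidrIp. As far as I know, Guard 2.x treats a query it can’t resolve as a failure, so entries that use CidrIpv6 or SourceSecurityGroupId instead would need a filter in front of the comparison.

```guard
let security_groups = Resources.*[ Type == 'AWS::EC2::SecurityGroup' ]

# Hypothetical rule name; skipped when the template has no security groups
rule no_open_ingress when %security_groups !empty {
    # [*] visits every ingress entry, and each one must avoid the open range
    %security_groups.Properties.SecurityGroupIngress[*].CidrIp != "0.0.0.0/0"
}
```

Because the default is “all elements must pass,” a single 0.0.0.0/0 entry anywhere in the list is enough to fail the rule.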

The best practice? Integrate this directly into your CI/CD pipeline. Make a cfn-guard check a mandatory gate on any pull request that changes infrastructure. The validate command exits with a nonzero status when a rule fails, so most CI systems will block the merge without extra glue. This turns your pipeline from “does it deploy?” to “does it deploy and is it compliant?” It shifts security and governance left, which is the only sane way to operate at scale. It refuses to be boring about enforcing the rules, and so should you.