28.6 Lifecycle Policies: Automatically Expiring Old Image Tags

Right, so you’ve got an ECR repository filling up with old container images. It happens to the best of us. You push v1.2.3, then v1.2.4, and before you know it, you’ve got three gigabytes of image layers from builds you haven’t thought about in six months clogging up your AWS bill. Manually deleting these is a tedious, error-prone nightmare. This is where lifecycle policies come in—they’re the automated janitor for your container attic.

Think of a lifecycle policy as a set of rules you define. ECR evaluates these rules, and based on what you tell it, it automatically expires (a polite euphemism for “deletes forever”) images that match your criteria. It’s a set-it-and-forget-it solution, which is fantastic, but you must set it correctly. A bad policy is like a janitor who throws out your production servers because they looked dusty.

How the Rules Actually Work

The magic, and the occasional frustration, lies in how ECR evaluates your policy. It does walk through your rules in rulePriority order, lowest number first, but an early rule decides what to keep just as much as what to delete. This is a critical mental model shift.

There is no special “keep” or “protect” rule type; protection comes from ordering. Any image that meets a rule’s tag requirements (its tagStatus and tagPrefixList) belongs to that rule, and no rule with a higher rulePriority number can expire it. So if your first rule matches everything tagged prod- and expires all but the newest 10, those 10 survivors are safe from every rule after it, including a broad catch-all. That’s also why ECR requires a rule with tagStatus set to any to have the highest rulePriority: it has to run last, after the specific rules have staked their claims. This is by design to prevent catastrophic mistakes, and you should be grateful for it.
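To make the claiming behavior concrete, here is a minimal Python model of it. It is a simplified sketch under stated assumptions, not ECR’s implementation: it only handles tagPrefixList (no tagPatternList), assumes countUnit is days, and assumes a rule still counts already-claimed images toward its imageCountMoreThan total while being unable to expire them. Image, matches_tags, simulate, and make_rule are hypothetical names, and the demo policy uses small made-up numbers: keep the newest 2 prod- images, expire untagged images older than 7 days, and cap everything at 3.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class Image:
    digest: str
    pushed_at: datetime
    tags: list = field(default_factory=list)


def matches_tags(selection, image):
    """Does the image meet the rule's tagStatus/tagPrefixList requirement?"""
    status = selection["tagStatus"]
    if status == "any":
        return True
    if status == "untagged":
        return not image.tags
    # "tagged": every listed prefix must match at least one of the image's tags
    return bool(image.tags) and all(
        any(tag.startswith(prefix) for tag in image.tags)
        for prefix in selection["tagPrefixList"]
    )


def simulate(policy, images, now):
    """Return the digests this simplified model expects the policy to expire."""
    claimed = set()  # images already matched by a higher-priority rule
    expired = set()
    for r in sorted(policy["rules"], key=lambda r: r["rulePriority"]):
        sel = r["selection"]
        # Assumption: a rule still "sees" claimed and already-expired images when counting.
        matched = [img for img in images if matches_tags(sel, img)]
        if sel["countType"] == "imageCountMoreThan":
            newest_first = sorted(matched, key=lambda img: img.pushed_at, reverse=True)
            candidates = newest_first[sel["countNumber"]:]
        else:  # "sinceImagePushed"; this sketch assumes countUnit == "days"
            cutoff = now - timedelta(days=sel["countNumber"])
            candidates = [img for img in matched if img.pushed_at < cutoff]
        # ...but it may only expire images that no earlier rule has claimed.
        expired.update(img.digest for img in candidates if img.digest not in claimed)
        claimed.update(img.digest for img in matched)
    return expired


def make_rule(priority, **selection):
    """Hypothetical helper that builds one expire rule."""
    return {"rulePriority": priority, "selection": selection, "action": {"type": "expire"}}


demo_policy = {"rules": [
    make_rule(10, tagStatus="tagged", tagPrefixList=["prod-"],
              countType="imageCountMoreThan", countNumber=2),
    make_rule(20, tagStatus="untagged", countType="sinceImagePushed",
              countUnit="days", countNumber=7),
    make_rule(30, tagStatus="any", countType="imageCountMoreThan", countNumber=3),
]}

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
day = timedelta(days=1)
images = [
    Image("p1", datetime(2024, 5, 2, tzinfo=timezone.utc), ["prod-1"]),
    Image("p2", datetime(2024, 5, 3, tzinfo=timezone.utc), ["prod-2"]),
    Image("p3", datetime(2024, 5, 4, tzinfo=timezone.utc), ["prod-3"]),
    Image("d1", datetime(2024, 5, 5, tzinfo=timezone.utc), ["dev-1"]),
    Image("d2", datetime(2024, 5, 6, tzinfo=timezone.utc), ["dev-2"]),
    Image("u_old", now - 10 * day),
    Image("u_new", now - 1 * day),
]

print(sorted(simulate(demo_policy, images, now)))  # ['d1', 'p1', 'u_old']
```

In this run the catch-all rule would like to expire p2 and p3 as well, since they fall outside the newest 3 images, but the prod- rule claimed them first, so only d1 goes.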

Crafting Your Policy JSON

The policy itself is a JSON document, and you’ll write it by hand. Before you do, it’s worth checking whether the repository already has one, and the AWS CLI makes that a one-liner.

aws ecr get-lifecycle-policy --repository-name my-cool-app --query 'lifecyclePolicyText' --output text

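If the repository has no policy yet, the command fails with a LifecyclePolicyNotFoundException, which is fine. Once your JSON is saved to a file (say, policy.json), aws ecr put-lifecycle-policy --repository-name my-cool-app --lifecycle-policy-text file://policy.json applies it. Here’s a robust example policy you can adapt. Let’s break it down.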

{
  "rules": [
    {
      "rulePriority": 10,
      "description": "Keep the last 10 prod images",
      "selection": {
        "tagStatus": "tagged",
        "tagPrefixList": ["prod-"],
        "countType": "imageCountMoreThan",
        "countNumber": 10
      },
      "action": {
        "type": "expire"
      }
    },
    {
      "rulePriority": 20,
      "description": "Expire untagged images older than 7 days",
      "selection": {
        "tagStatus": "untagged",
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 7
      },
      "action": {
        "type": "expire"
      }
    },
    {
      "rulePriority": 30,
      "description": "Keep max 500 images overall as a safety net",
      "selection": {
        "tagStatus": "any",
        "countType": "imageCountMoreThan",
        "countNumber": 500
      },
      "action": {
        "type": "expire"
      }
    }
  ]
}

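The rulePriority field is key: lower numbers get evaluated first. Our first rule (priority 10) says “if there are more than 10 images tagged with prod-, start deleting the oldest ones until we’re back to 10.” The next rule (20) is my favorite: it nukes any untagged image after a week. Untagged images are typically what’s left behind when a tag such as latest gets pushed again and moves to a newer build, or debris from test pushes, and they are almost always garbage. This rule alone will save you a shocking amount of space. The final rule (30) is a catch-all safety net that caps the repository at 500 images. It’s your parachute, but not an absolute one: every prod- image and every untagged image was already claimed by rules 10 and 20, so rule 30 can’t expire them, and the repository can end up holding more than 500 images if those earlier rules are still keeping some.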
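Before uploading a policy like this one, a quick local check can catch structural mistakes early, for example in CI. The sketch below tests a handful of constraints described in the ECR documentation: unique rulePriority values, at most one rule selecting untagged images, a tagPrefixList or tagPatternList on tagged rules, countUnit set for sinceImagePushed, and the tagStatus any rule evaluated last. validate_policy is a hypothetical helper, not part of any AWS SDK, and it is far from exhaustive; ECR still validates the policy itself when you upload it.

```python
import json


def validate_policy(policy_text):
    """Return problems found by a few local checks; an empty list means they all passed."""
    problems = []
    rules = json.loads(policy_text)["rules"]
    priorities = [r["rulePriority"] for r in rules]
    if len(set(priorities)) != len(priorities):
        problems.append("rulePriority values must be unique")
    if sum(r["selection"]["tagStatus"] == "untagged" for r in rules) > 1:
        problems.append("only one rule may select untagged images")
    for r in rules:
        sel, prio = r["selection"], r["rulePriority"]
        if sel["tagStatus"] == "tagged" and not (sel.get("tagPrefixList") or sel.get("tagPatternList")):
            problems.append(f"rule {prio}: tagged rules need tagPrefixList or tagPatternList")
        if sel["countType"] == "sinceImagePushed" and sel.get("countUnit") != "days":
            problems.append(f"rule {prio}: sinceImagePushed needs countUnit 'days'")
        if sel["tagStatus"] == "any" and prio != max(priorities):
            problems.append(f"rule {prio}: a tagStatus 'any' rule must have the highest rulePriority")
    return problems


# A deliberately broken policy: the catch-all runs first and the tagged rule has no prefix.
bad_policy = json.dumps({"rules": [
    {"rulePriority": 1,
     "selection": {"tagStatus": "any", "countType": "imageCountMoreThan", "countNumber": 50},
     "action": {"type": "expire"}},
    {"rulePriority": 2,
     "selection": {"tagStatus": "tagged", "countType": "imageCountMoreThan", "countNumber": 5},
     "action": {"type": "expire"}},
]})

for problem in validate_policy(bad_policy):
    print(problem)
```

Run against the three-rule example policy above, these checks report nothing; the broken policy trips two of them.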

The Gotchas and Rough Edges

Now, the honest part. This system is good but not clairvoyant.

It’s asynchronous. Nothing gets cleaned up the instant you push a new image or save a policy; AWS says to expect images to expire within 24 hours of meeting the criteria. Don’t panic if you don’t see an immediate storage reduction.
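tagPrefixList is a prefix match, not an exact match. ["prod"] matches prod and prod-v1, but it also matches production-debug or products-beta, which is probably not what you meant. Use ["prod-"] if you only want the prod- family. And if you list several prefixes in one rule, an image must match all of them (each prefix against any of its tags), not just one. This trips everyone up.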
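Deletion is permanent. There is no “oops” button or trash can. Test your policies on a dummy repository filled with fake images first. I’m not kidding. ECR can also dry-run a policy: aws ecr start-lifecycle-policy-preview evaluates the rules without deleting anything, and aws ecr get-lifecycle-policy-preview then lists the images that would expire. aws ecr list-images and aws ecr describe-images remain your friends for checking what’s actually in the repository.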
It works on manifests, not tags. This is the most important technical nuance. When a lifecycle policy expires an image, it deletes the image manifest. If multiple tags point to that exact same manifest (e.g., you have latest and v1.0.0 pointing to the same digest), expiring the image removes every one of those tags, so a rule that matches on the v prefix can take latest down with it. The policy doesn’t “expire tags”; it expires images, and the tags are just labels that fall away with them.
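Two of these gotchas, prefix matching and manifest-level expiry, are easy to see in a few lines of Python. This is a simplified model that assumes the documented tagPrefixList behavior (prefix matching, with every listed prefix required to match one of the image’s tags) and ignores tagPatternList wildcards. matches_prefix_list and expire_image are hypothetical helpers, and the digests are made up.

```python
def matches_prefix_list(prefixes, image_tags):
    """Simplified tagPrefixList check: each prefix must start at least one of the image's tags."""
    return bool(image_tags) and all(
        any(tag.startswith(prefix) for tag in image_tags) for prefix in prefixes
    )


def expire_image(digest, tag_map):
    """Expiring an image removes its manifest, so every tag pointing at that digest disappears."""
    return {tag: d for tag, d in tag_map.items() if d != digest}


# Prefix, not exact, matching ("prod" also catches "production-debug"):
print(matches_prefix_list(["prod"], ["production-debug"]))   # True
print(matches_prefix_list(["prod-"], ["production-debug"]))  # False
print(matches_prefix_list(["prod-"], ["prod-v1"]))           # True
# Multiple prefixes must ALL be satisfied:
print(matches_prefix_list(["prod-", "release-"], ["prod-v1"]))                  # False
print(matches_prefix_list(["prod-", "release-"], ["prod-v1", "release-2024"]))  # True

# Hypothetical repository state: two tags share one manifest digest.
tags = {"latest": "sha256:aaa", "v1.0.0": "sha256:aaa", "prod-v1": "sha256:bbb"}
print(expire_image("sha256:aaa", tags))  # {'prod-v1': 'sha256:bbb'}
```

Expiring the image behind v1.0.0 takes latest with it, because both tags were only ever labels on the same manifest.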

The best practice? Be deliberate. Your first rules (the lowest rulePriority numbers) should protect what you absolutely need: give your true north production images their own tag prefix and a generous keep-the-last-N rule. Don’t lean on tag immutability for this; it stops a tag from being overwritten, but it doesn’t exempt an image from a lifecycle policy. Then, aggressively target untagged cruft. Finally, use a general count-based rule as a backstop. This approach is cautious, effective, and won’t have you frantically trying to rebuild a two-year-old Docker image because an overly enthusiastic policy scrubbed it from existence.