14.8 S3 Batch Operations: Processing Millions of Objects at Scale

Right, so you’ve got a few million objects sitting in a bucket. Maybe you need to change their storage class, add tags, or copy them to another bucket. You’re not going to do that by hand, are you? Of course not. You’re going to fire up S3 Batch Operations, which is essentially your personal robot army for S3 object management. It’s the tool you use when a simple aws s3 sync just won’t cut the mustard and you’d rather not write a bespoke Lambda function to handle the sheer scale.

The beauty of Batch Ops is its simplicity. You give it a list of objects (a “manifest”), tell it what action to perform on each one, and then you go get a coffee while it handles the rest. It manages all the concurrency, error handling, and reporting for you. It’s a “set it and forget it” operation, provided you set it up correctly. And that’s the part we need to talk about.

The Three Pillars of a Batch Job

Every Batch Operation rests on three core components: the Manifest, the Action, and the Report.

First, the Manifest. This is a CSV file (or an S3 Inventory report) that tells Batch Ops exactly which objects to process. You can’t just point it at a bucket and say “go nuts.” You have to be specific. Each row of a CSV manifest holds the bucket name, the object key, and optionally a version ID if you’re dealing with versioned buckets. Object keys must be URL-encoded. Version IDs are all-or-nothing: either every row has one or none do, and if you omit them, Batch Ops acts on the latest version of each object. You store this manifest… wait for it… in an S3 bucket. Meta, I know.

Here’s what that CSV looks like. There is no header row: every line is one object, in bucket,key,versionId order. Save this as my-manifest.csv:

my-source-bucket,project-a/file1.txt,null
my-source-bucket,project-b/image.jpg,null
my-source-bucket,important-doc.pdf,abc123def456ghi789jkl012mn345op67
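If you build the manifest yourself rather than starting from an inventory report, a few lines of Python handle the fiddly parts. This is a minimal sketch under stated assumptions, and build_manifest is our own hypothetical helper, not an AWS API. It assumes the CSV manifest rules as we understand them: no header row, URL-encoded keys, and version IDs on every row or on none. Keys are encoded with urllib.parse.quote, which turns a space into %20 and a plus sign into %2B while leaving slashes alone.

```python
import csv
import io
from typing import Iterable, Optional, Tuple
from urllib.parse import quote

# (bucket, key, version_id); version_id is None when you want the latest version.
Entry = Tuple[str, str, Optional[str]]


def build_manifest(entries: Iterable[Entry]) -> str:
    """Return CSV manifest text: one object per line, no header row."""
    entries = list(entries)
    versioned = sum(1 for _, _, version_id in entries if version_id is not None)
    # Version IDs are all-or-nothing across the whole manifest.
    if 0 < versioned < len(entries):
        raise ValueError("give a version ID for every row or for none")

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for bucket, key, version_id in entries:
        # quote() encodes spaces as %20 and '+' as %2B but leaves '/' untouched.
        row = [bucket, quote(key, safe="/")]
        if version_id is not None:
            row.append(version_id)
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    print(build_manifest([
        ("my-source-bucket", "project-a/file1.txt", None),
        ("my-source-bucket", "reports/Q1 summary.pdf", None),
    ]), end="")
```

The second key comes out as reports/Q1%20summary.pdf. Refusing a half-versioned manifest up front is cheaper than discovering the problem after the job is submitted.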

Second, the Action. This is the actual task you want to perform on every object in your manifest. The most common ones, named as they appear in the operation JSON you hand to the CLI, are:

S3PutObjectCopy: Copy objects (up to 5 GB each) to another bucket, even cross-region, though Replication might be better for ongoing needs.
S3PutObjectTagging: Replace each object’s tag set. It really is a replace: any existing tag you don’t include is removed.
S3InitiateRestoreObject: Restore objects from S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive.
S3PutObjectAcl: Modify object ACLs. (But you’re using bucket policies, right? Right?)
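Each action becomes the single top-level key of the JSON document you pass to --operation. The helpers below are a hypothetical sketch (the function names are ours) that build two of those documents. S3PutObjectTagging takes a TagSet list of Key/Value pairs, and S3PutObjectCopy takes the destination bucket’s ARN as TargetResource. Because the tagging action replaces the whole tag set, pass every tag you want each object to end up with, not just the new ones.

```python
import json
from typing import Dict, Optional


def tagging_operation(tags: Dict[str, str]) -> dict:
    # The TagSet replaces each object's existing tags, so list every tag to keep.
    return {
        "S3PutObjectTagging": {
            "TagSet": [{"Key": k, "Value": v} for k, v in sorted(tags.items())]
        }
    }


def copy_operation(target_bucket: str, storage_class: Optional[str] = None) -> dict:
    # TargetResource is the destination bucket's ARN, not a bucket name.
    spec = {"TargetResource": f"arn:aws:s3:::{target_bucket}"}
    if storage_class:
        spec["StorageClass"] = storage_class  # optional, e.g. "STANDARD_IA"
    return {"S3PutObjectCopy": spec}


if __name__ == "__main__":
    # Redirect to a file and pass it as: --operation file://tag-action.json
    print(json.dumps(tagging_operation({"project": "a", "retention": "7y"}), indent=2))
```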

Third, the Report. Batch Ops will generate a detailed completion report, also stored in a bucket you specify, that tells you which objects succeeded, which failed, and why. Set ReportScope to AllTasks to record every object, or to FailedTasksOnly to record just the failures. This is non-negotiable. Always, always enable it. The last thing you want is a million-object job failing silently on object #42 and you having no clue.
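Scanning the report is easy to automate. This is a minimal sketch, and summarize_report is a hypothetical helper name. It assumes that each row starts with bucket, key and version ID, that the fourth column is the task status (succeeded or failed), and that the last column is the result message. Check that layout against a report from your own job. The csv module is used because result messages can be quoted and contain commas, and the script accepts several files because a report may be split across more than one CSV.

```python
import csv
import sys
from collections import Counter
from typing import Iterable, List, Tuple


def summarize_report(lines: Iterable[str]) -> Tuple[Counter, List[Tuple[str, str, str]]]:
    """Count tasks by status and collect (bucket, key, message) for failed ones.

    ASSUMPTION: column 4 is the task status and the last column is the message.
    """
    counts: Counter = Counter()
    failures: List[Tuple[str, str, str]] = []
    for row in csv.reader(lines):
        if len(row) < 4:
            continue  # skip blank or truncated lines
        status = row[3].strip().lower()
        counts[status] += 1
        if status == "failed":
            failures.append((row[0], row[1], row[-1]))
    return counts, failures


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_report.py results-1.csv [results-2.csv ...]
    total: Counter = Counter()
    for path in sys.argv[1:]:
        with open(path, newline="") as f:
            counts, failures = summarize_report(f)
        total.update(counts)
        for bucket, key, message in failures:
            print(f"FAILED {bucket}/{key}: {message}")
    print(dict(total))
```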

A Real-World Example: The Great Glacier Thaw

Let’s say you got a little overzealous with lifecycle policies and moved a few terabytes of infrequently accessed data into S3 Glacier Flexible Retrieval. Now you need it all back. Manually restoring each file via the console would be a special kind of hell. Batch Ops to the rescue.

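First, create your manifest of objects in Glacier. An S3 Inventory report is perfect for this, but configure it ahead of time (the first report can take up to 48 hours to arrive). Include the optional StorageClass field so you can see which objects are actually archived. Then, you’d use the AWS CLI to create the job. Notice how we specify every detail: the manifest, the action, the report, and, crucially, the IAM Role that Batch Ops will assume to perform the work.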

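# Create a JSON file (restore-action.json) defining the action.
# ExpirationInDays is how long the restored copy stays readable; BULK is the cheapest, slowest retrieval tier.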
{
  "S3InitiateRestoreObject": {
    "ExpirationInDays": 7,
    "GlacierJobTier": "BULK"
  }
}

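# Create a JSON file (manifest.json) pointing at the inventory report's manifest.json.
# Location needs both the object ARN and its ETag (aws s3api head-object returns the ETag).
{
  "Spec": {
    "Format": "S3InventoryReport_CSV_20161130"
  },
  "Location": {
    "ObjectArn": "arn:aws:s3:::my-inventory-bucket/PATH/TO/INVENTORY/manifest.json",
    "ETag": "YOUR_INVENTORY_MANIFEST_ETAG"
  }
}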

# Create the Batch Job using the AWS CLI
aws s3control create-job \
    --account-id YOUR_ACCOUNT_ID \
    --role-arn arn:aws:iam::YOUR_ACCOUNT_ID:role/S3BatchOpsRole \
    --manifest file://manifest.json \
    --operation file://restore-action.json \
    --report '{
      "Bucket": "arn:aws:s3:::my-report-bucket",
      "Format": "Report_CSV_20180820",
      "Enabled": true,
      "Prefix": "batch-reports",
      "ReportScope": "AllTasks"
    }' \
    --priority 10 \
    --description "Restore all Glacier objects from inventory report" \
    --region us-east-1
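There is one catch with pointing the job straight at an inventory report: Batch Ops will attempt a restore on every object listed, and the ones that aren’t archived just come back as failed tasks. If you’d rather hand it only the archived objects, you can filter the inventory into a CSV manifest first. This is a rough sketch, and filter_inventory is a hypothetical helper. It assumes a current-versions-only inventory whose fileSchema (listed in the inventory’s own manifest.json) includes Bucket, Key and StorageClass. S3 Inventory already URL-encodes keys in its CSV files, so they are copied through untouched. GLACIER is the storage class value for Flexible Retrieval; add DEEP_ARCHIVE to the wanted set if you need it.

```python
import csv
import gzip
import sys
from typing import FrozenSet, Iterable, List


def filter_inventory(lines: Iterable[str], file_schema: str,
                     wanted: FrozenSet[str] = frozenset({"GLACIER"})) -> List[str]:
    """Return 'bucket,key' manifest lines for inventory rows in the wanted storage classes."""
    columns = [c.strip() for c in file_schema.split(",")]
    bucket_i = columns.index("Bucket")
    key_i = columns.index("Key")
    class_i = columns.index("StorageClass")  # raises ValueError if the field wasn't enabled
    manifest = []
    for row in csv.reader(lines):
        if len(row) > class_i and row[class_i] in wanted:
            # Keys in inventory CSVs are already URL-encoded; copy them through as-is.
            manifest.append(f"{row[bucket_i]},{row[key_i]}")
    return manifest


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python filter_inventory.py "<fileSchema>" data1.csv.gz [...] > my-manifest.csv
    schema = sys.argv[1]
    for path in sys.argv[2:]:
        with gzip.open(path, "rt", newline="") as f:
            for line in filter_inventory(f, schema):
                print(line)
```

Upload the output (my-manifest.csv, say) and describe it in manifest.json with "Format": "S3BatchOperations_CSV_20180820" and "Fields": ["Bucket", "Key"] instead of the inventory format. Point Location at the CSV’s own ARN and ETag.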

The Devil in the Details: Pitfalls and Best Practices

This is where I earn my keep. Batch Ops is powerful, but it will happily shoot you in the foot if you let it.

IAM is Everything: The most common point of failure. The role’s trust policy must let batchoperations.s3.amazonaws.com assume it. Its permissions must let it read the manifest, perform the action on every target object, and write the completion report. If your role can’t tag an object, the task for that object fails. Test your role’s permissions rigorously, ideally by running the job against a small manifest first.
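Here is a sketch of the two policy documents for the restore job, generated from Python so the bucket names live in one place. The trust policy lets the Batch Operations service principal assume the role. The permissions policy covers the three needs listed above: read the manifest, act on the targets, write the report. The helper names and bucket arguments are hypothetical, and the action list is a starting point for this restore example, not an official or exhaustive policy.

```python
import json
import sys


def batch_ops_trust_policy() -> dict:
    # Allows the S3 Batch Operations service to assume the role.
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "batchoperations.s3.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }


def restore_job_policy(source_bucket: str, manifest_bucket: str,
                       report_bucket: str, report_prefix: str = "batch-reports") -> dict:
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Perform the action on every target object.
                "Effect": "Allow",
                "Action": ["s3:RestoreObject"],
                "Resource": [f"arn:aws:s3:::{source_bucket}/*"],
            },
            {   # Read the manifest and the inventory data files it lists.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:GetObjectVersion"],
                "Resource": [f"arn:aws:s3:::{manifest_bucket}/*"],
            },
            {   # Write the completion report under its prefix.
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{report_bucket}/{report_prefix}/*"],
            },
            {   # Bucket-level lookup on the manifest and report buckets.
                "Effect": "Allow",
                "Action": ["s3:GetBucketLocation"],
                "Resource": [f"arn:aws:s3:::{b}" for b in sorted({manifest_bucket, report_bucket})],
            },
        ],
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # python make_policies.py trust > trust.json ; python make_policies.py policy > policy.json
    doc = batch_ops_trust_policy() if sys.argv[1] == "trust" else restore_job_policy(
        "my-source-bucket", "my-inventory-bucket", "my-report-bucket")
    print(json.dumps(doc, indent=2))
```

To attach them, run aws iam create-role --role-name S3BatchOpsRole --assume-role-policy-document file://trust.json. Then run aws iam put-role-policy --role-name S3BatchOpsRole --policy-name S3BatchOpsRestore --policy-document file://policy.json. The policy name here is just an example.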
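Idempotency is Key: Most Batch Ops actions are idempotent, meaning running them multiple times has the same effect as running them once. This is good. But be wary of actions that aren’t, and remember that a failed job can’t be resumed. You create a new one, and it processes its whole manifest again. For example, if your copy job fails halfway and you rerun it, every object is copied again. An unversioned destination just overwrites the same keys, but a versioned destination picks up an extra version of each object that was already copied, and you pay to store it. Even a rerun restore job will report failures for objects whose restore is still in progress. Plan for this.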
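Cost and Speed: The --priority value is just a non-negative integer that decides which of your jobs run first when several are competing; higher numbers go first. It doesn’t change what you pay. Batch Ops bills per job and per object processed, on top of the cost of the underlying requests (for a restore, the Glacier retrieval charges). For a massive, non-urgent restore, the money-saving lever is the retrieval tier: BULK is the cheapest option. AWS will still churn through it, just at a more leisurely pace.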
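The “Null” Version: See that null in the versionId column? In S3, null is the literal version ID of an object written while versioning was off (or suspended), so that’s how you name those objects in a versioned bucket’s manifest. And because a manifest must carry a version ID on every row or on none, you can’t leave the field blank for some rows and fill it in for others. If none of your objects need a specific version, drop the column entirely. It’s pedantic, but so is software.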
Check the Report, You Maniac: I said it before, but it’s worth repeating. A job can show “Completed” in the console but still have thousands of failed tasks due to permission errors. The job’s progress summary tells you how many tasks failed; only the completion report tells you which objects failed and why. Always download and scan the report for errors. Your robot army is powerful, but it’s also very literal. It will do exactly what you tell it to, even if that’s the wrong thing.
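The job-level counts make a quick first check before you open the report. This small sketch reads the JSON printed by aws s3control describe-job and flags any failures; check_job is a hypothetical helper name. It relies on the Status field and on the TotalNumberOfTasks and NumberOfTasksFailed fields of ProgressSummary in that response. Note that the API reports a finished job’s status as Complete.

```python
import json
import sys


def check_job(describe_job_output: dict) -> str:
    """Summarize a describe-job response, flagging failed tasks."""
    job = describe_job_output["Job"]
    status = job.get("Status", "Unknown")
    summary = job.get("ProgressSummary", {})
    total = summary.get("TotalNumberOfTasks", 0)
    failed = summary.get("NumberOfTasksFailed", 0)
    if failed:
        return f"{status}: {failed} of {total} tasks failed, read the completion report"
    return f"{status}: no failed tasks out of {total}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # aws s3control describe-job --account-id YOUR_ACCOUNT_ID --job-id YOUR_JOB_ID > job.json
    # python check_job.py job.json
    with open(sys.argv[1]) as f:
        print(check_job(json.load(f)))
```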