14.7 S3 Object Lambda: Transforming Data On the Fly During GET

Right, so you’ve got your data sitting in S3. It’s pristine, it’s perfect. But then the requests start rolling in. “Can we get this CSV file as JSON?” “I need this image as a WebP, not a PNG.” “Can we redact the personally identifiable information (PII) from this document before my user sees it?”

The old, tedious way would be to create a whole ETL pipeline: trigger a Lambda on upload to transform the object into every possible format, store them all, and then hope you guessed right what the user would need. It’s wasteful, it’s expensive, and it’s frankly a bit daft. It’s like cooking every item on the menu the second a customer walks in, just in case they order it.

S3 Object Lambda is here to save you from that madness. The concept is brilliantly simple: instead of fetching the object directly from its bucket, you intercept the GET request (and also HEAD and LIST requests) and send it through a Lambda function you write. Your function fetches the original object, does whatever you need to it on the fly, and returns the transformed data back to the user. The user thinks they’re just asking S3 for an object, but you’ve inserted your own logic right in the middle. Magic.

How It Actually Works: The Request Flow

Don’t just think of it as a Lambda function; think of it as a man-in-the-middle for your S3 GET operations. Here’s the play-by-play:

A user or application requests an object using the standard S3 GetObject API call, but they use a special Object Lambda Access Point ARN instead of a regular bucket or access point ARN.
S3, upon seeing this special ARN, routes the request to your configured Lambda function. It doesn’t even check if the original object exists first; that’s now your job.
AWS invokes your Lambda function, passing it a JSON payload that contains all the juicy details about the original request, most importantly inputS3Url, a presigned URL for the original object, plus the outputRoute and outputToken you’ll need to send your response.
Your function fetches the raw data with a plain HTTP GET on that presigned URL. This is critical: the URL already carries the authorization, so your Lambda doesn’t need s3:GetObject on the original bucket. What its execution role does need is s3-object-lambda:WriteGetObjectResponse.
You transform the data. This is where you do your work—convert the image, transform the JSON, redact the text, you name it.
Your function hands the transformed data back by calling WriteGetObjectResponse with the outputRoute and outputToken from the event, and S3 Object Lambda sends it on to the original caller.

The caller is none the wiser. As far as they’re concerned, they asked S3 for an object and got it. They have no idea your Lambda function just did a full gymnastics routine in the background.
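
From the caller’s side, the only thing that changes is what goes where the bucket name normally would. A minimal sketch, assuming a boto3 S3 client; the helper name fetch_transformed and the ARN in the usage comment are made up for illustration:

```python
def fetch_transformed(s3_client, olap_arn, key):
    """Fetch an object through an Object Lambda Access Point.

    s3_client is expected to be a boto3 S3 client. The access point ARN is
    passed in the Bucket parameter, where a bucket name would normally go.
    """
    response = s3_client.get_object(Bucket=olap_arn, Key=key)
    return response["Body"].read()


# Example usage (hypothetical ARN and key; needs AWS credentials):
#   import boto3
#   arn = "arn:aws:s3-object-lambda:us-east-1:123456789012:accesspoint/my-olap"
#   data = fetch_transformed(boto3.client("s3"), arn, "my-key")
```

Taking the client as a parameter keeps the helper easy to exercise with a stub, which is handy because the real call needs a deployed access point.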

Writing Your Lambda Function: The Nuts and Bolts

The function signature is straightforward. You’ll get an event that looks like this, and you answer it by calling WriteGetObjectResponse rather than by returning the data from the handler.

{
  "xAmzRequestId": "1a2b3c4d-5e6f-7g8h-9i0j-k1l2m3n4o5p6",
  "getObjectContext": {
    "inputS3Url": "https://my-bucket.s3.us-east-1.amazonaws.com/my-key?X-Amz-Security-Token=...",
    "outputRoute": "proxy-response",
    "outputToken": "a1b2c3d4e5f6..."
  },
  "configuration": {
    "accessPointArn": "arn:aws:s3-object-lambda:us-east-1:123456789012:accesspoint/my-olap",
    "supportingAccessPointArn": "arn:aws:s3:us-east-1:123456789012:accesspoint/my-ap"
  },
  "userRequest": {
    "url": "https://my-olap-123456789012.s3-object-lambda.us-east-1.amazonaws.com/my-key",
    "headers": {"Accept": "*/*", "User-Agent": "aws-cli/2.0.0"}
  },
  "userIdentity": {
    "type": "AssumedRole",
    "principalId": "PRINCIPAL_ID",
    "arn": "arn:aws:sts::123456789012:assumed-role/Admin/my-user",
    "accountId": "123456789012"
  },
  "protocolVersion": "1.00"
}
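
A handler usually starts by pulling a handful of fields out of that event. The sketch below does only that. The helper name parse_event and the shape of the dictionary it returns are illustrative choices, not part of any AWS API, and it assumes the requested key is the path of userRequest.url, as in the sample event.

```python
from urllib.parse import unquote, urlparse


def parse_event(event):
    """Extract the fields a GetObject transform typically needs."""
    ctx = event["getObjectContext"]
    # Assumption: the key the caller asked for is the path of the URL they hit.
    path = urlparse(event["userRequest"]["url"]).path
    return {
        "input_url": ctx["inputS3Url"],   # presigned URL for the original object
        "route": ctx["outputRoute"],      # needed when sending the response
        "token": ctx["outputToken"],      # needed when sending the response
        "requested_key": unquote(path.lstrip("/")),
        "caller_arn": event["userIdentity"]["arn"],
    }
```

The caller’s ARN is included because per-user redaction or watermarking needs to know who is asking.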

Your job is to use that inputS3Url (which is a pre-signed URL to fetch the original object) and then send your transformed data back with WriteGetObjectResponse, quoting the outputRoute and outputToken from the event. Here’s a Python example that simply converts text to uppercase, because all your users clearly want to YELL.

import boto3
import urllib3

http = urllib3.PoolManager()
s3 = boto3.client('s3')

def lambda_handler(event, context):
    # 1. Get the pre-signed URL, plus the route and token that identify this request
    ctx = event['getObjectContext']
    s3_url = ctx['inputS3Url']

    # 2. Fetch the original object from S3
    response = http.request('GET', s3_url)
    original_data = response.data.decode('utf-8')

    # 3. Transform the data (the fun part)
    transformed_data = original_data.upper()

    # 4. Send the result back to the caller through S3 Object Lambda
    s3.write_get_object_response(
        Body=transformed_data.encode('utf-8'),
        RequestRoute=ctx['outputRoute'],
        RequestToken=ctx['outputToken'],
        ContentType='text/plain',
    )

    # The return value is not what the caller receives; it just ends the invocation
    return {'statusCode': 200}

The Sharp Edges and “Gotchas”

This is powerful, but it’s not a free lunch. Here’s what the marketing page might not scream about:

Cold Starts are a User’s Problem: A cold start on your Lambda isn’t just a metrics blip anymore; it’s added latency for your end-user waiting for their object. Keep your runtime lean and your functions warm if low latency is critical.
You Pay for the Transformation AND the Fetch: You pay for your Lambda’s execution time and memory. On top of that, fetching the original counts as an S3 GET request against the source bucket, and S3 Object Lambda bills per gigabyte of data returned to the caller. It’s not double-dipping, but it is several separate line items.
Error Handling is Your Responsibility: If the original object doesn’t exist, your Lambda has to notice that and send a proper 404 back through WriteGetObjectResponse. If your transformation logic fails mid-stream, you have to handle that gracefully. S3 won’t do it for you. Bad code here doesn’t just throw CloudWatch errors; it returns HTTP 500s to your users.
It’s for GET (and HEAD and LIST), Not WRITE: This is a read-only mechanism. You can’t use it to transform data on its way into the bucket (for that, you’d use normal S3 Event Notifications). The designers made a clear choice here, and it’s the right one—mixing read and write transforms would be a nightmare.
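
One way to handle the error cases above is to translate the status of the upstream fetch into the arguments for WriteGetObjectResponse. This is a sketch under assumptions: the helper name build_response_args is invented and the error codes and messages are illustrative. StatusCode, ErrorCode and ErrorMessage are parameters of boto3’s write_get_object_response.

```python
def build_response_args(ctx, upstream_status, body=b""):
    """Build keyword arguments for write_get_object_response.

    ctx is event['getObjectContext']; upstream_status is the HTTP status
    returned when fetching inputS3Url.
    """
    args = {
        "RequestRoute": ctx["outputRoute"],
        "RequestToken": ctx["outputToken"],
    }
    if upstream_status == 200:
        args.update(StatusCode=200, Body=body)
    elif upstream_status == 404:
        # Illustrative mapping: pass "not found" through to the caller.
        args.update(
            StatusCode=404,
            ErrorCode="NoSuchKey",
            ErrorMessage="The requested object does not exist.",
        )
    else:
        # Anything else is reported as a generic server-side failure.
        args.update(
            StatusCode=500,
            ErrorCode="InternalError",
            ErrorMessage=f"Upstream fetch failed with status {upstream_status}.",
        )
    return args
```

Inside a handler this would be used as s3.write_get_object_response(**build_response_args(ctx, response.status, transformed_bytes)), so every code path sends a response.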

When To Use This (And When To Run Away)

This is perfect for:

Dynamic Format Transcoding: CSV<>JSON, XML<>JSON, image format conversion (e.g., PNG to WebP).
Personalization or Redaction: Masking PII based on the user requesting it, adding watermarks to images for specific users.
Legacy Modernization: Adding compression (e.g., on-the-fly gzip) for old clients that can’t request it themselves, or lightly modifying data structures for backward compatibility.
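
As a taste of the first use case, here is a minimal CSV-to-JSON transform that could sit in the "transform" step of a handler. It is a sketch, assuming UTF-8 input with a header row; the function name csv_to_json is made up, and every value stays a string because the csv module does no type inference.

```python
import csv
import io
import json


def csv_to_json(csv_bytes):
    """Convert CSV bytes (with a header row) into a JSON array of objects."""
    text = csv_bytes.decode("utf-8")
    rows = list(csv.DictReader(io.StringIO(text)))
    return json.dumps(rows).encode("utf-8")
```

The result is bytes, ready to pass as the Body of the response with a Content-Type of application/json. It reads the whole object into memory, so it suits modest file sizes only.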

Avoid it for:

Extremely large objects: S3 Object Lambda gives your function at most 60 seconds to finish sending the response, and Lambda memory is limited. Trying to transform a 50GB file is a recipe for failure and a huge bill.
Simple, static transformations: If an object is transformed once and then read a million times, just do the transformation once on upload and be done with it. It’s far cheaper.
Writing data back to S3: That’s not what it’s for. Use a separate pipeline.

The presigned URL (inputS3Url) is your key. It’s valid for only 60 seconds and is the most secure way for your Lambda to access the original object without you having to manage convoluted IAM permission passthrough. Use it.

So, next time you find yourself thinking about pre-computing and storing a dozen variants of the same data, stop. Ask yourself: “Could I just do this on the fly?” Chances are, with S3 Object Lambda, you can.