10.7 Lambda Pricing: Requests and GB-Seconds

Alright, let’s talk money. Or, more accurately, let’s talk about how AWS decides to bill you for the privilege of running your brilliant little snippets of code. It’s a surprisingly elegant model, but if you don’t understand its moving parts, you can get a nasty surprise on your monthly bill. It’s not magic; it’s just math. Let’s break it down so you’re the one in control.

For a standard on-demand function, AWS charges you for two things: the number of times your function is invoked, and the total compute time it consumes. No hourly fees for idle time, no complex licensing. You pay for the electrons as they spin. (Extras such as provisioned concurrency, additional ephemeral storage, and data transfer are billed separately.)

The Two Pillars of the Bill

First, you pay per request. Every single time your function is triggered, whether by an API Gateway call, an S3 event, or you manually poking it in the console, it counts as a request. At the time of writing, the first 1 million requests per month are free under the free tier, and after that you pay $0.20 per million requests, a tiny fraction of a cent each. It’s usually the cheaper part of the bill unless you’re doing something truly absurd.

The second, and more important, charge is for duration. This is where most of the cost (and confusion) lies. Duration is measured in GB-seconds. Don’t let the unit scare you; it’s simpler than it sounds. Let’s dissect it:

GB: The amount of memory you allocated to your function.
Seconds: The time your function spent actually executing, rounded up to the nearest millisecond.

GB-seconds are the product of these two numbers, and the duration charge is that figure multiplied by a per-GB-second price. Think of it as renting a car. The “request” is the cost of starting the ignition. The “GB-seconds” are the cost of the rental, which depends on the size of the car (memory) and how long you drive it (duration).
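To make the unit concrete, here is a minimal sketch of the arithmetic. The helper name `billed_gb_seconds` is made up for illustration (it is not an AWS API), and it assumes 1 GB = 1024 MB and that duration is rounded up to the next whole millisecond, as described above.

```python
import math

def billed_gb_seconds(memory_mb: int, duration_ms: float) -> float:
    """Approximate GB-seconds for one invocation (illustrative helper, not an AWS API)."""
    # Round the measured duration up to the next whole millisecond.
    billed_ms = math.ceil(duration_ms)
    # Convert MB to GB and milliseconds to seconds, then multiply.
    return (memory_mb / 1024) * (billed_ms / 1000)

# 512 MB (0.5 GB) running for 100 ms
print(f"{billed_gb_seconds(512, 100):.4f} GB-s")
# 2048 MB (2 GB) running for 1500.2 ms, which is billed as 1501 ms
print(f"{billed_gb_seconds(2048, 1500.2):.4f} GB-s")
```

Multiply the result by the per-GB-second price for your region to get the duration charge for that invocation.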

Why this model? Because it directly aligns AWS’s incentives with yours. They make money when your code runs, so they are highly motivated to make the startup and execution blisteringly fast. Their profit comes from efficiency, which is a game we all want to win.

A Concrete Example: Doing the Math

Let’s make this painfully real with some code. Say you have a function that processes an image. It’s allocated 2048 MB (2 GB) of memory and runs for 1.5 seconds.

import time

def lambda_handler(event, context):
    start_time = time.time()
    
    # Simulate some actual work, like processing an image
    time.sleep(1.5) # This is your 1.5 seconds of compute
    
    # Estimate the GB-seconds for this single invocation (for demonstration;
    # the billed duration is the one reported in the function's REPORT log line)
    duration_sec = time.time() - start_time
    allocated_memory_gb = context.memory_limit_in_mb / 1024
    gb_seconds = allocated_memory_gb * duration_sec
    
    print(f"Allocated Memory: {allocated_memory_gb:.2f} GB")
    print(f"Execution Time: {duration_sec:.3f} seconds")
    print(f"Total GB-Seconds: {gb_seconds:.6f}")
    
    return {"statusCode": 200}

For this single invocation:

Memory: 2 GB
Duration: 1.5 seconds
GB-Seconds: 2 GB * 1.5 s = 3 GB-s

Now, imagine this function is called 10 million times in a month. Let’s assume you’re past the free tier.

Request Cost: 10,000,000 requests * $0.20 per million requests = $2.00
Duration Cost: 10,000,000 invocations * 3 GB-s each = 30,000,000 GB-s. Duration is priced per GB-second, so no unit conversion is needed. At, say, $0.0000166667 per GB-second (the x86 price for US East at the time of writing), that’s 30,000,000 * $0.0000166667 ≈ $500.00

Your total bill for this function would be roughly $502.00. See how the duration cost dominates? This is why optimizing your function’s execution time is your number one priority for cost savings.
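If you’d rather script this than do it by hand, here is a rough estimator. It is a sketch under stated assumptions: the function name and parameters are hypothetical, the two prices are example figures treated as a price per million requests and a price per GB-second, and the free-tier arguments default to zero to match the “past the free tier” scenario. Check the current price list for your region and architecture before relying on the numbers.

```python
# Assumed example prices; verify against the current AWS price list.
PRICE_PER_MILLION_REQUESTS = 0.20
PRICE_PER_GB_SECOND = 0.0000166667

def estimate_monthly_cost(invocations, memory_mb, avg_duration_s,
                          free_requests=0, free_gb_seconds=0.0):
    """Rough monthly estimate for one function (hypothetical helper)."""
    # Total compute consumed, using 1 GB = 1024 MB.
    gb_seconds = invocations * (memory_mb / 1024) * avg_duration_s
    # Subtract any free allowance, never going below zero.
    billable_requests = max(invocations - free_requests, 0)
    billable_gb_seconds = max(gb_seconds - free_gb_seconds, 0.0)
    request_cost = billable_requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    duration_cost = billable_gb_seconds * PRICE_PER_GB_SECOND
    return request_cost, duration_cost

# The scenario from the text: 10 million calls, 2048 MB, 1.5 s each.
request_cost, duration_cost = estimate_monthly_cost(10_000_000, 2048, 1.5)
print(f"Requests: ${request_cost:.2f}")
print(f"Duration: ${duration_cost:.2f}")
print(f"Total:    ${request_cost + duration_cost:.2f}")
```

Passing `free_requests=1_000_000` and `free_gb_seconds=400_000` shows what the same workload costs when the monthly free allowance still applies.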

The Gotchas: Where They Get You

The billing model has some sharp edges you need to be aware of.

1. The Round-Up Rule: Duration is rounded up to the next millisecond. A function that runs for exactly 1001 milliseconds is billed for 1001 ms, but one that runs for 1001.0001 ms is billed for 1002 ms. It’s tiny, but at scale, it adds up. More importantly, your billed time isn’t just your handler’s runtime. On a cold start it can also include initialization: the time spent bootstrapping the runtime and running any code that sits outside your handler, reported as Init Duration in the function’s REPORT log line. If you import a large dependency outside your handler, that initialization cost recurs on every cold start.

2. Mis-Sizing Memory Is a Double Whammy. This is the biggest mistake I see. More memory costs more per second, but it also gives you more CPU power, which can make your code run faster. It’s a trade-off. A function with 512 MB might run for 10 seconds (0.5 GB * 10 s = 5 GB-s), while the same function with 2048 MB might run for 2 seconds (2 GB * 2 s = 4 GB-s) because of the increased CPU. The latter is both faster and cheaper. You need to right-size your functions. Don’t just guess; use the open-source AWS Lambda Power Tuning tool to find the sweet spot for your specific workload.

3. The “Free” Tier is Per Account. The 1 million requests and 400,000 GB-seconds of compute time are per AWS account, per month. This is great for development and small projects, but if you have a team of 10 developers each running tests all day, you can burn through that free tier surprisingly quickly. Keep an eye on it.
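The memory trade-off in the second gotcha is easy to check for yourself once you have timings. The sketch below compares configurations by GB-seconds, using 1 GB = 1024 MB; the `measurements` list holds hypothetical numbers matching the scenario in the text, and in practice you would replace them with durations observed for your own function.

```python
def gb_seconds(memory_mb, duration_s):
    """GB-seconds consumed by one invocation, using 1 GB = 1024 MB."""
    return (memory_mb / 1024) * duration_s

# Hypothetical measurements: (memory in MB, observed duration in seconds).
measurements = [(512, 10.0), (2048, 2.0)]

for memory_mb, duration_s in measurements:
    usage = gb_seconds(memory_mb, duration_s)
    print(f"{memory_mb} MB x {duration_s} s = {usage:.2f} GB-s")

# The cheapest configuration is the one that consumes the fewest GB-seconds.
cheapest = min(measurements, key=lambda m: gb_seconds(*m))
print(f"Cheapest configuration: {cheapest[0]} MB")
```

Since the per-GB-second price is the same for every memory size, comparing GB-seconds is enough to rank configurations by duration cost.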

The golden rule? Your code isn’t done when it works. It’s done when it’s fast. Shaving milliseconds off your execution time isn’t premature optimization; it’s directly putting money back in your pocket. Now go make something that costs pennies.