29.3 Task States: Calling Lambda, ECS, DynamoDB, and Other Services
Alright, let’s talk about the real workhorses of Step Functions: Task states. This is where your state machine stops just drawing pretty pictures and actually does something—like calling a Lambda function, poking an ECS task, or writing to a DynamoDB table. Think of it as the state machine’s way of outsourcing the actual labor.
The core idea is beautifully simple. You define a resource—like the ARN of a Lambda function—and you hand it some input. The service does its thing, and its output becomes the state’s output, which then gets passed along to the next state. It’s the “do work” box in your flowchart.
The Basics: Calling a Lambda Function
This is the most common task you’ll write, so let’s get the syntax right. The key is the Resource field. For Lambda, it uses a special ARN format that tells the Step Functions service, “Hey, go call this Lambda function for me.”
Here’s a state definition that calls a function named DataEnrichmentFunction.
{
  "Enrich Data": {
    "Type": "Task",
    "Resource": "arn:aws:states:::lambda:invoke",
    "Parameters": {
      "FunctionName": "arn:aws:lambda:us-east-1:123456789012:function:DataEnrichmentFunction",
      "Payload": {
        "input.$": "$"
      }
    },
    "Next": "NextState",
    "Retry": [
      {
        "ErrorEquals": ["Lambda.ServiceException", "Lambda.AWSLambdaException"],
        "IntervalSeconds": 2,
        "MaxAttempts": 6,
        "BackoffRate": 2
      }
    ],
    "Catch": [
      {
        "ErrorEquals": ["States.TaskFailed"],
        "Next": "HandleTaskFailure",
        "Comment": "The Lambda function failed beyond our retry policy"
      }
    ]
  }
}
Notice the Parameters block. This is a critical piece of AWS wizardry. You’re not passing the input directly to the Lambda API; you’re configuring the invocation of that API. The Payload parameter inside this block becomes the actual event your Lambda function receives. The .$ suffix on the "input.$" key tells Step Functions to treat the value as a reference path, and "$" selects the entire state input, so the function receives the whole input under an input key. This trips up everyone at first, myself included. You’re not just calling a function; you’re orchestrating an API call.
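The same "you’re orchestrating an API call" point applies on the way out. With the lambda:invoke integration, the task result is the Invoke API response, with the function’s return value nested under Payload next to fields such as StatusCode. Here is a minimal sketch, assuming you only want the function’s return value to reach the next state: it reuses the example above (retries omitted for brevity) and adds an OutputPath.

```json
{
  "Enrich Data": {
    "Type": "Task",
    "Resource": "arn:aws:states:::lambda:invoke",
    "Parameters": {
      "FunctionName": "arn:aws:lambda:us-east-1:123456789012:function:DataEnrichmentFunction",
      "Payload": {
        "input.$": "$"
      }
    },
    "OutputPath": "$.Payload",
    "Next": "NextState"
  }
}
```

If you would rather hand the function the state input unwrapped, "Payload.$": "$" in place of the Payload object should do it; which shape you pick is a design choice, as long as the function and the state definition agree.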
Beyond Lambda: The Power of Service Integrations
Now, why stop at Lambda? Step Functions can talk directly to a host of other AWS services, either through optimized integrations (which cover the usual suspects, DynamoDB and ECS included) or through AWS SDK integrations that reach more than 200 services. This is a huge deal. It means you can call services like DynamoDB or ECS without the ridiculous overhead of writing a trivial Lambda function just to act as a glorified API wrapper. It’s faster, cheaper, and you have one less piece of code to manage.
For example, let’s say you want to put an item directly into a DynamoDB table. You can do that without a Lambda middleman.
{
  "Write to DynamoDB": {
    "Type": "Task",
    "Resource": "arn:aws:states:::dynamodb:putItem",
    "Parameters": {
      "TableName": "MySuperImportantTable",
      "Item": {
        "UserId": {
          "S.$": "$.user.id"
        },
        "Data": {
          "S.$": "$.processedData"
        }
      }
    },
    "Next": "NextState"
  }
}
See the pattern? The Resource ARN now points to dynamodb:putItem. The Parameters block is a direct mapping to the PutItem API call. This is incredibly powerful. You’re essentially writing API calls in JSON, and Step Functions handles the authentication and execution for you. The .$ reference paths let you pluck data right from your state input to populate the API request. It’s workflow nirvana.
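ECS follows the same pattern. The sketch below is illustrative only: the cluster, task definition, subnet ID, container name ("worker"), and the $.s3Key input field are all hypothetical placeholders, not values from a real account. The .sync suffix on the Resource asks Step Functions to wait for the ECS task to finish before moving on, rather than just firing it off.

```json
{
  "Run Batch Container": {
    "Type": "Task",
    "Resource": "arn:aws:states:::ecs:runTask.sync",
    "Parameters": {
      "LaunchType": "FARGATE",
      "Cluster": "arn:aws:ecs:us-east-1:123456789012:cluster/MyCluster",
      "TaskDefinition": "arn:aws:ecs:us-east-1:123456789012:task-definition/MyBatchTask:1",
      "NetworkConfiguration": {
        "AwsvpcConfiguration": {
          "Subnets": ["subnet-0123456789abcdef0"],
          "AssignPublicIp": "DISABLED"
        }
      },
      "Overrides": {
        "ContainerOverrides": [
          {
            "Name": "worker",
            "Environment": [
              { "Name": "INPUT_KEY", "Value.$": "$.s3Key" }
            ]
          }
        ]
      }
    },
    "Next": "NextState"
  }
}
```

Passing the input through a container environment variable is just one option; overriding the container command is another. Either way, the execution role needs permission to run the task, which is outside what this fragment shows.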
The Brutal Honesty: Error Handling is Non-Negotiable
Here’s where I stop being your brilliant friend and start being your slightly annoyed, experienced one who has been burned too many times. YOU MUST HANDLE ERRORS IN YOUR TASK STATES. The AWS console might make it look optional, but it is a trap for the unwary.
Without a Retry or Catch policy, any service exception—a Lambda timeout, a DynamoDB throttling error, a momentary blip—will cause your entire execution to fail. This is the default behavior. It’s like designing a car that explodes if it hits a pothole. You wouldn’t do that. So don’t do it here.
The Retry block is for transient errors you expect to succeed eventually (throttling, timeouts). The Catch block is for when those retries are exhausted or for fundamental, non-retryable errors (like a “function not found” error). Define them. Use them. Your production system will thank you. The example above shows a robust policy for a Lambda call.
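To make the division of labor concrete, here is a hedged variation on the earlier Lambda state. It assumes you want to retry the transient Lambda errors and route everything else to the same HandleTaskFailure state. With IntervalSeconds 2, BackoffRate 2, and MaxAttempts 6, the retries wait 2, 4, 8, 16, 32, and 64 seconds, so roughly two minutes pass before the Catch takes over.

```json
{
  "Enrich Data": {
    "Type": "Task",
    "Resource": "arn:aws:states:::lambda:invoke",
    "Parameters": {
      "FunctionName": "arn:aws:lambda:us-east-1:123456789012:function:DataEnrichmentFunction",
      "Payload": {
        "input.$": "$"
      }
    },
    "Retry": [
      {
        "ErrorEquals": [
          "Lambda.ServiceException",
          "Lambda.AWSLambdaException",
          "Lambda.SdkClientException",
          "Lambda.TooManyRequestsException"
        ],
        "IntervalSeconds": 2,
        "MaxAttempts": 6,
        "BackoffRate": 2
      }
    ],
    "Catch": [
      {
        "ErrorEquals": ["States.ALL"],
        "ResultPath": "$.error",
        "Next": "HandleTaskFailure"
      }
    ],
    "Next": "NextState"
  }
}
```

Two choices here are worth a second look. States.ALL is used in the Catch because States.TaskFailed does not match States.Timeout, and ResultPath "$.error" attaches the error details to the original input instead of replacing it, so the failure handler still sees what the task was working on.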
The Sharp Edges: Payload Size and Timeouts
Now for the rough edges I promised to call out. There are limits, and they will bite you.
The data passed between states (a state’s input, output, or result) is capped at 256 KiB, measured as a UTF-8 encoded string. If your Lambda function returns a 300 KB JSON object, your task state will fail with a States.DataLimitExceeded error. You must design your workflows to be mindful of this. Pass references (e.g., an S3 object key), not the entire data payload.
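A rough sketch of the pass-a-reference approach, under two loud assumptions: that DataEnrichmentFunction writes its large result to S3 and returns only an object with bucket and key fields, and that a second, hypothetical function (ProcessEnrichedDataFunction) knows how to read from that location. Neither behavior is defined earlier in this section.

```json
{
  "Enrich Data": {
    "Type": "Task",
    "Resource": "arn:aws:states:::lambda:invoke",
    "Parameters": {
      "FunctionName": "arn:aws:lambda:us-east-1:123456789012:function:DataEnrichmentFunction",
      "Payload": {
        "input.$": "$"
      }
    },
    "ResultSelector": {
      "bucket.$": "$.Payload.bucket",
      "key.$": "$.Payload.key"
    },
    "ResultPath": "$.enriched",
    "Next": "Process Enriched Data"
  },
  "Process Enriched Data": {
    "Type": "Task",
    "Resource": "arn:aws:states:::lambda:invoke",
    "Parameters": {
      "FunctionName": "arn:aws:lambda:us-east-1:123456789012:function:ProcessEnrichedDataFunction",
      "Payload": {
        "bucket.$": "$.enriched.bucket",
        "key.$": "$.enriched.key"
      }
    },
    "Next": "NextState"
  }
}
```

Only the small pointer travels through the state machine; the heavy data stays in S3, where the size limit on state data does not apply.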
Furthermore, a Task state in a Standard workflow can wait for up to one year (yes, really), but the services it calls cannot always keep up. A Lambda invocation has a 15-minute max. An ECS task might run for days. You need to set the TimeoutSeconds field in your task state to something sensible that aligns with the service you’re calling. If you set a 5-minute timeout on a Task state calling Lambda, and your Lambda is configured with a 10-minute timeout, the Task state will fail with a States.Timeout error at the 5-minute mark, taking the execution down with it unless you catch that error, even though the Lambda is still chugging away. This kind of misconfiguration is a classic “why did my workflow fail?” headache. Align your timeouts.
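As a sketch of what aligned looks like, assume the Lambda function itself is configured with a 10-minute (600-second) timeout. Setting TimeoutSeconds slightly above that lets the function’s own timeout fire first, while still putting a ceiling on how long the state can hang. The 30-second margin is an arbitrary illustrative choice.

```json
{
  "Enrich Data": {
    "Type": "Task",
    "Resource": "arn:aws:states:::lambda:invoke",
    "Parameters": {
      "FunctionName": "arn:aws:lambda:us-east-1:123456789012:function:DataEnrichmentFunction",
      "Payload": {
        "input.$": "$"
      }
    },
    "TimeoutSeconds": 630,
    "Catch": [
      {
        "ErrorEquals": ["States.Timeout"],
        "ResultPath": "$.error",
        "Next": "HandleTaskFailure"
      }
    ],
    "Next": "NextState"
  }
}
```

Catching States.Timeout explicitly matters because a Catch on States.TaskFailed alone will not match it.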