31.3 Kinesis Client Library (KCL) and Lambda Trigger Integration

Right, so you’ve got your Kinesis Data Stream humming along, shoveling data records like there’s no tomorrow. The next question is the fun one: how do you actually consume this firehose without building a complex, state-managing, shard-balancing monster of a service? You’ve got two primary flavors: run the Kinesis Client Library (KCL) yourself on a fleet of EC2 instances, or let AWS do the heavy lifting with a Lambda trigger. I’m going to assume you’re here because you prefer “less servers” to “more servers,” so let’s dive into the Lambda integration. It’s brilliant, but it has its own… idiosyncrasies.

How the Lambda Trigger Actually Works (It’s Not What You Think)

Don’t make the mistake of thinking Lambda is polling your stream. That would be inefficient and, frankly, beneath it. Instead, the stream invokes your function. Here’s the magic: behind the scenes, AWS runs a managed instance of the KCL for you. This “Lambda poller” takes care of all the soul-crifyingly complex bits like shard leasing, checkpointing, and dealing with resharding. It fetches records from the stream in batches and then hands them off to your Lambda function in a nice, wrapped-up payload.

Your function is invoked per shard, roughly speaking. The poller will hold a lease on a shard and keep feeding it to your function. The beautiful part? Checkpointing is handled automatically. When your function successfully processes a batch and returns, the poller checkpoints progress. If your function errors, it doesn’t checkpoint, and the poller will retry that same batch of records. This is the core of its reliability.

The Event Payload: Your Batch of Records

When your Lambda is invoked, it receives an event object that contains an array of records. Each record is a base64-encoded blob of your data, because Kinesis doesn’t assume your data is text (even though 99% of us are just shoving JSON in there). You’ll need to decode it.

Here’s a quick and dirty Python example to show you the shape of things:

import base64
import json

def lambda_handler(event, context):
    for record in event['Records']:
        # Kinesis data is base64 encoded
        payload = base64.b64decode(record['kinesis']['data'])
        # Assuming it was JSON encoded UTF-8 string data
        data = json.loads(payload.decode('utf-8'))
        
        # Now go do something actually useful with 'data'
        print(f"Processed record: {data}")
        
        # Imagine this could fail here, which would trigger a retry
        # process_record(data)

The event also contains the sequence number and the approximate arrival timestamp, which are invaluable for debugging and idempotency.

Tuning the Firehose Nozzle: Critical Configuration

This is where most people mess up. The default settings are, to put it kindly, conservative. You must adjust these or you’ll either be leaving performance on the table or drowning in throttling errors.

Batch Size: This is the maximum number of records Lambda will pack into a single invocation. The default is 100. If your records are small, crank this up to 10,000. You’re paying per invocation, so making each one do more work is cost-effective.
Batch Window: The maximum amount of time Lambda spends gathering records before invoking your function. If you don’t hit your Batch Size, it will wait up to this long (max 300 seconds) to form a batch. This is great for low-throughput streams where you’d rather wait a few seconds to get a decent-sized batch rather than process one or two records at a time.
Starting Position: LATEST to start at the end of the stream, or TRIM_HORIZON to start from the oldest available record. This is crucial for replaying data.

You set this in the Event Source Mapping, which is the resource that glues the stream to your function (defined in SAM, CloudFormation, or Terraform). Here’s a sane, performant SAM example:

MyKinesisTrigger:
  Type: AWS::Serverless::Function
  Properties:
    CodeUri: path/to/code/
    Handler: app.lambda_handler
    Policies:
      - AWSLambdaKinesisExecutionRolePolicy
    Events:
      MyStream:
        Type: Kinesis
        Properties:
          Stream: !GetAtt MyKinesisStream.Arn
          BatchSize: 1000
          MaximumBatchingWindowInSeconds: 5
          StartingPosition: LATEST

The Dark Arts: Error Handling and Retry Behavior

This is the most important part to understand. Lambda’s retry behavior is both a safety net and a potential footgun.

Your Function Throws an Error: The entire batch fails. The event source mapping will retry the same batch of records. And again. And again. It will retry until it either succeeds or the records expire from the stream’s retention period (which is 24 hours to 365 days!). This can quickly create a retry storm that blocks processing on that shard.
Your Function is Throttled (Lambda returns 429): The poller will back off and retry.
Your Function Times Out: This is treated as an error, and the batch will be retried.

The best practice? Your function must be idempotent. Processing the same record twice should be a no-op. Because I promise you, it will happen, especially during deployments or transient downstream failures. Also, implement dead-letter queues (DLQs) on the event source mapping for records that consistently fail. This kicks the truly poisonous records out of the main flow so they stop blocking everything else.

The Elephant in the Room: Cold Shards and Slow Consumers

Here’s the bit the documentation glosses over. The managed KCL poller assigns shards to its own worker instances. If you have a stream with 10 shards, you might have 10 concurrent Lambda invocations. But if you suddenly add 10 more shards (e.g., via resharding), the poller needs a moment to spin up new workers to claim them. During this time, those new “cold” shards aren’t being processed. It’s usually a delay of tens of seconds, but you need to be aware of it. Conversely, if your function is too slow and can’t keep up with the iterator age of a shard, AWS might reassign that shard to another poller worker, which can also cause a brief hiccup. Monitor your IteratorAge metric in CloudWatch like a hawk; it’s the single best indicator of consumer health. If it’s constantly growing, you’re falling behind.