19.5 DynamoDB Streams: Change Data Capture for Lambda and Analytics

Right, so you’ve got your DynamoDB table humming along, faithfully storing your data. But what happens next? Your application isn’t a museum; data changes, and other parts of your system need to know about it. You could poll the table constantly, asking “Has anything changed? How about now? Now?” but that’s the technical equivalent of a backseat driver and a fantastic way to burn through your read capacity. Enter DynamoDB Streams, which is basically DynamoDB tapping you on the shoulder and handing you a note that says, “Hey, here’s exactly what just happened.”

Think of a Stream as an ordered, immutable flow of information about every modification to your table items. When you enable a stream on a table, every write that creates, updates, or deletes an item (PutItem, UpdateItem, DeleteItem) pumps a record into this stream. This record contains the core of what you need: the item’s key and, if you configure the stream for it, the item’s state before and after the change. This is your golden ticket for building reactive, event-driven architectures.

What’s in a Stream Record?

Don’t just take my word for it; let’s look at the raw, unvarnished JSON that gets shoved into the stream. How much of the item a record carries depends on the view type you pick; the example here uses the most generous one, so you can see both the before and the after.

{
  "eventID": "1",
  "eventVersion": "1.1",
  "eventSource": "aws:dynamodb",
  "awsRegion": "us-east-1",
  "eventName": "MODIFY",
  "dynamodb": {
    "Keys": {
      "UserId": {
        "S": "123"
      }
    },
    "NewImage": {
      "UserId": {
        "S": "123"
      },
      "GameScore": {
        "N": "1050"
      }
    },
    "OldImage": {
      "UserId": {
        "S": "123"
      },
      "GameScore": {
        "N": "1000"
      }
    },
    "SequenceNumber": "111",
    "StreamViewType": "NEW_AND_OLD_IMAGES"
  }
}

The magic is in the dynamodb object. You get the Keys to identify the item, and depending on how you configure the stream, you get different views. The StreamViewType is a critical choice you make when you set up the stream:

KEYS_ONLY: Just the key attributes. Useful if you only need to know what changed and will fetch the full item yourself.
NEW_IMAGE: The entire item as it appears after the change.
OLD_IMAGE: The entire item as it appeared before the change.
NEW_AND_OLD_IMAGES (shown above): Both. This is the most powerful option, as it lets you see the delta without any extra calls. It also produces the largest stream records, because each one can carry two copies of the item. Enabling a stream doesn’t consume the table’s write capacity; what grows is the payload your consumers have to read and process.
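
To make the view type choice concrete, here is a minimal sketch of turning on a stream for an existing table through the UpdateTable API. It assumes you pass in a boto3 DynamoDB client; the helper name enable_stream and the table name GameScores are made up for illustration.

```python
VALID_VIEW_TYPES = {"KEYS_ONLY", "NEW_IMAGE", "OLD_IMAGE", "NEW_AND_OLD_IMAGES"}


def enable_stream(client, table_name, view_type="NEW_AND_OLD_IMAGES"):
    """Enable a stream on an existing table (hypothetical helper).

    `client` is expected to behave like boto3.client("dynamodb").
    """
    if view_type not in VALID_VIEW_TYPES:
        raise ValueError(f"unknown StreamViewType: {view_type!r}")
    return client.update_table(
        TableName=table_name,
        StreamSpecification={
            "StreamEnabled": True,
            "StreamViewType": view_type,
        },
    )


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   enable_stream(boto3.client("dynamodb"), "GameScores", "KEYS_ONLY")
```

Validating the view type locally is just a convenience; the service would reject an unknown value anyway.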

Hooking It Up to Lambda

The most common and elegant way to consume a stream is by attaching an AWS Lambda function. AWS manages the connection, the polling, and the batching of records for you. If your function throws, Lambda retries the batch, by default until it succeeds or the records age out of the stream, which holds them for 24 hours. It’s shockingly well-designed.
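
The wiring itself is an event source mapping. The sketch below shows one way to create it, assuming a boto3 Lambda client is passed in; the helper name attach_stream_to_lambda and the specific numbers (batch size 100, three retries) are illustrative choices, not recommendations.

```python
def attach_stream_to_lambda(lambda_client, stream_arn, function_name):
    """Connect a DynamoDB stream to a Lambda function (hypothetical helper).

    `lambda_client` is expected to behave like boto3.client("lambda").
    """
    return lambda_client.create_event_source_mapping(
        EventSourceArn=stream_arn,
        FunctionName=function_name,
        StartingPosition="LATEST",        # or "TRIM_HORIZON" to start from the oldest record
        BatchSize=100,                    # max records handed to one invocation
        MaximumRetryAttempts=3,           # cap retries instead of retrying until expiry
        BisectBatchOnFunctionError=True,  # split a failing batch to isolate a bad record
    )


# Usage, assuming boto3 is installed; the ARN and function name are placeholders:
#   import boto3
#   attach_stream_to_lambda(boto3.client("lambda"), "<your-stream-arn>", "score-watcher")
```

Capping retries and bisecting on error are both optional; they mainly keep one poison record from blocking everything behind it.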

Here’s a dead-simple Lambda function in Python that processes a batch of stream records. Notice it’s not dealing with individual events but a list of them.

def lambda_handler(event, context):
    for record in event['Records']:
        # Parse the DynamoDB-specific part of the record
        dynamodb_data = record['dynamodb']
        event_name = record['eventName']  # INSERT, MODIFY, or REMOVE

        # Example: only process updates where 'GameScore' changed significantly
        if event_name != 'MODIFY':
            continue

        # Both images are only present with the NEW_AND_OLD_IMAGES view type
        old_image = dynamodb_data.get('OldImage', {})
        new_image = dynamodb_data.get('NewImage', {})
        if 'GameScore' not in old_image or 'GameScore' not in new_image:
            continue

        # Numbers arrive as strings inside an {'N': ...} wrapper
        old_score = int(old_image['GameScore']['N'])
        new_score = int(new_image['GameScore']['N'])

        if new_score > old_score + 100:
            user_id = dynamodb_data['Keys']['UserId']['S']
            print(f"User {user_id} just leveled up! ({old_score} -> {new_score})")
            # ... do something awesome like send a notification

    return {'statusCode': 200}
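
The handler above picks apart the typed attribute values by hand. If you would rather work with plain Python values, a small converter helps. This is a minimal sketch that covers only the S, N, BOOL, NULL, L, and M types (sets and binary are left out); boto3 ships a fuller TypeDeserializer in boto3.dynamodb.types. The names unmarshal and changed_attributes are made up for this example.

```python
from decimal import Decimal


def unmarshal(value):
    """Convert one typed attribute value, e.g. {"N": "1050"}, to plain Python."""
    type_tag, raw = next(iter(value.items()))
    if type_tag == "S":
        return raw
    if type_tag == "N":
        return Decimal(raw)  # Decimal avoids float rounding surprises
    if type_tag == "BOOL":
        return raw
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [unmarshal(v) for v in raw]
    if type_tag == "M":
        return {k: unmarshal(v) for k, v in raw.items()}
    raise ValueError(f"unsupported attribute type: {type_tag}")


def changed_attributes(record):
    """Return {name: (old, new)} for attributes that differ between the images."""
    body = record["dynamodb"]
    old = {k: unmarshal(v) for k, v in body.get("OldImage", {}).items()}
    new = {k: unmarshal(v) for k, v in body.get("NewImage", {}).items()}
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }
```

Fed the sample record from earlier in this section, changed_attributes reports only GameScore, since UserId is identical in both images.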

The Gotchas: Idempotency and Ordering

Here’s where the “brilliant friend” part kicks in: you will screw this up if you don’t understand these two concepts.

Ordering is per item. This is vital. Records for a specific item (that is, the same primary key) appear in the stream in the exact order the modifications were made, and that is the order your Lambda function sees them in. This means for user 123, you’ll always see MODIFY -> MODIFY -> REMOVE in that order. However, records for different items (123, 456, 789) can be processed in any order relative to each other, and by different Lambda instances simultaneously. There is no global table-wide order.
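
One practical consequence: if your handler fans work out to threads or async tasks, it can scramble the order it was given. A simple guard is to group a batch by item key first and keep each group in arrival order. The sketch below assumes the standard record shape shown earlier; group_by_item is a made-up helper name.

```python
import json


def group_by_item(records):
    """Group stream records by primary key, preserving arrival order per item."""
    groups = {}
    for record in records:
        # Serialize the Keys map deterministically so it can be used as a dict key
        identity = json.dumps(record["dynamodb"]["Keys"], sort_keys=True)
        groups.setdefault(identity, []).append(record)
    return groups
```

Each group can then be handed to its own worker, as long as that worker processes its list front to back.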

Idempotency is your responsibility. Each change appears in the stream exactly once, but Lambda invokes your function with at-least-once semantics, so you might be handed the same stream record more than once, especially after a timeout or error. Your function logic must be idempotent, meaning that processing the same event multiple times has the same effect as processing it once. Using the eventID as a unique identifier to check if you’ve already handled this specific change is a classic and robust pattern.
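
The eventID check can be sketched in a few lines. To keep the example self-contained it uses an in-memory set as the record of handled events; that is an assumption for illustration only, since a real function would need a durable store (a conditional write to a separate table is one option) because Lambda instances come and go. The name process_once is hypothetical.

```python
def process_once(records, seen_ids, handle):
    """Call handle(record) only for records whose eventID has not been seen.

    `seen_ids` is a set-like store of handled eventIDs (in-memory here,
    durable in a real deployment). Returns how many records were handled.
    """
    handled = 0
    for record in records:
        event_id = record["eventID"]
        if event_id in seen_ids:
            continue  # duplicate delivery, skip it
        handle(record)
        # Mark as seen only after the handler succeeded, so a failure is retried
        seen_ids.add(event_id)
        handled += 1
    return handled
```

Marking the ID after the handler runs means a crash between the two steps can still cause one repeat, so the handler’s side effect should itself tolerate a retry where possible.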

Why This is a Killer Feature

Beyond just sending notifications, Streams unlock the real power of DynamoDB.

Analytics: Forward the changes (a small Lambda function is the usual glue) into Amazon Kinesis Data Firehose, which can deliver them to S3, Redshift, or Elasticsearch for near-real-time analytics.
Aggregations: Maintain a materialized view or a rolling counter in another table. Every time an order status changes to SHIPPED, increment a DailyShippedOrders counter.
Search Indexing: Fan out changes to a full-text search service like Elasticsearch to keep your search index perfectly in sync with your primary data store.
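Audit Logs: With the NEW_AND_OLD_IMAGES view type, every record is a before-and-after snapshot of a single change. Stream records expire after 24 hours, so copy them somewhere durable, such as S3, and you have an immutable audit trail of every change to your data.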
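
As an illustration of the aggregation case, here is a sketch that counts orders whose status just became SHIPPED in a batch of stream records. The attribute names Status and ShippedDate are assumptions about a hypothetical orders table, and the function only computes the increments; applying them to a DailyShippedOrders counter would be a separate atomic update per day.

```python
from collections import Counter


def count_newly_shipped(records):
    """Count, per day, the orders whose Status changed to SHIPPED in this batch."""
    shipped_per_day = Counter()
    for record in records:
        body = record["dynamodb"]
        old_status = body.get("OldImage", {}).get("Status", {}).get("S")
        new_image = body.get("NewImage", {})
        new_status = new_image.get("Status", {}).get("S")
        # Count only the transition into SHIPPED, not later edits to a shipped order
        if new_status == "SHIPPED" and old_status != "SHIPPED":
            day = new_image["ShippedDate"]["S"]  # hypothetical attribute, e.g. "2024-05-01"
            shipped_per_day[day] += 1
    return shipped_per_day
```

Because a batch can be delivered more than once, the counter update is a place where the idempotency concern from earlier in this section applies directly.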

It transforms your database from a passive store into the beating heart of your application’s event bus. You stop asking your database what happened and start listening when it tells you.