19.2 Global Secondary Indexes (GSI) and Local Secondary Indexes (LSI)
Right, let’s talk about indexes. You already know your table’s Primary Key is the main way you get at your data. But you’re not a simpleton; your queries are more sophisticated than “find user 42.” You want to “find all orders for user 42” or “find the top 10 most popular products.” This is where secondary indexes come in. They’re your way of telling DynamoDB, “Hey, I’m going to need to query this data in a different order, so do me a favor and maintain a second, hidden table for me, sorted this way.” It’s a fantastic feature, but like most powerful things, it comes with complexity and cost. Let’s break down the two types: Local and Global.
The Quick and the Expensive: Local Secondary Indexes (LSI)
An LSI is the simpler, more constrained cousin. It lets you define an alternate sort key for your table, but it must share the same partition key as the base table. Think of it as a different sorting of the items within a single partition.
Why would you want this? Imagine your base table has a primary key of UserID (partition) and OrderID (sort). Your main access pattern is “get all orders for a specific user.” Now you also want to “get the 3 most recent orders for a specific user.” You can’t do that efficiently with just the OrderID; you need a timestamp. An LSI lets you use an OrderDate attribute as the alternate sort key within that same UserID partition, so DynamoDB can hand back a user’s orders newest-first.
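Here is a toy in-memory model of the “different sorting of the same partition” idea. It is a sketch and does not reflect how DynamoDB actually stores data. The partition contents are made up, and the `byKey` and `mostRecent` helpers are hypothetical. The model assumes OrderDate is stored as an ISO 8601 UTC string in one fixed format, which is why plain string comparison puts the dates in chronological order.

```javascript
// Toy model of ONE partition (UserID "u42"); the data is made up.
// The base table keeps these items ordered by OrderID, the LSI keeps the
// SAME items ordered by OrderDate. This mimics ordering only, not storage.
const partition = [
  { UserID: "u42", OrderID: "o-001", OrderDate: "2024-03-02T10:00:00.000Z" },
  { UserID: "u42", OrderID: "o-002", OrderDate: "2024-01-15T08:30:00.000Z" },
  { UserID: "u42", OrderID: "o-003", OrderDate: "2024-05-20T17:45:00.000Z" },
  { UserID: "u42", OrderID: "o-004", OrderDate: "2024-04-11T12:00:00.000Z" },
];

// Hypothetical comparator: plain string comparison, which matches DynamoDB's
// byte-wise string ordering for ASCII values like these.
const byKey = (attr) => (a, b) => (a[attr] < b[attr] ? -1 : a[attr] > b[attr] ? 1 : 0);

const baseTableView = [...partition].sort(byKey("OrderID"));
const lsiView = [...partition].sort(byKey("OrderDate"));

// "3 most recent orders": walk the LSI ordering backwards and stop after n,
// which is what ScanIndexForward: false plus Limit: n asks DynamoDB to do.
function mostRecent(view, n) {
  return view.slice().reverse().slice(0, n).map((item) => item.OrderID);
}

console.log(baseTableView.map((item) => item.OrderID).join(", ")); // o-001, o-002, o-003, o-004
console.log(mostRecent(lsiView, 3).join(", ")); // o-003, o-004, o-001
```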
The Crucial LSI Gotcha: You can only create an LSI at the same time you create the table. You cannot add one later, and you cannot remove one. This is, frankly, a bizarre and frustrating design choice by AWS. It means you have to perfectly predict your future access patterns on day one, which is a fool’s errand. Because of this, LSIs are falling out of favor. I almost never use them unless I have a rock-solid, never-changing query pattern.
Here’s how you’d define that table with an LSI using the AWS SDK for JavaScript (v3):
import { DynamoDBClient, CreateTableCommand } from "@aws-sdk/client-dynamodb";
const client = new DynamoDBClient({ region: "us-east-1" });
const command = new CreateTableCommand({
  TableName: "UserOrders",
  AttributeDefinitions: [
    { AttributeName: "UserID", AttributeType: "S" },
    { AttributeName: "OrderID", AttributeType: "S" },
    // Must be declared because the LSI uses it as a key. ISO 8601 strings sort chronologically.
    { AttributeName: "OrderDate", AttributeType: "S" },
  ],
  KeySchema: [
    { AttributeName: "UserID", KeyType: "HASH" },
    { AttributeName: "OrderID", KeyType: "RANGE" },
  ],
  LocalSecondaryIndexes: [
    {
      IndexName: "UserOrdersByDateIndex",
      KeySchema: [
        { AttributeName: "UserID", KeyType: "HASH" }, // Same partition key
        { AttributeName: "OrderDate", KeyType: "RANGE" }, // New sort key
      ],
      Projection: {
        ProjectionType: "ALL", // Projects all attributes into the index
      },
    },
  ],
  BillingMode: "PROVISIONED",
  ProvisionedThroughput: {
    ReadCapacityUnits: 5,
    WriteCapacityUnits: 5,
  },
});
// Top-level await: run this as an ES module (e.g. a .mjs file).
const response = await client.send(command);
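Once the table exists, the “3 most recent orders” query goes to the index rather than to the table. The sketch below only builds the input object you would pass to `QueryCommand` from the same package. The helper name `recentOrdersQueryInput` is hypothetical, and it assumes the table and index names from the snippet above. `ScanIndexForward: false` returns items in descending sort-key order, and `Limit` caps how many come back. Because this is an LSI, `ConsistentRead: true` is allowed. A GSI would reject it.

```javascript
// Hypothetical helper: builds the input object for QueryCommand
// (@aws-sdk/client-dynamodb) to read a user's n most recent orders via the LSI.
function recentOrdersQueryInput(userId, n) {
  return {
    TableName: "UserOrders",
    IndexName: "UserOrdersByDateIndex",
    KeyConditionExpression: "UserID = :uid",
    ExpressionAttributeValues: { ":uid": { S: userId } },
    ScanIndexForward: false, // descending by the index sort key (OrderDate), newest first
    Limit: n, // stop after n items
    ConsistentRead: true, // permitted on LSIs; costs twice an eventually consistent read
  };
}

console.log(JSON.stringify(recentOrdersQueryInput("u42", 3)));
```

Sending it is one more line in the same module: `const { Items } = await client.send(new QueryCommand(recentOrdersQueryInput("u42", 3)));`, with `QueryCommand` imported alongside `DynamoDBClient`.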
The Flexible Workhorse: Global Secondary Indexes (GSI)
This is where the real power is. A GSI lets you define a completely new primary key for the index: a different partition key and, optionally, a different sort key. This is a game-changer because it enables access patterns that the base table’s key can’t serve without a full Scan.
Want to query all orders by their status? Your base table is partitioned by UserID, which is useless. But a GSI with a partition key of OrderStatus and a sort key of OrderDate lets you run a query like “find all SHIPPED orders from the last day.”
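Here is roughly what that query looks like, again expressed as the input object for `QueryCommand`. The helper `shippedSinceQueryInput` is hypothetical. It uses the index name from the `UpdateTable` call later in this section. It also assumes every OrderDate was written with `Date.prototype.toISOString()`, so that string comparison matches time order. There is no `ConsistentRead` flag because GSI reads are always eventually consistent.

```javascript
// Hypothetical helper: QueryCommand input for "all SHIPPED orders from the last day"
// against the GSI. Assumes OrderDate values were written with toISOString().
function shippedSinceQueryInput(now = new Date()) {
  const oneDayMs = 24 * 60 * 60 * 1000;
  const since = new Date(now.getTime() - oneDayMs).toISOString();
  return {
    TableName: "UserOrders",
    IndexName: "OrdersByStatusIndex",
    // Partition key must be matched with "="; the sort key supports range operators.
    KeyConditionExpression: "OrderStatus = :status AND OrderDate >= :since",
    ExpressionAttributeValues: {
      ":status": { S: "SHIPPED" },
      ":since": { S: since },
    },
    // No ConsistentRead here: GSIs only serve eventually consistent reads.
  };
}

const input = shippedSinceQueryInput(new Date("2024-05-21T12:00:00.000Z"));
console.log(input.ExpressionAttributeValues[":since"].S); // 2024-05-20T12:00:00.000Z
```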
The GSI Trade-off: This flexibility comes with two main costs:
- Eventual Consistency: Index updates are always asynchronous. Your base table write succeeds first, and DynamoDB propagates the change to the GSI shortly after. A query against the GSI immediately after a write might therefore not reflect the change. Unlike the base table and LSIs, GSIs do not support strongly consistent reads at all: a GSI query with ConsistentRead set to true is rejected with a validation error.
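- Double the Write Costs: Every write to your base table also performs a write to every GSI that contains that item, and those index writes draw on the GSI’s own write capacity. For a small item (1 KB or less), that is 1 WCU on the table plus 1 WCU per affected GSI. With two GSIs, a single table write consumes three WCUs in total. An update that changes an index’s key attribute costs even more, because DynamoDB deletes the old index entry and writes a new one. Yes, you read that right. It’s one of the most common ways people accidentally nuke their AWS bill. Plan your capacity accordingly.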
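To see how quickly this adds up, here is a back-of-the-envelope estimator for a single `PutItem` of a new item. It is a simplification, and the helper names are hypothetical. It assumes standard (non-transactional) writes at 1 WCU per 1 KB, rounded up, and charges each GSI on the size of its projected entry. It ignores per-entry index overhead and the extra cost of updates that change an index key.

```javascript
// Rough WCU estimate for writing ONE new item (standard writes, not transactional).
// Assumption: 1 WCU per 1 KB (1024 bytes), rounded up, minimum 1.
function wcus(bytes) {
  return Math.max(1, Math.ceil(bytes / 1024));
}

// itemBytes: size of the full item in the base table.
// projectedBytesPerGsi: size of the entry each GSI stores for this item
// (list only the GSIs that actually contain the item).
function estimateWriteCost(itemBytes, projectedBytesPerGsi) {
  const table = wcus(itemBytes);
  const indexes = projectedBytesPerGsi.map(wcus);
  const total = indexes.reduce((sum, n) => sum + n, table);
  return { table, indexes, total };
}

// Small item, two GSIs: 1 (table) + 1 + 1 = 3 WCUs, matching the text above.
console.log(estimateWriteCost(800, [200, 800]).total); // 3
// 3 KB item with one GSI projecting ALL, vs. one projecting KEYS_ONLY.
console.log(estimateWriteCost(3000, [3000]).total); // 6
console.log(estimateWriteCost(3000, [120]).total); // 4
```

The last two lines preview the projection trade-off: the bigger the entry an index copies, the more every write to that item costs.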
Let’s add that GSI for order status to our table. Notice we can do this after the table is created.
import { DynamoDBClient, UpdateTableCommand } from "@aws-sdk/client-dynamodb";
const client = new DynamoDBClient({ region: "us-east-1" });
const command = new UpdateTableCommand({
  TableName: "UserOrders",
  AttributeDefinitions: [
    // Every key attribute of the new index must be listed here, including
    // OrderDate, even though the table already declared it for the LSI.
    { AttributeName: "OrderStatus", AttributeType: "S" },
    { AttributeName: "OrderDate", AttributeType: "S" },
  ],
  GlobalSecondaryIndexUpdates: [
    {
      Create: {
        IndexName: "OrdersByStatusIndex",
        KeySchema: [
          { AttributeName: "OrderStatus", KeyType: "HASH" }, // Totally new PK
          { AttributeName: "OrderDate", KeyType: "RANGE" }, // New SK
        ],
        Projection: {
          ProjectionType: "KEYS_ONLY", // Only the base table's primary key plus the index key attributes
        },
        // Required because the table uses PROVISIONED billing: the index gets its own
        // capacity, separate from the table's. Omit this for on-demand tables.
        ProvisionedThroughput: {
          ReadCapacityUnits: 5,
          WriteCapacityUnits: 5,
        },
      },
    },
  ],
});
const response = await client.send(command);
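A GSI added to an existing table isn’t usable the moment `UpdateTable` returns. DynamoDB first backfills it from the existing items, and the index reports an `IndexStatus` of `CREATING` until it becomes `ACTIVE`. The hypothetical helper below checks that status in the output of `DescribeTableCommand`, which is also in `@aws-sdk/client-dynamodb`. In practice you would call it in a polling loop with a delay between attempts. The sample responses are hand-written to mimic the shape of that output.

```javascript
// Hypothetical helper: is the named GSI ready to query?
// `describeOutput` is the object returned by
// client.send(new DescribeTableCommand({ TableName: "UserOrders" })).
function gsiIsActive(describeOutput, indexName) {
  const gsis = describeOutput.Table?.GlobalSecondaryIndexes ?? [];
  const gsi = gsis.find((g) => g.IndexName === indexName);
  // A newly added GSI reports CREATING while existing items are backfilled.
  return gsi !== undefined && gsi.IndexStatus === "ACTIVE";
}

// Hand-written responses shaped like DescribeTable output.
const creating = {
  Table: {
    TableName: "UserOrders",
    GlobalSecondaryIndexes: [
      { IndexName: "OrdersByStatusIndex", IndexStatus: "CREATING", Backfilling: true },
    ],
  },
};
const active = {
  Table: {
    TableName: "UserOrders",
    GlobalSecondaryIndexes: [{ IndexName: "OrdersByStatusIndex", IndexStatus: "ACTIVE" }],
  },
};

console.log(gsiIsActive(creating, "OrdersByStatusIndex")); // false
console.log(gsiIsActive(active, "OrdersByStatusIndex")); // true
```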
Projection: Paying for What You (Might) Use
When you create an index, you have to decide which attributes from the base table get copied over. This is called projection. Your choices are:
- KEYS_ONLY: Only the primary key attributes of the base table and the new key attributes of the index. Super lean, cheapest for writes.
- INCLUDE: The keys plus any other non-key attributes you specify. A good middle ground.
- ALL: The entire item. Most convenient for reads (you don’t have to GetItem back to the main table later) but most expensive for writes.
Choose wisely. If you only need to use the index to find a list of primary keys, KEYS_ONLY is your best friend. If you need a few specific attributes, INCLUDE is perfect. Defaulting to ALL is a common, but often wasteful, anti-pattern.
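Here is a toy model of what each projection type copies into an index entry, using the OrdersByStatusIndex keys from earlier. It is an illustration with a hypothetical `projectItem` helper and made-up order data, not DynamoDB’s internals. The `Projection` objects use the same shape the SDK accepts, including `NonKeyAttributes` for INCLUDE.

```javascript
// Toy model of which attributes an index entry gets under each projection type.
// Key attributes (base table PK/SK plus index PK/SK) are always copied;
// the projection only decides which NON-key attributes come along.
function projectItem(item, keyAttrs, projection) {
  if (projection.ProjectionType === "ALL") return { ...item };
  const keep = new Set(keyAttrs);
  if (projection.ProjectionType === "INCLUDE") {
    for (const attr of projection.NonKeyAttributes ?? []) keep.add(attr);
  }
  // KEYS_ONLY, or INCLUDE with its extra attributes added above.
  return Object.fromEntries(Object.entries(item).filter(([name]) => keep.has(name)));
}

// Made-up order item.
const order = {
  UserID: "u42",
  OrderID: "o-003",
  OrderDate: "2024-05-20T17:45:00.000Z",
  OrderStatus: "SHIPPED",
  Total: 59.9,
  ShippingAddress: "123 Example St",
};
// OrdersByStatusIndex: table PK/SK (UserID, OrderID) + index PK/SK (OrderStatus, OrderDate).
const indexKeys = ["UserID", "OrderID", "OrderStatus", "OrderDate"];

for (const projection of [
  { ProjectionType: "KEYS_ONLY" },
  { ProjectionType: "INCLUDE", NonKeyAttributes: ["Total"] },
  { ProjectionType: "ALL" },
]) {
  const names = Object.keys(projectItem(order, indexKeys, projection)).join(", ");
  console.log(projection.ProjectionType, "->", names);
}
// KEYS_ONLY -> UserID, OrderID, OrderDate, OrderStatus
// INCLUDE -> UserID, OrderID, OrderDate, OrderStatus, Total
// ALL -> UserID, OrderID, OrderDate, OrderStatus, Total, ShippingAddress
```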
So, the rule of thumb? Prefer GSIs for their flexibility. Use them to unlock new query patterns. But never, ever forget that every GSI is essentially another table you pay to keep in sync on every write that touches an indexed item. Design around your access patterns first, and let the indexes follow.