21.1 Redshift Architecture: Leader Node, Compute Nodes, and Slices
Right, let’s get under the hood. You can’t effectively use Redshift—or troubleshoot its special brand of weirdness—without understanding its architecture. It’s not some magical black box; it’s a collection of machines with specific jobs, and when you know who does what, the whole system makes a lot more sense. Forget the marketing fluff; we’re here to talk about the actual metal and software.
At its core, a Redshift cluster is a shared-nothing MPP (Massively Parallel Processing) database. This is a fancy way of saying it’s a team of computers working together on one problem, and no single computer shares its memory or disk with the others. They have to talk over the network. Your cluster has two types of players: the Leader Node and the Compute Nodes.
The Conductor: Leader Node
Think of the Leader Node as the conductor of an orchestra. It doesn’t play the instruments, but it’s utterly essential. When your client application connects to Redshift, it’s talking exclusively to the Leader Node. This node’s job is to take your SQL query, figure out the optimal way to execute it across all the worker bees (the Compute Nodes), and then compile the results from them to send back to you.
It handles all the client communication, parses and optimizes the queries, develops the parallel execution plan, and coordinates the Compute Nodes. It also stores all the metadata (the schema information, the table definitions) in its own catalog. The crucial thing to remember: the Leader Node does not store any of your actual user table data. Not a single byte. If your query involves a 20TB table, the Leader Node will figure out how to process it, but the scanning, filtering, and joining happen on the Compute Nodes. The Leader Node only handles the final steps, such as merging sorted streams or combining partial aggregates, before returning rows to the client. This is a common point of confusion.
The Workhorses: Compute Nodes
These are the muscle. Each Compute Node is a separate machine (with its own CPU, RAM, and attached storage) that holds a portion of your data and performs the actual computation. When the Leader Node devises its execution plan, it sends the instructions to the Compute Nodes, which then get to work in parallel. The more Compute Nodes you have, the more parallel processing power you have, as long as the data is spread evenly enough to keep them all busy.
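To make the division of labor concrete, here is a toy Python model of the scatter/gather pattern described above. It is a sketch, not Redshift code: the names `compute_node_partial_sum` and `leader_merge` are hypothetical, the data is made up, and a thread pool stands in for separate machines talking over a network.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy data: each inner list stands in for the rows held by one compute node.
node_data = [
    [10, 20, 30],   # compute node 0
    [5, 15],        # compute node 1
    [100],          # compute node 2
    [1, 2, 3, 4],   # compute node 3
]

def compute_node_partial_sum(rows):
    """Work done on a compute node: aggregate only the local rows."""
    return sum(rows)

def leader_merge(partials):
    """Work done on the leader: combine the small partial results."""
    return sum(partials)

def run_query(data):
    # The "leader" fans the same instruction out to every node in parallel...
    with ThreadPoolExecutor(max_workers=len(data)) as pool:
        partials = list(pool.map(compute_node_partial_sum, data))
    # ...and only sees one number per node, never the raw rows.
    return partials, leader_merge(partials)

partials, total = run_query(node_data)
print(partials, total)
```

The point of the shape: the expensive part (touching every row) happens on the nodes, and the leader's job shrinks to combining a handful of partial results.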
Redshift offers different node types (like ra3.xlplus or dc2.large), which primarily dictate the amount of compute (vCPUs, memory) and storage (whether it’s local or uses a managed RA3 storage layer) each node has. Choosing the right node type is a balance of your performance needs and your budget.
The Secret Sauce: Slices
Here's where it gets interesting. Each Compute Node is subdivided into virtual chunks called slices. The number of slices per node is fixed and determined by the node type (e.g., smaller types such as dc2.large and ra3.xlplus have 2 slices per node, while the largest types have 16). Each slice gets a portion of the node's CPU and memory resources.
Why slices? Because they allow for even more parallelism within a single machine. A node with 2 slices can execute two operations concurrently. When data is distributed across a cluster, it’s not just distributed across nodes; it’s distributed across all the slices in the entire cluster.
This is critical for understanding data distribution styles. If you have a 4-node cluster, each node with 2 slices, you have 8 slices total. If you create a table with DISTSTYLE EVEN, Redshift deals the rows out round-robin, so each of the 8 slices ends up with roughly one eighth of the data. With DISTSTYLE KEY, the value of the distribution key column is hashed, and that hash determines which slice a given row gets assigned to.
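The 4-node, 8-slice example can be sketched in a few lines of Python. This is a toy model under stated assumptions: `zlib.crc32` is only a stand-in for Redshift's internal hash function, and the helper names (`even_slice`, `key_slice`, `node_of`) and the `user_ids` values are hypothetical.

```python
import zlib
from collections import Counter

NODES = 4
SLICES_PER_NODE = 2
TOTAL_SLICES = NODES * SLICES_PER_NODE  # 8 slices in the example cluster

def even_slice(row_index):
    """EVEN style: deal rows out round-robin, ignoring their contents."""
    return row_index % TOTAL_SLICES

def key_slice(dist_key_value):
    """KEY style: hash the distribution key so equal values share a slice.
    crc32 is a stand-in; Redshift's real hash function is internal."""
    return zlib.crc32(str(dist_key_value).encode()) % TOTAL_SLICES

def node_of(slice_id):
    """Map a cluster-wide slice number back to the node that hosts it."""
    return slice_id // SLICES_PER_NODE

user_ids = list(range(1, 1001))  # hypothetical distribution key values

even_counts = Counter(even_slice(i) for i in range(len(user_ids)))
key_counts = Counter(key_slice(u) for u in user_ids)

print("EVEN:", sorted(even_counts.items()))
print("KEY: ", sorted(key_counts.items()))
```

With EVEN, every slice gets exactly the same number of rows regardless of content. With KEY, the split depends on the values themselves, which is what makes co-located joins possible and skew a risk.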
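-- Let's see this architecture in action. Replace your_table with a real table name.
-- This shows how many rows of the table live on each slice (and therefore each node).
-- STV tables may require superuser privileges to see every row.
SELECT s.node,
       p.slice,
       SUM(p.rows) AS row_count
FROM stv_tbl_perm p
JOIN stv_slices s ON s.slice = p.slice
WHERE TRIM(p.name) = 'your_table'
GROUP BY s.node, p.slice
ORDER BY s.node, p.slice;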
The Pitfalls You’ll Actually Hit
- Leader Node Bottleneck: The most common architecture-related performance killer. If your query pushes a lot of work onto the Leader Node, that work can't be parallelized. Typical culprits are returning a huge result set to the client, a final ORDER BY over millions of rows, or a window function without a proper PARTITION BY, which leaves one giant partition that can't be split across slices. The Leader Node, a single machine, has to do its share of the work itself after the Compute Nodes send it the data. This will crush your performance. You can spot the handoff in the query plan (EXPLAIN) by looking for steps like XN Merge and Send to leader near the top of the plan, and checking how many rows flow into them.
- Data Skew: If you choose a bad distribution key (like a column with very low cardinality, e.g., a gender column with only 'M'/'F'), you'll end up with data skew. Every row with the same key value lands on the same slice, so at most two slices will hold the entire table and the others will sit empty. The query will only be as fast as the slowest, most overloaded slice. The system is parallel, but the work is uneven.
- Network Hop: The Compute Nodes have to talk to each other to join tables or broadcast data. If you join a table distributed on user_id with a table distributed on date_id, Redshift has to rearrange the data across the network so rows that need to be joined end up on the same slice. This is called a redistribution step (it shows up as labels such as DS_BCAST_INNER or DS_DIST_BOTH in the explain plan, whereas DS_DIST_NONE means no data had to move) and it's expensive. It's why choosing the right distribution keys for your join patterns is a non-negotiable best practice.
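The skew and redistribution pitfalls above can be put into rough numbers with a small simulation. This is a sketch under stated assumptions: `zlib.crc32` stands in for Redshift's internal hash, the 10,000-row table is invented, and `skew_ratio` and `rows_to_move` are hypothetical helpers rather than anything Redshift exposes.

```python
import zlib
from collections import Counter

TOTAL_SLICES = 8  # e.g. 4 nodes x 2 slices

def slice_for(value):
    """Stand-in hash placement; Redshift's real hash function is internal."""
    return zlib.crc32(str(value).encode()) % TOTAL_SLICES

def skew_ratio(key_values):
    """Rows on the busiest slice divided by the ideal even share."""
    counts = Counter(slice_for(v) for v in key_values)
    ideal = len(key_values) / TOTAL_SLICES
    return max(counts.values()) / ideal

# Hypothetical table of 10,000 rows.
rows = [{"user_id": i, "gender": "MF"[i % 2], "date_id": i % 365}
        for i in range(10_000)]

gender_skew = skew_ratio([r["gender"] for r in rows])
user_skew = skew_ratio([r["user_id"] for r in rows])

def rows_to_move(table, current_key, join_key):
    """Rows whose current slice differs from the slice the join needs."""
    return sum(slice_for(r[current_key]) != slice_for(r[join_key])
               for r in table)

moved_mismatched = rows_to_move(rows, "date_id", "user_id")
moved_colocated = rows_to_move(rows, "user_id", "user_id")

print(f"gender skew: {gender_skew:.2f}x, user_id skew: {user_skew:.2f}x")
print(f"rows to move: mismatched={moved_mismatched}, co-located={moved_colocated}")
```

With only two distinct values, the busiest slice in this model carries at least four times its fair share, because six or more of the eight slices hold nothing. A table already distributed on the join key needs no rows moved at all, while one distributed on an unrelated column has to ship most of its rows across the network first.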
The designers made a solid choice with this MPP model—it’s proven tech for analytics. The questionable choice was initially tying storage so tightly to the compute nodes, making resizing a painful operation. The newer RA3 node type, which separates storage and compute, is their admission of this and a much better model. So, when you write a query, remember: you’re not talking to one database. You’re giving a speech to an entire parliament of computers, and the Leader Node is your translator. Write your queries to keep every member of parliament busy.