18.4 Walking Trees and Graphs with Recursive CTEs

Alright, let’s get our hands dirty with the real magic trick: recursive CTEs. This is where you stop politely asking the database for data and start teaching it how to think recursively. It feels like a superpower the first time you get it right, and honestly, it kind of is. We use these primarily for one brilliant, maddening task: walking hierarchical data stored in a table. You know, like org charts, category trees, bill-of-materials explosions, or social network graphs—situations where one row points to another row in the same table.

The core idea is beautifully simple, even if the implementation can make you want to gently headbutt your keyboard. You’re going to write a CTE that has two parts, united by a UNION ALL: the Anchor Member and the Recursive Member. The anchor is your starting point, the solid ground. The recursive member is the part that loops, repeatedly querying the results of its own previous iteration.

The Anatomy of a Recursive CTE

Let’s break down the syntax. It looks weird until it doesn’t.

WITH RECURSIVE EmployeeHierarchy AS (
    -- Anchor Member: The starting point(s)
    SELECT id, name, manager_id, 1 AS level
    FROM employees
    WHERE manager_id IS NULL -- Our big boss(es) at the top

    UNION ALL

    -- Recursive Member: This joins back to the *growing result set*
    SELECT e.id, e.name, e.manager_id, eh.level + 1
    FROM employees e
    INNER JOIN EmployeeHierarchy eh ON e.manager_id = eh.id
)
-- Final SELECT from the fully built CTE
SELECT * FROM EmployeeHierarchy;

Here’s the play-by-play:

The Anchor: The database runs the first SELECT. This grabs all the root nodes (e.g., employees with no manager). This forms the initial result set.
The Recursion: The database then runs the second SELECT. The critical part is INNER JOIN EmployeeHierarchy eh ON e.manager_id = eh.id. It’s not joining back to the original employees table; it’s joining back to the current working result of the CTE itself—which, for this first recursion, is just the anchor results. It finds all employees who report to the bosses we just found.
The Loop: The results from step 2 are added to the working set via UNION ALL. The database then takes this new, larger set and runs the recursive member again, now finding employees who report to the managers it just found. It will keep doing this, iterating deeper and deeper into the tree…
The Termination: …until one glorious iteration returns no new rows. The recursive member comes up empty. The loop stops. The final, complete result set is what you SELECT from at the end.

Why You Absolutely Need a Termination Condition

Notice in the example above, the termination condition is implicit: the recursion stops when the INNER JOIN finds no more matches. This is fine for a simple tree walk. But sometimes, you have to be explicit to prevent an infinite loop, which is the quickest way to turn your elegant query into a database-crashing party trick.

The most common way to force termination is to track the path and check for cycles. This isn’t just a best practice; for graph data, it’s a necessity.

WITH RECURSIVE CategoryPath AS (
    SELECT id, name, parent_id, ARRAY[id] AS path
    FROM categories
    WHERE parent_id IS NULL

    UNION ALL

    SELECT c.id, c.name, c.parent_id, cp.path || c.id
    FROM categories c
    INNER JOIN CategoryPath cp ON c.parent_id = cp.id
    WHERE c.id != ALL(cp.path) -- The magic: prevent cycles
)
SELECT * FROM CategoryPath;

The WHERE c.id != ALL(cp.path) clause checks that the new category ID we’re about to add isn’t already somewhere in the path we’ve taken to get here. If it is, we’ve hit a cycle (a category whose parent eventually points back to itself, which is a data quality nightmare someone else created but you have to handle). This condition blocks that row from being added, breaking the cycle and saving your query.

Practical Example: Finding an Entire Subtree

Let’s say you want every single person in a particular manager’s org chart, all the way down. The recursive CTE is your only sane tool for this job.

WITH RECURSIVE OrgChart AS (
    -- Anchor: Start with the manager in question
    SELECT id, name, title
    FROM employees
    WHERE id = 42 -- The ID of the manager whose tree we want

    UNION ALL

    -- Recursive: Find all direct reports, and their reports, and...
    SELECT e.id, e.name, e.title
    FROM employees e
    INNER JOIN OrgChart o ON e.manager_id = o.id
)
SELECT * FROM OrgChart;

This will return the manager (ID 42), everyone who reports to them, everyone who reports to those people, and so on, until it hits the individual contributors at the leaves of that branch.

Crucial Pitfalls and Best Practices

Performance is a Thing: This is not a free operation. Each recursion is essentially a new query. On massive trees, this can get expensive. Use UNION ALL (not UNION) unless you absolutely need deduplication, as the DISTINCT operation in UNION is a huge cost. Always filter aggressively in the recursive member if you can.
The Cycle Check is Non-Negotiable: If there’s even a whisper of a possibility for cycles in your data (and there always is, because data is messy), you must implement a cycle check using an array or a visited flag. Your future self, who isn’t explaining to a DBA why the production database locked up, will thank you.
You Can’t Reference the CTE Twice: A frustrating but logical limitation: within the recursive member, you can only reference the recursive CTE once. You can’t do a self-join on EmployeeHierarchy to perform complex logic. The workaround is often to push that complexity into the anchor or final SELECT.
Think in Sets, Not Loops: The most significant mental shift is understanding that the recursive member operates on the entire result set from the previous iteration, not on a single row at a time. It’s a set-based loop. Embrace it.

Once you master this, you’ll start seeing hierarchies everywhere. And you’ll have the power to unravel them with a few lines of SQL, which is a feeling that never gets old.