18.5 Cycle Detection in Recursive CTEs (PostgreSQL 14+)

Right, so you’ve built a recursive CTE to traverse a tree or a graph. It’s beautiful. It’s elegant. It works perfectly on your test data. You deploy it. A week later, your phone explodes at 3 AM because some smart aleck added a loop in the data, and your perfect query is now spinning in an infinite recursion, burning through CPU cycles like a crypto miner on a free electricity plan.

This, my friend, is why we have cycle detection. And as of PostgreSQL 14, we don’t have to hand-roll this tedious and error-prone safety check anymore. The database engine will do it for us, and it’s about time.

How the CYCLE Clause Actually Works

Think of the CYCLE clause as your query’s built-in immune system. It remembers every path it’s taken and slaps a big “DO NOT ENTER” sign on any path that would force it to re-tread old ground. Here’s the syntax. It looks a bit verbose at first, but you’ll learn to love its explicitness.

WITH RECURSIVE employee_paths AS (
    -- Anchor member: Find the top-level boss
    SELECT
        id,
        name,
        manager_id,
        ARRAY[id] AS path,  -- We start the path with the initial ID
        1 AS depth,
        false AS is_cycle   -- Initially, no cycle
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive member: Walk down the tree
    SELECT
        e.id,
        e.name,
        e.manager_id,
        ep.path || e.id,  -- Append the new employee's ID to the path
        ep.depth + 1,
        (e.id = ANY(ep.path)) AS is_cycle  -- The manual, pre-PG14 way
    FROM employees e
    JOIN employee_paths ep ON e.manager_id = ep.id
    WHERE NOT ep.is_cycle  -- Filter out cycles manually
)
SELECT * FROM employee_paths;

The old way, shown above, was a pain. You had to build an array (path) to track your journey, manually check if the new ID was in that array (e.id = ANY(ep.path)), and then use that boolean to filter out subsequent rows. It works, but it’s clunky and it’s on you to get it right.

PostgreSQL 14 said, “Nah, we got this.” Behold:

WITH RECURSIVE employee_paths AS (
    SELECT
        id,
        name,
        manager_id,
        1 AS depth
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    SELECT
        e.id,
        e.name,
        e.manager_id,
        ep.depth + 1
    FROM employees e
    JOIN employee_paths ep ON e.manager_id = ep.id
)
CYCLE id SET is_cycle USING path
SELECT * FROM employee_paths;

The magic is in that CYCLE id SET is_cycle USING path line. Here’s the translation:

CYCLE id: “Hey PostgreSQL, use this id column to uniquely identify a row for the purposes of cycle detection.”
SET is_cycle: “Create a new boolean output column called is_cycle. It will be true for any row that represents a cycle.”
USING path: “To track the path you’re taking, create an implicit array column called path. You don’t have to manage it; I’ll handle it.”

The engine internally does what we used to do manually. It tracks the path of id values. When processing a new row in the recursive term, it checks if the new id is already in the accumulated path. If it is, it marks is_cycle as TRUE and, crucially, does not use that row as a basis for further recursion. It’s a dead end. This is the key—it doesn’t just identify cycles; it prevents infinite recursion by stopping at them.

Why Your Column Choice for CYCLE Matters

You might think, “I’ll just CYCLE id everywhere and call it a day.” Not so fast. The column you specify must be the one that uniquely defines a node in the context of the graph you’re traversing. The id column (a primary key) is usually a perfect choice.

But consider a more complex graph. What if you’re traversing a table of flight segments between airports?

WITH RECURSIVE flight_paths AS (
    SELECT
        departure_airport,
        arrival_airport,
        ARRAY[departure_airport] AS path
    FROM flights
    WHERE departure_airport = 'JFK'

    UNION ALL

    SELECT
        f.departure_airport,
        f.arrival_airport,
        fp.path || f.arrival_airport
    FROM flights f
    JOIN flight_paths fp ON f.departure_airport = fp.arrival_airport
)
CYCLE arrival_airport SET is_cycle USING path
SELECT * FROM flight_paths;

Spot the problem? If you CYCLE arrival_airport, you’re saying a cycle occurs only if you land at the same airport twice. But what if you have a route JFK -> ATL -> LAX -> ATL? You’ve landed at ATL twice, which is a cycle. This works.

But what if your data uses airport codes that aren’t unique per airport? Or if your graph nodes are defined by a combination of columns? You must CYCLE on a set of columns that, together, form a unique identifier for a node in your specific graph. Sometimes you might even need to create a synthetic key for this purpose. The engine isn’t psychic; it can only detect cycles based on the exact values you tell it to care about.

The Invisible, Implicit PATH Column

The USING path part of the clause creates a hidden column. It’s not truly hidden—you can SELECT * and see it—but it’s implicit in the CTE’s definition. This is brilliantly convenient. You don’t have to declare it in the anchor member, you don’t have to manage the array appending logic in the recursive member. It’s all done for you.

This is a huge win for readability and maintenance. The query intent—the actual traversal logic—is no longer obfuscated by boilerplate cycle-detection code. The CYCLE clause declaratively states your intent, and the engine handles the imperative junk under the hood. This is how modern SQL should work.

Best Practices and the One Big “Gotcha”

First, the practice: Always use it. There is virtually no downside. The performance overhead is minimal compared to the cost of an infinite loop. It’s your cheapest insurance policy.

Now, the “gotcha”: The CYCLE clause only prevents infinite recursion within the CTE itself. It will not save you from a recursive CTE that is simply processing a very large, but acyclic, graph. If your company’s reporting hierarchy has 500,000 employees, your recursive CTE will still build a massive result set and potentially consume vast amounts of memory. Cycle detection stops loops; it doesn’t replace careful query design with LIMIT, WHERE clauses, or other filters to keep your result set sane. Don’t confuse cycle detection with a license to be reckless with your graph sizes.