55.9 Performance: Compiled Patterns and Catastrophic Backtracking
The Nature of Regex Engine Execution To understand performance, one must first grasp how regex engines operate. Most modern engines, including those in Python, Java, and .NET, are backtracking engines. They work by attempting to match a pattern from left to right, one character at a time. When the engine encounters a point in the pattern where multiple paths to a match are possible (e.g., a quantifier like * or +, or an alternation with |), it chooses one path and remembers the others as “backtracking positions.” If the chosen path ultimately leads to a failed match, the engine backtracks to the last saved position and tries the next alternative. This process continues until a match is found or all possibilities are exhausted. While powerful, this approach is fundamentally susceptible to inefficiency if the number of possible paths explodes exponentially.