Now, let’s get our hands dirty with the ast module. If you’ve ever wondered how linters, auto-formatters, or sophisticated refactoring tools work their magic, this is the secret sauce. They don’t use regular expressions on source code—that way lies madness. Instead, they parse the code into an Abstract Syntax Tree (AST), a structured, tree-like representation of your program’s syntax.

Think of it like this: your code is a string of words. The AST is the diagram that a linguist would draw to show the subject, verb, object, and all the clauses. The ast module is our linguist.

The Nuts and Bolts: Nodes and the NodeVisitor

The core of the ast module is, unsurprisingly, a collection of node classes. There’s an ast.Add node for the + operator, an ast.Name node for variable names, an ast.Call node for function calls, and so on. The ast.parse() function is your gateway. Feed it a string of code, and it returns the root node of the AST (an ast.Module object by default).

import ast

code = """
def greet(name):
    return f"Hello, {name}!"
"""

tree = ast.parse(code)
print(ast.dump(tree, indent=4))

Running this prints a detailed tree: the Module body contains a FunctionDef named ‘greet’, which has an arguments node and a Return node that contains a JoinedStr (for the f-string)… you get the idea. ast.dump() is your best friend for inspecting these structures. It’s like print() for the AST, showing you the entire hierarchy. (The indent argument needs Python 3.9 or newer.)
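Besides dumping the whole tree, you can poke at individual nodes as ordinary Python objects. The following is a minimal sketch of that, reusing the greet function from above; the variable names (source, func, node_types) are only illustrative.

```python
import ast

source = '''
def greet(name):
    return f"Hello, {name}!"
'''

tree = ast.parse(source)
func = tree.body[0]  # the first (and only) statement in the module

# Nodes are plain objects; their children live in ordinary attributes.
print(type(func).__name__)                   # FunctionDef
print(func.name)                             # greet
print([arg.arg for arg in func.args.args])   # ['name']
print(type(func.body[0].value).__name__)     # JoinedStr

# ast.walk yields every node in the tree, in no guaranteed order,
# so collect the type names into a set and sort them for display.
node_types = sorted({type(node).__name__ for node in ast.walk(tree)})
print(node_types)
```

ast.walk is handy when you only need to find nodes and don’t care where they sit in the tree; when context matters, a NodeVisitor is usually the better fit.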

To actually do something with this tree, you rarely need to traverse it by hand. You use a NodeVisitor. This class provides a clean, callback-based way to interact with the nodes. You write methods called visit_<NodeType> (e.g., visit_Name), and the visitor calls them for every node of that type it reaches. Node types without a matching method fall through to generic_visit(), which simply visits the children.

class FunctionCallCounter(ast.NodeVisitor):
    def __init__(self):
        self.count = 0

    def visit_Call(self, node):
        # This gets called for every function call in the AST.
        self.count += 1
        # Crucial: call self.generic_visit to continue traversing child nodes.
        self.generic_visit(node)

counter = FunctionCallCounter()
counter.visit(tree)
print(f"Number of function calls: {counter.count}")  # 0 here: greet() contains no calls
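Since greet() doesn’t call anything, here is a small sketch of the same visitor pattern on code that does. CallNameCollector and the sample string are made up for illustration; the sketch only records calls to plain names and deliberately ignores method calls.

```python
import ast

class CallNameCollector(ast.NodeVisitor):
    """Record the name of every directly named function that is called."""

    def __init__(self):
        self.names = []

    def visit_Call(self, node):
        # Only handle plain names like f(x); method calls such as obj.f(x)
        # have an ast.Attribute in node.func and are skipped here.
        if isinstance(node.func, ast.Name):
            self.names.append(node.func.id)
        # Keep descending so nested calls such as f(g(x)) are seen too.
        self.generic_visit(node)

sample = "total = sum(map(abs, values))\nprint(total)\n"
collector = CallNameCollector()
collector.visit(ast.parse(sample))
print(collector.names)  # ['sum', 'map', 'print']
```

Drop the generic_visit call and the result shrinks to ['sum', 'print'], because the visitor never looks inside the arguments of sum().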

The Power and Peril of ast.literal_eval

You’ve probably been told a million times: “Never use eval()!” It’s a massive security risk. The ast module gives us a much safer alternative for one very common use case: parsing data structures. ast.literal_eval() can evaluate a string containing a Python literal structure (strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and None) safely.

import ast

unsafe_string = "[1, 2, 3 + 4]"  # This has an expression!
safe_string = "[1, 2, 3]"

# This will work. It's just a list of literals.
safe_result = ast.literal_eval(safe_string)
print(safe_result)  # [1, 2, 3]

# This raises a ValueError: 3 + 4 is an expression to compute, not a literal.
try:
    unsafe_result = ast.literal_eval(unsafe_string)
except ValueError as e:
    print(f"Safety first! {e}")

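Why is it safe? Because the entire AST is walked to ensure it only contains nodes that constitute a literal structure. No function calls, no attribute lookups, no name references. Nothing that could possibly execute arbitrary code. It’s the perfect tool for the job when you need to get a data structure from a string and you can’t trust the source enough for the full eval(). One caveat: “safe” here means “won’t run code”. The official docs still warn that very large or deeply nested input can exhaust memory or crash the interpreter, so cap the input size when the data is untrusted.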
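To make the “literals only” rule concrete, here is a small sketch of a wrapper that turns rejections into None. parse_literal is a hypothetical helper, not part of the ast module, and returning None on failure is a simplification (it can’t be told apart from the literal "None").

```python
import ast

def parse_literal(text):
    """Return the Python value for text, or None if it is not a plain literal."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # ValueError: valid syntax, but not a literal (names, calls, operators).
        # SyntaxError: not valid Python at all.
        return None

print(parse_literal("{'port': 8080, 'debug': True}"))  # {'port': 8080, 'debug': True}
print(parse_literal("__import__('os').getcwd()"))       # None
print(parse_literal("some_variable"))                   # None
print(parse_literal("[1, 2"))                           # None
```

The second call is the classic eval() attack string; literal_eval refuses it because the tree contains Call and Attribute nodes, and nothing is ever executed.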

Modifying the Tree: The NodeTransformer

Here’s where things get truly powerful. A NodeVisitor just lets you look. A NodeTransformer lets you change the AST. You override the same visit_<NodeType> methods, but now you can return a new node to replace the old one, or return None to remove the node entirely.

Let’s say you want to change all integer literals 1 to 42 for some reason. (Don’t ask why, it’s my example.)

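class NumberChanger(ast.NodeTransformer):
    def visit_Constant(self, node):
        # Numeric literals are ast.Constant nodes; ast.Num is deprecated since Python 3.8.
        # The type check keeps True from matching, since True == 1.
        if type(node.value) is int and node.value == 1:
            # Return a new Constant node with the value 42, keeping the old position
            return ast.copy_location(ast.Constant(value=42), node)
        # Return the original node for everything else
        return node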

transformer = NumberChanger()
new_tree = transformer.visit(tree)

# Let's see what our new function looks like
print(ast.unparse(new_tree))
# Outputs:
# def greet(name):
#     return f'Hello, {name}!'

Wait, what? It didn’t change. That’s because our original greet function didn’t have the number 1 in it. Let’s fix the example.

code_with_numbers = """
def calculate():
    answer = 1 + 1
    return answer
"""

tree = ast.parse(code_with_numbers)
new_tree = NumberChanger().visit(tree)
print(ast.unparse(new_tree))
# Now outputs:
# def calculate():
#     answer = 42 + 42
#     return answer

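There we go. Find a pattern in the tree, swap in a replacement: that is the basic idea behind code-modernizing tools such as 2to3. (2to3 itself doesn’t use ast.NodeTransformer, though. It runs on lib2to3’s own concrete syntax tree so that it can keep comments and formatting, which brings us to the gotchas.)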
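Replacing nodes is only half of what a NodeTransformer can do; returning None, as mentioned earlier, removes a node. Here is a minimal sketch that strips assert statements. AssertRemover and the half function are made up for this example.

```python
import ast

class AssertRemover(ast.NodeTransformer):
    def visit_Assert(self, node):
        # Returning None drops this statement from its parent's body.
        return None

source = '''
def half(n):
    assert n % 2 == 0
    return n // 2
'''

tree = AssertRemover().visit(ast.parse(source))
cleaned = ast.unparse(tree)  # ast.unparse needs Python 3.9 or newer
print(cleaned)
```

One thing to watch: if the removed statement was the only one in a function or if block, the body ends up empty and the tree is no longer valid Python. A sturdier version would return an ast.Pass() node in that case.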

The Gotchas: Location Matters

The AST does not preserve every single bit of formatting from your original source code. Comments, exactly how many spaces you used, and the specific quoting style for strings are generally lost. This is why it’s an abstract syntax tree. If you need to modify code and leave everything you didn’t touch exactly as it was, you need a tool built on a concrete syntax tree. The standard library’s lib2to3 used to fill that role, but it was deprecated and then removed in Python 3.13, so today that means a third-party library like LibCST or redbaron.
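You can see the loss for yourself with a quick round trip through ast.parse and ast.unparse (Python 3.9 or newer). This is just a sketch; the sample line is made up.

```python
import ast

original = '''
x   =   {"a": 1}   # spacing, double quotes, and this comment are source-level details
'''

round_tripped = ast.unparse(ast.parse(original))
print(round_tripped)  # x = {'a': 1}
```

The meaning survives, but the comment is gone, the spacing is normalized, and the quotes have switched to whatever ast.unparse prefers.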

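Another critical point: when you modify an AST, you are responsible for its correctness. The ast module will help you create nodes, but it won’t stop you from building complete nonsense. If you create a new AST from scratch or heavily modify one, always compile it to bytecode before you try to run it: compile() rejects malformed trees (missing required fields, nodes in the wrong place) with a TypeError or ValueError. Source text that isn’t valid Python never gets this far, because ast.parse() raises a SyntaxError on it.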

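new_tree = ast.parse("answer = 1 + 1")  # Invalid source would already fail here with SyntaxError.
new_tree.body[0].value.right = ast.Constant(value=41)  # A hand-built node has no lineno/col_offset yet.
ast.fix_missing_locations(new_tree)  # Fills in the missing lineno/col_offset attributes
compiled_code = compile(new_tree, '<ast>', 'exec')  # Raises TypeError/ValueError if the tree is malformed.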

This compile step is your final, crucial sanity check. Never skip it. It’s the difference between a clever metaprogramming trick and a runtime error that makes you question all your life choices.
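To watch the sanity check earn its keep, here is a sketch that builds a tiny module entirely by hand, trips over the missing location info, fixes it, and then runs the result. The names (assign, module, namespace) are illustrative only.

```python
import ast

# Build `result = 6 * 7` by hand, without going through ast.parse.
assign = ast.Assign(
    targets=[ast.Name(id="result", ctx=ast.Store())],
    value=ast.BinOp(
        left=ast.Constant(value=6),
        op=ast.Mult(),
        right=ast.Constant(value=7),
    ),
)
module = ast.Module(body=[assign], type_ignores=[])

try:
    compile(module, "<ast>", "exec")
except TypeError as exc:
    # Hand-built nodes have no lineno/col_offset, and compile() insists on them.
    print(f"Rejected: {exc}")

# Fill in the missing positions, then compile and run for real.
ast.fix_missing_locations(module)
namespace = {}
exec(compile(module, "<ast>", "exec"), namespace)
print(namespace["result"])  # 42
```

Keep in mind what compile() does and doesn’t check: it validates the shape of the tree, not the logic. A tree that references an undefined name compiles fine and only fails when it runs.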