70.5 AST Transformations and Code Generation
Right, so you’ve made it to the part where we stop just looking at the code and start rewriting it from the inside. This is where we graduate from clever tricks to something that feels a bit like wizardry—powerful, dangerous, and liable to blow your own foot off if you’re not careful. We’re going to talk about taking the Abstract Syntax Tree (AST) we just learned to introspect and using it to generate or transform code.
The core idea is simple: if code is just data (a tree of nodes), then writing code is just constructing that data. And if you can construct that data programmatically, you can generate code. If you can modify existing data, you can transform code. This is the engine behind linters, advanced optimizers, and frameworks that use a lot of “magic” to make your life easier (or more confusing, depending on the day).
Generating Code with the ast Module
Let’s start by generating code from scratch. Why would you do this? Maybe you’re building a domain-specific language (DSL) that compiles to Python, or creating a tool that outputs boilerplate code. The ast module provides classes for every type of node, so you can build a tree by instantiating them.
Let’s generate a simple function, def add_one(x): return x + 1, purely through AST construction.
import ast
import inspect
# Build the function definition node for 'add_one'
function_name = "add_one"
arg_name = "x"
# Create the function arguments: a single argument named 'x'
args = ast.arguments(
posonlyargs=[],
args=[ast.arg(arg=arg_name)],
vararg=None,
kwonlyargs=[],
kw_defaults=[],
kwarg=None,
defaults=[]
)
# Create the function body: a single Return statement
# The value being returned is a BinOp: x + 1
return_stmt = ast.Return(
value=ast.BinOp(
left=ast.Name(id=arg_name, ctx=ast.Load()),
op=ast.Add(),
right=ast.Constant(value=1)
)
)
# Assemble the FunctionDef node
function_def = ast.FunctionDef(
name=function_name,
args=args,
body=[return_stmt], # body is a list of statements
decorator_list=[],
returns=None
)
# Put the function inside a Module node so it can be compiled
module = ast.Module(body=[function_def], type_ignores=[])
# This is CRITICAL: The tree we built has no line numbers or column offsets.
# The compiler needs these. ast.fix_missing_locations adds them.
ast.fix_missing_locations(module)
# Compile the AST into a code object
code_obj = compile(module, filename='<ast>', mode='exec')
# Now, execute the code object to bring the function into existence
exec(code_obj)
# Prove it works
print(add_one(41)) # Output: 42
See? We just wrote code that wrote code. It’s a bit verbose, but it’s incredibly precise. The ast.fix_missing_locations part is a classic foot-gun—forget it, and you get a useless ValueError about missing line information. The designers made you do this manually because automatically adding dummy locations could mask real bugs if you were modifying an existing tree.
Modifying the AST In-Place
Generation is cool, but transformation is where the real power is. Let’s say you want to be that person and write a decorator that logs every time a function is entered and exited. You could do this by wrapping the function, but for the sake of example, let’s surgically insert the logging calls directly into its AST.
import ast
import inspect
def inject_logging(func):
# Get the source and parse it into an AST
source = inspect.getsource(func)
tree = ast.parse(source)
# We're assuming it's a simple function with a single FunctionDef node.
# This is a massive oversimplification for a tutorial. In reality,
# you'd need to find the correct node, handle classes, etc.
function_node = tree.body[0]
# Create AST nodes for logging statements
log_entry = ast.Expr(
value=ast.Call(
func=ast.Name(id='print', ctx=ast.Load()),
args=[ast.Constant(value=f"Entering {func.__name__}")],
keywords=[]
)
)
log_exit = ast.Expr(
value=ast.Call(
func=ast.Name(id='print', ctx=ast.Load()),
args=[ast.Constant(value=f"Exiting {func.__name__}")],
keywords=[]
)
)
# Insert them at the beginning and end of the function body
function_node.body.insert(0, log_entry)
function_node.body.append(log_exit)
# Fix locations, compile, and execute the new definition
ast.fix_missing_locations(tree)
code_obj = compile(tree, filename='<ast>', mode='exec')
exec(code_obj)
# Return the newly created function from the new local scope
return locals()[func.__name__]
# Let's use it
@inject_logging
def say_hello(name):
print(f"Hello, {name}!")
say_hello("Alice")
# Output:
# Entering say_hello
# Hello, Alice!
# Exiting say_hello
This is, frankly, a terrifying and brittle approach for a decorator (inspect.getsource has severe limitations), but it illustrates the concept perfectly. A real-world tool like a linter or a framework like pytest uses this exact methodology, just with infinitely more robust node traversal and error handling.
The Giant, Glaring Pitfalls
- Security: You are using
exec(). I don’t care what you’re doing, just stop and say it out loud: “I am usingexec().” It should make you feel a little uneasy. Never, ever compile and execute an AST from an untrusted source. It is a straight-up remote code execution vulnerability. - Debugging: When your generated or transformed code breaks, the tracebacks are a special kind of hell. The line numbers often point to the
compile()call, not your original source. You’re on your own. - Maintenance: This is “here be dragons” territory. The code is hard to read and reason about. The
astmodule itself is notoriously under-documented; you’ll often find yourself comparing theast.dump()of real code to reverse-engineer how to build a node. - Python Version Compatibility: The AST changes between Python versions. A node structure that works in 3.8 might be completely broken in 3.9. If you’re building tools this low-level, you need to pin your version or be prepared for a world of pain.
So, should you use this? For a one-off code generation script or a specialized internal tool? Absolutely. It’s the right tool for that job. For something you plan to ship to millions of users? You’d better have a world-class team and a very good reason. It’s the kind of power that separates the novices from the experts, and the experts from the people who get paged at 3 AM.