21.4 How Python Resolves Names at Compile Time vs Runtime
In Python, name resolution is governed by the LEGB rule, a fundamental concept that dictates the order in which the interpreter searches for a name’s value. LEGB stands for Local, Enclosing, Global, Built-in, representing the hierarchy of scopes Python checks. Crucially, this process involves two distinct phases: compile-time and runtime, each playing a different role in how names are bound and looked up.
Compile-Time: Bytecode Generation and Name Categorization
When a Python module is imported or a script is run, it is first compiled into bytecode. During this compilation phase, the compiler analyzes the code statically (without executing it) to determine the scope of each name. Within a function body, it categorizes every variable reference as local, global, or free. (Module-level and class-level code is handled differently and uses the more general LOAD_NAME opcode.) This categorization is baked into the generated bytecode and dictates the opcode used to fetch the name’s value later at runtime.
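One way to inspect this static categorization without running any code is the standard-library symtable module. The following is a minimal sketch; the source string and the names threshold, outer, scale, inner, and categorize are made up for illustration.

```python
import symtable

# Hypothetical source text: it is analyzed statically and never executed.
source = """
threshold = 10

def outer():
    scale = 2
    def inner(x):
        y = x * scale
        return y > threshold
    return inner
"""

module_table = symtable.symtable(source, "<example>", "exec")
outer_table = module_table.get_children()[0]   # symbol table for outer()
inner_table = outer_table.get_children()[0]    # symbol table for inner()


def categorize(table, name):
    """Return the compiler's category for `name` in the given scope."""
    symbol = table.lookup(name)
    if symbol.is_free():
        return "free"
    if symbol.is_global():
        return "global"
    if symbol.is_local():
        return "local"
    return "other"


categories = {
    name: categorize(inner_table, name)
    for name in ("x", "y", "scale", "threshold")
}
for name, category in categories.items():
    print(f"{name}: {category}")
# x: local
# y: local
# scale: free
# threshold: global
```

Nothing in the source string runs here, which is the point: the categories are known before a single line of inner() is executed.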
The key distinction is made by the presence of a binding operation. If a name is bound anywhere within a function body, most commonly by assignment but also as a for-loop target, an import, or a parameter, the compiler treats all references to that name within the function as local, unless it is explicitly declared otherwise with global or nonlocal. It does this regardless of whether the binding comes before or after the reference, which can lead to a common pitfall.
def compile_time_example():
    print(my_var)  # Raises UnboundLocalError, even though my_var is assigned on the next line.
    my_var = 10    # The assignment makes 'my_var' a LOCAL name for the entire function.

# compile_time_example()  # Uncommenting this raises UnboundLocalError.
# Python 3.11+ reports: cannot access local variable 'my_var' where it is not associated with a value
# Earlier versions report: local variable 'my_var' referenced before assignment
In the above example, the compiler sees the assignment my_var = 10 and categorizes my_var as a local variable for the entire compile_time_example() function scope. The bytecode for the print(my_var) statement is thus generated to use a local-variable load (LOAD_FAST, or LOAD_FAST_CHECK on Python 3.12 and later), which reads only the function's local slots and never falls back to the global or built-in namespace. At runtime, since the local my_var hasn’t been assigned a value when the print call is executed, an UnboundLocalError is raised.
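The same behaviour can be checked from the code object itself, and it holds even when a module-level name with the same spelling exists. This is a small sketch; reads_before_assignment and the module-level my_var are hypothetical names introduced for the demonstration.

```python
my_var = "module-level value"  # a global with the same name; the function never reads it


def reads_before_assignment():
    try:
        print(my_var)  # compiled as a local load because of the assignment below
    except UnboundLocalError as exc:
        return exc
    my_var = 10  # this binding is what makes my_var local to the whole function


err = reads_before_assignment()
print(type(err).__name__)           # UnboundLocalError
print(isinstance(err, NameError))   # True: UnboundLocalError subclasses NameError

# co_varnames lists the names the compiler decided are local to the function.
print("my_var" in reads_before_assignment.__code__.co_varnames)  # True
```

The exact error message differs between Python versions, so the sketch inspects the exception type rather than its text.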
Runtime: Executing Bytecode and Dynamic Lookup
Runtime is when the pre-generated bytecode is actually executed by the Python Virtual Machine (PVM). The PVM uses the opcodes embedded during compilation to perform the actual lookup within the appropriate namespace.
- LOAD_FAST: Used for local names. The PVM looks for the name in the function’s local namespace (frame), which is typically implemented as a fast array for efficiency.
- LOAD_GLOBAL: Used for global and built-in names. The PVM searches the module’s global namespace and, if not found there, the built-in namespace (__builtins__).
- LOAD_DEREF: Used for free variables (names that are not local and not global, but are defined in an enclosing non-global scope). The PVM looks for these names in the cell object contained within the function’s closure.
my_global = "I'm global"

def runtime_example():
    local_var = "I'm local"
    print(local_var)  # LOAD_FAST: Finds it in the local scope.
    print(my_global)  # LOAD_GLOBAL: Finds it in the module's global scope.
    print(len)        # LOAD_GLOBAL: Finds the built-in 'len' function.
    # print(undefined_name)  # LOAD_GLOBAL: Would raise NameError as it's not found anywhere.

runtime_example()
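The opcode chosen for each name can be observed with the standard-library dis module. The sketch below is illustrative: make_reader, reader, module_level, and enclosed are hypothetical names, and the exact opcode used for locals depends on the interpreter version (for example, newer releases may show a variant whose name begins with LOAD_FAST), so the code only prints what it finds.

```python
import dis

module_level = "global"


def make_reader():
    enclosed = "free"

    def reader():
        local = 1
        # One local, one module global, one free variable, one built-in.
        return local, module_level, enclosed, len

    return reader


reader = make_reader()

# Map each name loaded inside reader() to the opcode the compiler chose for it.
how_loaded = {
    ins.argval: ins.opname
    for ins in dis.get_instructions(reader)
    if ins.opname.startswith("LOAD_") and isinstance(ins.argval, str)
}

for name in ("local", "module_level", "enclosed", "len"):
    print(f"{name:>12}: {how_loaded[name]}")
```

Note that the built-in len and the module-level name share the same opcode: the compiler cannot tell them apart statically, so the global-then-built-in fallback happens at runtime.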
The Interaction of global and nonlocal
The global and nonlocal statements are directives to the compiler that change its default name categorization, directly influencing the opcodes generated.
global: Informs the compiler that the specified name belongs to the module’s global scope, even if it is assigned within the function. This changes the bytecode from LOAD_FAST/STORE_FAST to LOAD_GLOBAL/STORE_GLOBAL.
count = 0

def increment_global():
    global count  # Compiler: treat 'count' as a GLOBAL name
    count += 1    # Bytecode uses STORE_GLOBAL and LOAD_GLOBAL
    print(f"Global count is: {count}")

increment_global()  # Output: Global count is: 1
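To see what the global declaration changes, it can help to compare two otherwise identical functions. In this sketch, bump_with_global and bump_without_global are hypothetical names; the code object attributes co_names and co_varnames show where the compiler filed count in each case.

```python
count = 0


def bump_with_global():
    global count
    count += 1  # reads and writes the module-level 'count'


def bump_without_global():
    count += 1  # 'count' is local here, and it is read before it is bound


bump_with_global()
print(count)  # 1

try:
    bump_without_global()
    raised = False
except UnboundLocalError:
    raised = True
print(raised)  # True

# With 'global', count is a global name; without it, count is a local variable.
print(bump_with_global.__code__.co_names)        # ('count',)
print(bump_without_global.__code__.co_varnames)  # ('count',)
```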
nonlocal: Informs the compiler that the specified name belongs to the nearest enclosing function scope in which it is bound (never the global scope). This changes the bytecode to use LOAD_DEREF and STORE_DEREF, which access the variable through the closure.
def outer_function():
    enclosing_var = "Original"

    def inner_function():
        nonlocal enclosing_var      # Compiler: treat 'enclosing_var' as a FREE variable
        enclosing_var = "Modified"  # Bytecode uses STORE_DEREF

    inner_function()
    print(enclosing_var)  # Output: Modified

outer_function()
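The cell that nonlocal writes through is visible on the function object. The following sketch uses a hypothetical counter factory (make_counter, increment, and total are illustrative names) to show the free variable and the current contents of its cell.

```python
def make_counter():
    total = 0  # lives in a cell because increment() refers to it

    def increment():
        nonlocal total
        total += 1  # rebinds the enclosing 'total' through the closure cell
        return total

    return increment


counter = make_counter()
counter()
counter()

print(make_counter.__code__.co_cellvars)     # ('total',): captured by an inner function
print(counter.__code__.co_freevars)          # ('total',): free variable of increment()
print(counter.__closure__[0].cell_contents)  # 2: the value currently held in the cell
```

Each call to make_counter() creates a fresh cell, so separate counters do not share state.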
Best Practices and Common Pitfalls
- Avoid Shadowing: Be cautious of using names for your own variables that are already defined as built-ins or in the global scope (e.g., list, dict, id). Within the scope of the new binding, the original name becomes inaccessible, which can lead to confusing errors.
- Explicit is Better Than Implicit: Relying on the LEGB lookup is standard, but for clarity, especially in large functions, it can be beneficial to explicitly pass variables as arguments or use class attributes instead of mutating global state.
- Understand the Error: An UnboundLocalError is a specific subclass of NameError. It doesn’t mean the variable doesn’t exist; it means the compiler has categorized it as local but no value has been bound to it yet at the point of use. This is a direct result of the compile-time decision.
- Closures and Late Binding: A common pitfall with enclosing scopes occurs when creating lambdas or functions in a loop. The variable from the enclosing scope is looked up when the function is called, not when the function is defined.
# Pitfall: Late Binding in Closures
functions = []
for i in range(3):
    functions.append(lambda: print(f"i is: {i}"))

for f in functions:
    f()  # Output: i is: 2, i is: 2, i is: 2
# All lambdas refer to the same final value of 'i'

# Solution: Capture the value at definition time using a default argument
functions_correct = []
for i in range(3):
    functions_correct.append(lambda i=i: print(f"i is: {i}"))  # 'i' is a local parameter now

for f in functions_correct:
    f()  # Output: i is: 0, i is: 1, i is: 2
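The default-argument idiom is not the only way to capture the value at definition time. As a sketch of two alternatives, a factory function gives each closure its own enclosing scope, and functools.partial binds the argument immediately. The names make_reporter, late_bound, via_factory, and via_partial are illustrative, and the functions return strings instead of printing so the results are easy to compare.

```python
from functools import partial


def make_reporter(value):
    # Each call creates a new scope, so every closure gets its own 'value'.
    def report():
        return f"i is: {value}"

    return report


late_bound = [lambda: f"i is: {i}" for i in range(3)]
via_factory = [make_reporter(i) for i in range(3)]
via_partial = [partial("i is: {}".format, i) for i in range(3)]

print([f() for f in late_bound])   # ['i is: 2', 'i is: 2', 'i is: 2']
print([f() for f in via_factory])  # ['i is: 0', 'i is: 1', 'i is: 2']
print([f() for f in via_partial])  # ['i is: 0', 'i is: 1', 'i is: 2']
```

Compared with the default-argument approach, both alternatives keep the callable's signature free of an extra parameter that a caller could accidentally override.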