71.8 How import Works Internally
Alright, let’s pull back the curtain on one of the most common yet surprisingly complex operations in Python: the import statement. You type import numpy as np and magic happens. But it’s not magic—it’s a meticulously engineered process, and understanding it is the key to debugging a whole class of frustrating problems. It’s a multi-stage journey from a name in a .py file to a live module object in your interpreter’s memory. Let’s trace the path.
The Import Button: It’s More Than a Button
When you issue an import request, CPython doesn’t just start reading files willy-nilly. Its first stop is sys.modules, a dictionary of every module already imported in this process. If the name is there, you get the cached module object back and nothing else happens. Only on a cache miss does the real search begin, and that search follows a protocol introduced in PEP 302 (and refined in PEP 451, which added module specs) built around two roles: finders and loaders. Think of it as a two-part process. First, a finder locates the module (Is it a file? A C library? A zip archive?) and returns a spec describing it. Then a loader actually creates the module object and runs its code.
The list of finders the interpreter consults is sys.meta_path. This is the big one: a list of meta path finder objects that the import system asks, in order, whenever a requested module isn’t already in sys.modules. The first finder that returns a spec wins. Let’s take a peek at what’s in there by default.
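You can watch the sys.modules cache at work from any Python session. This small sketch uses json only because it’s a convenient standard-library module; any module behaves the same way. Repeated imports, whether by statement or through importlib.import_module, hand back the identical cached object.

```python
import importlib
import sys

import json  # first import (or a cache hit, if something already imported json)

# The module object now lives in the cache under its fully qualified name
cached = sys.modules["json"]

# A second import statement, or importlib.import_module, returns the very same object
import json as json_again
via_importlib = importlib.import_module("json")

print(cached is json, json_again is json, via_importlib is json)  # True True True
```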
import sys
# CPython's default finders are classes (not instances), so use their own __name__
print([getattr(f, '__name__', type(f).__name__) for f in sys.meta_path])
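On a standard CPython build, you’ll probably see something like ['BuiltinImporter', 'FrozenImporter', 'PathFinder'], possibly with extra finders in front that some packaging tools install. These are the gatekeepers. BuiltinImporter handles modules compiled into the interpreter itself, like sys or itertools. FrozenImporter serves “frozen” modules, whose bytecode is baked into the executable; tools that embed or freeze Python rely on it, and since Python 3.11 CPython also freezes a set of core standard-library modules to speed up startup. The real workhorse for your own code is PathFinder.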
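To make the finder protocol concrete, here is a minimal sketch of a meta path finder that only observes. A meta path finder exposes find_spec(fullname, path, target=None); returning None means “not mine”, so the import system moves on to the next finder. The LoggingFinder class is a hypothetical name for illustration, and colorsys is just an arbitrary standard-library module. Note that the second import never reaches the finder, because it is served from sys.modules.

```python
import importlib
import sys


class LoggingFinder:
    """Hypothetical observer finder: records each requested name, finds nothing."""

    def __init__(self):
        self.requested = []

    def find_spec(self, fullname, path, target=None):
        self.requested.append(fullname)
        return None  # defer to the remaining finders on sys.meta_path


finder = LoggingFinder()
sys.meta_path.insert(0, finder)  # put it first so it sees every cache miss
try:
    sys.modules.pop("colorsys", None)    # force a cache miss for the demo
    importlib.import_module("colorsys")  # miss: every meta path finder is asked
    importlib.import_module("colorsys")  # hit: sys.modules answers, finders are skipped
finally:
    sys.meta_path.remove(finder)

print(finder.requested)
```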
The File System Detective: sys.path and PathFinder
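So how does PathFinder know where to look? For a top-level module, it consults another famous list: sys.path. This is a list of directory names and (sometimes) zip file paths that form the universe of places where your modules might live, searched in order, with the first match winning. The first entry is special. When you run python script.py, it is the directory containing that script; with python -m or in an interactive session, it is the current working directory (spelled as an empty string '' in an interactive session or with python -c). Because that entry depends on how you launched Python, it’s a common source of “why can’t I import my own module?!” headaches when you run code from the wrong place.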
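The “first match wins” rule is easy to demonstrate. This sketch creates two temporary directories that each contain a module with the same made-up name (shadow_demo is hypothetical), puts both at the front of sys.path, and imports it. It calls importlib.invalidate_caches(), which the importlib documentation recommends after creating source files at runtime, and cleans up sys.path and sys.modules afterwards.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    # The same hypothetical module name, defined differently in two directories
    Path(first, "shadow_demo.py").write_text("WHERE = 'first'\n")
    Path(second, "shadow_demo.py").write_text("WHERE = 'second'\n")
    sys.path[:0] = [first, second]  # earlier entries are searched first
    importlib.invalidate_caches()   # let finders notice the newly created files
    try:
        where = importlib.import_module("shadow_demo").WHERE
    finally:
        for entry in (first, second):
            sys.path.remove(entry)
            sys.path_importer_cache.pop(entry, None)
        sys.modules.pop("shadow_demo", None)

print(where)  # first
```

The same mechanism explains the classic bug of naming your own file after a standard-library module: whichever directory comes first on sys.path shadows the other.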
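PathFinder’s job is to take the module name you requested and find something on the search path that matches it. For a submodule such as mypackage.mymodule, the parent mypackage is imported first, and the search then walks that package’s __path__ rather than sys.path. PathFinder doesn’t interpret path entries itself: for each entry, it asks the callables in sys.path_hooks to produce a path entry finder, and caches the result in sys.path_importer_cache. For an ordinary directory, that finder is a FileFinder, which knows how to recognize a .py source file, a .pyc file, a compiled extension module, or a directory (which signifies a package).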
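importlib.util.find_spec lets you run the search without executing anything, which is handy for debugging “where is this import actually coming from?”. This sketch again uses json as an arbitrary example. The printed paths and loader type depend on your installation, so the comments describe what you’d typically see rather than exact values.

```python
import importlib.util

# Top-level package: located by searching sys.path
pkg_spec = importlib.util.find_spec("json")
print(pkg_spec.origin)                      # path to json/__init__.py on a typical install
print(type(pkg_spec.loader).__name__)       # usually SourceFileLoader
print(pkg_spec.submodule_search_locations)  # becomes json.__path__

# Submodule: find_spec imports the parent package first, then searches json.__path__
sub_spec = importlib.util.find_spec("json.decoder")
print(sub_spec.name, sub_spec.parent)       # json.decoder json
```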
The Cached Shortcut: .pyc Files
You’ve undoubtedly seen these __pycache__ directories littering your projects. They are not just there to annoy you; they are a critical performance optimization. Once PathFinder locates a .py file (say, mymodule.py), its associated loader doesn’t just slam the text file into the interpreter.
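It first checks for a corresponding cached file in __pycache__/, named with an interpreter tag such as mymodule.cpython-312.pyc. This file holds the compiled bytecode of your module, plus a small header that records the source file’s modification time and size at compile time. If the .pyc exists and those recorded values match the current .py file exactly, the loader deserializes the bytecode directly and skips parsing and compiling altogether. If the .pyc is stale or missing, the loader compiles the source, tries to write a fresh .pyc for next time (silently skipping this if the directory isn’t writable), and then executes the code. This is why your imports are faster the second time you run a program. Since Python 3.7, PEP 552 also allows hash-based .pyc files that are validated against a hash of the source instead of its timestamp.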
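You can inspect that header yourself. This sketch writes a throwaway module (cached_demo is a hypothetical name), compiles it with py_compile using timestamp validation, and unpacks the first 16 bytes: the magic number, a flags word (0 for timestamp-based pycs), then the source’s modification time and size, each stored as a little-endian 32-bit integer.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "cached_demo.py")  # hypothetical module, created only for this demo
    src.write_text("x = 1\n")

    # Compile it the way the import system caches it (timestamp-validated .pyc)
    pyc_path = py_compile.compile(
        str(src),
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    header = Path(pyc_path).read_bytes()[:16]
    st = os.stat(src)

    # Header layout for timestamp pycs: magic, flags, source mtime, source size
    magic = header[:4]
    flags, mtime, size = struct.unpack("<III", header[4:16])

    pyc_name = os.path.basename(pyc_path)                # e.g. cached_demo.cpython-312.pyc
    magic_ok = magic == importlib.util.MAGIC_NUMBER      # written by this interpreter version
    mtime_ok = mtime == (int(st.st_mtime) & 0xFFFFFFFF)  # stored as 32-bit whole seconds
    size_ok = size == (st.st_size & 0xFFFFFFFF)

print(pyc_name, magic_ok, flags, mtime_ok, size_ok)
```

Because the check compares both mtime and size, editing a file within the same second usually still invalidates the cache, as long as its size changes.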
From Bytecode to Module: The Execution Step
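Here’s the part that feels a bit like a party trick but is utterly brilliant. Loading a module doesn’t mean extracting a list of functions and variables from a file. The import machinery creates a new, empty module object (an instance of types.ModuleType), fills in attributes such as __name__, __file__, and __spec__, registers it in sys.modules, and only then has the loader execute the entire module’s bytecode in the context of that module’s namespace. Registering the module before executing it is what lets a circular import receive a partially initialized module instead of recursing forever.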
Let that sink in. import mymodule is essentially equivalent to you doing this:
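# A gross simplification, but the spirit is correct
import sys
import types

mymodule = types.ModuleType('mymodule')
mymodule.__file__ = 'mymodule.py'
sys.modules['mymodule'] = mymodule  # registered before the code runs
with open('mymodule.py', 'r', encoding='utf-8') as f:
    source_code = f.read()
bytecode = compile(source_code, 'mymodule.py', 'exec')
exec(bytecode, mymodule.__dict__)  # This is the magic line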
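If you actually need to import a file by path, the importlib documentation offers a real recipe that follows the same create, register, execute sequence using public APIs. This sketch writes a throwaway module (side_effects_demo and its contents are hypothetical) to show that the top-level code runs during exec_module, and that the module can already see itself in sys.modules while it executes.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, "side_effects_demo.py")  # hypothetical module for the demo
    path.write_text(
        "import sys\n"
        "print('running top-level code')\n"
        "SEEN_IN_SYS_MODULES = __name__ in sys.modules\n"
        "def greet():\n"
        "    return 'hi'\n"
    )
    spec = importlib.util.spec_from_file_location("side_effects_demo", str(path))
    module = importlib.util.module_from_spec(spec)  # empty module, attributes set from spec
    sys.modules[spec.name] = module                 # register before executing
    spec.loader.exec_module(module)                 # top-level code runs here

print(module.SEEN_IN_SYS_MODULES, module.greet())   # True hi
sys.modules.pop("side_effects_demo", None)          # tidy up after the demo
```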
The exec() call runs all the code at the top level of the module: function and class definitions are created, and any other statements (print calls, calculations, database connections) run immediately. This is why you have to be careful about putting “side effects” at the top level of a module. A top-level print("Hello!") doesn’t define anything; it prints the moment the module is imported. It does so only once per process, though, because every later import is served from sys.modules.
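Here is a quick sketch of that “runs once” behavior. It creates a hypothetical noisy_demo module whose only content is a top-level print, imports it twice, and captures standard output so you can see how many times the print actually fired.

```python
import contextlib
import importlib
import io
import sys
import tempfile
from pathlib import Path

buf = io.StringIO()
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "noisy_demo.py").write_text("print('Hello!')\n")  # hypothetical module
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()  # let finders notice the newly created file
    try:
        with contextlib.redirect_stdout(buf):
            first = importlib.import_module("noisy_demo")   # executes the top-level print
            second = importlib.import_module("noisy_demo")  # sys.modules hit: no output
    finally:
        sys.path.remove(tmp)
        sys.path_importer_cache.pop(tmp, None)
        sys.modules.pop("noisy_demo", None)

print(repr(buf.getvalue()))  # 'Hello!\n'
print(first is second)       # True
```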
The Double-Edged Sword of Reloading
You might have heard of importlib.reload(). It’s a tempting fix for when you’ve changed a module and don’t want to restart your interpreter. Be warned: it’s a footgun of the highest order.
reload() does not reset the module’s namespace; it re-executes the current source in the existing module’s namespace. If your first run defined x = 1 and you change the source to x = 2 and reload, x is now 2. But if you then delete the line x = 2 from the source and reload again, x is still 2, because reloading never removes names left over from earlier runs; it only runs the new code. Worse, anything that grabbed a reference before the reload (a name pulled in with from mymodule import func, or instances of the module’s classes) keeps pointing at the old objects. This leads to bizarre, hard-to-debug state inconsistencies. My professional advice? Just restart your interpreter. It’s safer and saner. reload() exists because it’s occasionally handy in an interactive session, but in practice, relying on it is often a terrible idea.
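The leftover-name problem is easy to reproduce. This sketch imports a hypothetical reload_demo module that defines x and y, rewrites the file so it defines only a new x, and calls importlib.reload(). The new file has a different size, so the cached .pyc is recognized as stale even if both writes land in the same second.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "reload_demo.py")  # hypothetical module for the demo
    src.write_text("x = 1\ny = 2\n")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        mod = importlib.import_module("reload_demo")
        # "Edit" the source: change x and delete y entirely
        src.write_text("x = 10\n")
        importlib.reload(mod)  # re-runs the new code in the SAME namespace
        x_after, y_after = mod.x, getattr(mod, "y", "gone")
    finally:
        sys.path.remove(tmp)
        sys.path_importer_cache.pop(tmp, None)
        sys.modules.pop("reload_demo", None)

print(x_after, y_after)  # 10 2  (y survived even though its line was deleted)
```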