In Python, a package is a way of structuring a collection of related modules by organizing them into a directory hierarchy. This structure not only helps in managing large codebases but also prevents naming conflicts by providing a namespace. The traditional mechanism for defining a package relies on a special file named __init__.py.

The Role of init.py

The presence of an __init__.py file in a directory signals to the Python interpreter that the directory should be treated as a package. This file can be completely empty, but it is most commonly used to execute package initialization code and to define what is made available when a user imports the package itself. When a package is imported, the code within __init__.py is executed exactly once. This behavior is crucial because it allows for the setup of package-level state, such as initializing configuration or importing key submodules to flatten the package’s namespace for easier access.

Consider a package structure for a graphics library:

graphics/
    __init__.py
    primitives/
        __init__.py
        line.py
        circle.py
    formats/
        __init__.py
        png.py
        jpg.py

The top-level graphics/__init__.py might be used to import commonly used classes directly into the package’s root namespace, saving users from having to use deeper, more verbose imports.

# graphics/__init__.py
from .primitives.line import Line
from .primitives.circle import Circle
from .formats.png import PNGExporter

__version__ = "1.0.0"

# Package initialization code, e.g., setting up a logger
import logging
logging.getLogger(__name__).addHandler(logging.NullHandler())

With this setup, a user can directly import these items from the package, making the API more convenient:

from graphics import Line, Circle, PNGExporter

Without the __init__.py file performing these imports, the user would need to know the internal structure and use from graphics.primitives.line import Line.

Controlling imports with all

The __all__ variable is a list of strings that defines the public interface of a module or package. When a user imports a package using from package import *, the names listed in __all__ are the only ones that will be imported. This is a crucial tool for defining an explicit API and preventing internal implementation details from leaking into the user’s namespace. If __all__ is not defined, the star import will only import names that do not begin with an underscore.

# graphics/formats/__init__.py
from .png import PNGExporter
from .jpg import JPGExporter

__all__ = ['PNGExporter', 'JPGExporter']  # Only these will be imported by `from formats import *`

Namespace Packages: Packages Without init.py

Introduced in PEP 420, namespace packages are a mechanism for splitting a single package across multiple directories on the filesystem, or even across different locations in sys.path. This is incredibly useful for large frameworks where components might be developed and installed independently. A namespace package is created automatically by the import system when it encounters a directory that does not contain an __init__.py file, but whose name matches the requested package.

Imagine two separate projects contributing to a common corp.utils namespace package:

# Location 1: /path/to/project_a
corp/
    utils/
        math_helpers.py  # No __init__.py here

# Location 2: /path/to/project_b
corp/
    utils/
        string_helpers.py  # No __init__.py here

If both /path/to/project_a and /path/to/project_b are in your Python path, the import system will merge them into a single virtual package.

import sys
sys.path.extend(['/path/to/project_a', '/path/to/project_b'])

from corp.utils import math_helpers  # Found in project_a
from corp.utils import string_helpers # Found in project_b

The primary advantage is that two different distributions can install modules into the same package namespace without any risk of file conflict, as there is no single __init__.py file to overwrite. The main pitfall is that you cannot use relative imports within a namespace package component, as there is no single __init__.py file to define the package’s location authoritatively.

Best Practices and Common Pitfalls

  1. Be Intentional with __init__.py: Avoid writing complex logic or performing heavy computations in __init__.py, as it runs on every import. Use it for lightweight setup and defining the public API. For large, optional subpackages, consider importing them inside functions to defer the import cost until needed.

  2. Always Define __all__: This is a best practice for any public module or package. It serves as explicit documentation of your API and prevents accidental exposure of internal helper functions or classes during a star import.

  3. Understand the Distinction: Know when to use a regular package (with __init__.py) and when a namespace package is appropriate. Use regular packages for most projects where all components are shipped together. Reserve namespace packages for large, federated projects with independently installable components.

  4. Path Order Matters for Namespace Packages: The order of directories in sys.path determines the search order for modules within a namespace package. The first component found that contains the requested module name will be used. This can lead to subtle bugs if different paths contain modules with the same name.

  5. Pitfall: Circular Imports in __init__.py: If graphics/__init__.py imports from graphics.primitives.line, and graphics/primitives/__init__.py in turn imports something from the top-level graphics package, a circular import will occur. Structure your package imports carefully to avoid this.