When distributing Python packages, you often need to include more than just .py files. Two common requirements are non-code data files (configuration templates, images, schemas, etc.) and binary extensions (compiled C/C++/Rust code). The packaging ecosystem handles these fundamentally different assets in distinct ways, and understanding these mechanisms is crucial for successful distribution.

Including Package Data Files

The setuptools library provides the primary mechanism for including non-code files. Historically, the package_data argument in setup.py was used, but modern practice favors declaring these inclusions in a pyproject.toml file using the [tool.setuptools.package-data] table. This method is more declarative and separates configuration from code.

The syntax is straightforward: each key is a package name, and each value is a list of glob patterns relative to that package’s directory. For example, to include all .json files in the package root and every file directly inside a templates/ directory (a single * does not descend into subdirectories) for a package named mypackage, you would configure it as follows:

# pyproject.toml
[build-system]
requires = ["setuptools>=61.0", "wheel"]
build-backend = "setuptools.build_meta"

[tool.setuptools.package-data]
mypackage = ["*.json", "templates/*"]
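Before building, it can help to preview which files a set of patterns would pick up. The following is a minimal sketch that approximates the matching with pathlib; it is not setuptools' own implementation, and the helper name preview_package_data and the throwaway directory layout are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def preview_package_data(package_dir, patterns):
    """Return files under package_dir matched by the given glob patterns.

    This only approximates how setuptools expands package-data patterns;
    treat it as a local sanity check, not as the build backend's logic.
    """
    root = Path(package_dir)
    matched = set()
    for pattern in patterns:
        for path in root.glob(pattern):
            # Directories can match a pattern too, but only files are data.
            if path.is_file():
                matched.add(path.relative_to(root).as_posix())
    return sorted(matched)


# Build a throwaway layout that mirrors the configuration above.
demo_root = Path(tempfile.mkdtemp()) / "mypackage"
(demo_root / "templates" / "email").mkdir(parents=True)
(demo_root / "config.json").write_text("{}")
(demo_root / "templates" / "base.html").write_text("<html></html>")
(demo_root / "templates" / "email" / "welcome.txt").write_text("hi")

matches = preview_package_data(demo_root, ["*.json", "templates/*"])
print(matches)  # ['config.json', 'templates/base.html']
```

Note that templates/email/welcome.txt is not matched, because templates/* covers only one directory level. Nested files need their own pattern (for example templates/email/*); whether recursive ** patterns are available depends on your setuptools version, so check its documentation.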

A common pitfall is confusing package-data with include-package-data. The package-data patterns match whatever files exist in the package directory at build time, whether or not they are under version control. Version control matters only for include-package-data (enabled by default when setuptools is configured through pyproject.toml), which picks up files listed in MANIFEST.in or reported by a version-control plugin such as setuptools-scm. In both cases, a data file must live inside a package directory to be installed with that package. To access these files at runtime, use the importlib.resources API, which provides a filesystem-agnostic way to read resources from within a package.

# Example of reading a data file at runtime
from importlib import resources

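try:
    # Python 3.9+: files() returns a Traversable for the package
    resource = resources.files("mypackage").joinpath("config.json")
    with resource.open("r", encoding="utf-8") as f:
        config_data = f.read()
except AttributeError:
    # Fallback for Python 3.7 and 3.8, where resources.files() does not exist
    with resources.open_text("mypackage", "config.json", encoding="utf-8") as f:
        config_data = f.read()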
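To illustrate what "filesystem-agnostic" means in practice, the sketch below imports a package straight from a zip archive and reads a resource from it. The package name zipped_demo_pkg and its contents are hypothetical, created only for this demonstration, and the code assumes Python 3.9 or later.

```python
import json
import sys
import tempfile
import zipfile
from importlib import resources
from pathlib import Path

# Hypothetical package name used only for this demonstration.
PACKAGE = "zipped_demo_pkg"

# Create an archive containing a package with one data file.
archive = Path(tempfile.mkdtemp()) / "bundle.zip"
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr(f"{PACKAGE}/__init__.py", "")
    zf.writestr(f"{PACKAGE}/config.json", json.dumps({"debug": False}))

# Make the archive importable, as if the package were stored zipped.
sys.path.insert(0, str(archive))

# config.json never exists as a standalone file on disk,
# yet the resource API can still read it.
text = resources.files(PACKAGE).joinpath("config.json").read_text(encoding="utf-8")
loaded = json.loads(text)
print(loaded)  # {'debug': False}
```

Code that builds a path from __file__ and calls open() on it would fail here, because the resulting path points inside the archive rather than at a real file.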

Including Binary Extensions

Binary extensions are shared libraries (.so files on Linux and macOS, .pyd files on Windows) compiled from languages such as C, C++, or Rust. For C and C++, they are integrated into Python packages using the setuptools.Extension class, which describes the extension module to the build system: its source files, include directories, compiler flags, and linked libraries. Rust extensions are usually built with separate tooling such as setuptools-rust or maturin.
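To see which filename suffixes the running interpreter accepts for extension modules, you can query the standard library. This is a small sketch; the exact values vary by platform and Python build, so no particular output is assumed.

```python
import importlib.machinery
import sysconfig

# All suffixes the import system recognizes for compiled extension modules.
suffixes = importlib.machinery.EXTENSION_SUFFIXES
print(suffixes)

# The suffix the build normally uses for this specific interpreter
# (it typically encodes the Python version and platform).
print(sysconfig.get_config_var("EXT_SUFFIX"))
```

On typical CPython builds the entries end in .so on Linux and macOS and in .pyd on Windows, which is why a compiled module built for one platform cannot be imported on another.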

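The ext_modules parameter of your setup() function takes a list of these Extension instances. Recent setuptools releases also accept an experimental ext-modules list under [tool.setuptools] in pyproject.toml, but setup.py remains the most widely supported place to declare extensions. The build process then compiles the source code and produces a platform-specific binary wheel.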

# setup.py (alternative to pyproject.toml configuration)
from setuptools import setup, Extension

module = Extension(
    'mypackage.mymodule',  # Full module name for the extension
    sources=['src/mymodule.c', 'src/helper.c'],  # List of source files
    include_dirs=['/usr/local/include'],  # Directories for header files
    libraries=['ssl', 'crypto'],  # System libraries to link against
    define_macros=[('DEBUG', '1')]  # Preprocessor macros
)

setup(
    name="mypackage",
    version="1.0.0",
    ext_modules=[module]
)

The most significant challenge with binary extensions is handling platform compatibility. A source distribution (sdist) contains the C source code and must be compiled on the target machine, which requires the user to have the correct compiler toolchain installed. This is often a major hurdle for end-users. The solution is to build platform wheels (e.g., manylinux wheels for Linux, macOS wheels for Apple platforms, and Windows wheels). These are pre-compiled binary distributions that can be installed directly by pip without any compilation. Tools like cibuildwheel automate the process of building these wheels across multiple platforms in CI/CD pipelines.
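The platform a wheel targets is encoded in its filename, which follows the pattern {distribution}-{version}(-{build})-{python tag}-{abi tag}-{platform tag}.whl. The sketch below splits a filename into those parts; the helper names parse_wheel_filename and is_pure_python, as well as the example filenames, are illustrative assumptions. For production use, the packaging library offers packaging.utils.parse_wheel_filename, which also validates the tags.

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the Python tag.
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of components: {filename!r}")
    return WheelTags(dist, version, build, py, abi, plat)


def is_pure_python(tags: WheelTags) -> bool:
    """A wheel with no ABI and no platform restriction contains no binaries."""
    return tags.abi_tag == "none" and tags.platform_tag == "any"


binary = parse_wheel_filename("mypackage-1.0.0-cp311-cp311-manylinux_2_17_x86_64.whl")
pure = parse_wheel_filename("mypackage-1.0.0-py3-none-any.whl")

print(binary.platform_tag)     # manylinux_2_17_x86_64
print(is_pure_python(binary))  # False
print(is_pure_python(pure))    # True
```

pip compares these tags against the tags supported by the target interpreter and falls back to the sdist when no wheel matches, which is when a compiler becomes necessary.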

Best Practices and Common Pitfalls

  1. Always Build Wheels: Never rely on users successfully building from an sdist. Always build and upload pre-compiled wheels for every platform you wish to support. This provides a seamless installation experience.
  2. Use MANIFEST.in for SDist-Only Files: Files matched by package_data are installed with the package, so they appear in wheels, and setuptools also adds them to the sdist. Files that belong only in your source distribution (such as documentation or test fixtures kept outside the package directory) must be listed in a MANIFEST.in file.
  3. Test on Clean Systems: The build environment on your development machine is likely contaminated with libraries and headers. Always test building your package in a clean, minimal environment (like a Docker container) to uncover hidden dependencies that would cause the build to fail for your users.
  4. Leverage pyproject.toml: For new projects, prefer declaring your configuration in pyproject.toml rather than in setup.py. Core metadata goes in the [project] table, the standard defined in PEP 621, which keeps it static and tool-agnostic; setuptools-specific options go under [tool.setuptools].
  5. Resource Access Path: A common mistake is using __file__ and then manipulating paths to find data files. This can break when your package is imported from a zip archive (e.g., a .egg). Always use importlib.resources, as it is designed to work regardless of how the package is stored.
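Several of the points above come down to checking what actually ended up in the built artifact. Because a wheel is a zip archive, its contents can be inspected with the standard library. The following is a minimal sketch; the helper name non_python_files is an assumption, and the archive created here is a stand-in for a real wheel from your dist/ directory.

```python
import tempfile
import zipfile
from pathlib import Path


def non_python_files(wheel_path):
    """List archive members that are neither .py files nor .dist-info metadata."""
    with zipfile.ZipFile(wheel_path) as wheel:
        names = wheel.namelist()
    return sorted(
        name
        for name in names
        if not name.endswith("/")
        and not name.endswith(".py")
        and ".dist-info/" not in name
    )


# Stand-in archive for demonstration; point this at a real wheel in practice.
fake_wheel = Path(tempfile.mkdtemp()) / "mypackage-1.0.0-py3-none-any.whl"
with zipfile.ZipFile(fake_wheel, "w") as zf:
    zf.writestr("mypackage/__init__.py", "")
    zf.writestr("mypackage/config.json", "{}")
    zf.writestr("mypackage/templates/base.html", "<html></html>")
    zf.writestr("mypackage-1.0.0.dist-info/METADATA", "Name: mypackage\n")

shipped = non_python_files(fake_wheel)
print(shipped)  # ['mypackage/config.json', 'mypackage/templates/base.html']
```

Running a check like this in CI against every built wheel catches missing data files and missing compiled modules before users do.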