Shared-Memory | mikePietsch.com

47.8 Combining Multiprocessing with asyncio

Combining the process-based parallelism of the multiprocessing module with the cooperative concurrency of asyncio is a powerful technique for building scalable applications in Python. This hybrid approach lets CPU-intensive tasks run in worker processes, outside the reach of the parent’s Global Interpreter Lock (GIL), while a single event loop manages thousands of I/O-bound operations. The core challenge lies in orchestrating communication between the synchronous, process-isolated world of multiprocessing and the asynchronous, single-threaded event loop of asyncio.

The Event Loop and Process Isolation

The fundamental reason these two worlds don’t integrate seamlessly is process isolation. An asyncio event loop lives in a single thread of a single process. A process launched with multiprocessing.Process has its own memory space and its own Python interpreter: under 'fork' it begins as a copy of the parent’s memory, while under 'spawn' it starts from a fresh interpreter. It does not receive a usable event loop automatically; if the child needs one, it must create its own. The child knows nothing about the parent’s event loop, and vice versa, so you cannot directly await a function running in another process. The solution is to use the multiprocessing module’s communication primitives (such as Queue, Pipe, or shared memory) to send messages and synchronize between the asynchronous parent and its synchronous worker processes, treating the workers as independent entities.
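One common way to bridge the two worlds is to hand CPU-bound calls to a process pool through loop.run_in_executor, which wraps each cross-process result in an awaitable. The sketch below takes that route, using concurrent.futures.ProcessPoolExecutor rather than raw Queue or Pipe objects, and it is not the only workable approach. The helper names count_primes and offload are hypothetical and exist only for illustration.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    """CPU-bound work: count the primes below `limit` by trial division."""
    return sum(
        1
        for n in range(2, limit)
        if all(n % d for d in range(2, int(n ** 0.5) + 1))
    )


async def offload(executor, limits):
    """Run count_primes for every limit in `executor` without blocking the loop."""
    loop = asyncio.get_running_loop()
    # run_in_executor returns an awaitable future for each submitted call.
    jobs = [loop.run_in_executor(executor, count_primes, limit) for limit in limits]
    return await asyncio.gather(*jobs)


async def main():
    # Worker processes do the arithmetic; the event loop stays free for I/O.
    with ProcessPoolExecutor(max_workers=2) as executor:
        print(await offload(executor, [10, 100, 1000]))


if __name__ == "__main__":
    asyncio.run(main())
```

Because offload relies only on the executor interface, the same coroutine also runs against a ThreadPoolExecutor, which is convenient for quick checks that should not start extra processes.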

47.7 Start Methods: spawn, fork, forkserver

The choice of start method dictates how a new process is created in Python’s multiprocessing module. This is not a trivial implementation detail; it fundamentally alters how the new process initializes, what resources it inherits, and, consequently, what pitfalls you might encounter. All three methods ('fork', 'spawn', and 'forkserver') are available on Unix-like systems (Linux, macOS), while Windows supports only 'spawn'. The available methods can be listed with multiprocessing.get_all_start_methods(), and the one currently in effect can be read with multiprocessing.get_start_method(). To select a different method, call multiprocessing.set_start_method() once, inside the if __name__ == '__main__': block at the beginning of your program.
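The inspection calls mentioned above can be sketched as follows. The example also uses multiprocessing.get_context(), which returns an object with the same API bound to one start method and leaves the global default untouched. The function names start_method_report and double are hypothetical.

```python
import multiprocessing as mp


def start_method_report():
    """Collect the start methods this platform offers and the one in effect."""
    return {
        "available": mp.get_all_start_methods(),
        # Note: reading the default also fixes it for the rest of the program.
        "default": mp.get_start_method(),
    }


def double(value, results):
    """Trivial child task: put twice the input on the results queue."""
    results.put(value * 2)


def main():
    print(start_method_report())

    # A context pins the start method for the objects created from it.
    ctx = mp.get_context("spawn")
    results = ctx.Queue()
    child = ctx.Process(target=double, args=(21, results))
    child.start()
    print(results.get())
    child.join()


if __name__ == "__main__":
    main()
```

Using a context instead of set_start_method() is often the safer choice in library code, since it does not change behaviour for the rest of the application.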

47.6 Pickleability Requirements for Multiprocessing

At the heart of Python’s multiprocessing module lies a fundamental constraint: data must be passed between the main process and its worker children. Since these processes do not share a global interpreter lock (GIL) or a common memory space (by default), this inter-process communication (IPC) is achieved through serialization and deserialization. The primary serialization protocol used in Python is called “pickling,” implemented by the pickle module. Therefore, any object you pass as an argument to a Process or a Pool method like apply, map, or their asynchronous counterparts, or any object you return from the target function, must be pickleable.
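A quick way to see which objects meet this requirement is to try pickling them directly. The sketch below assumes a small hypothetical helper, is_picklable, that simply attempts pickle.dumps and reports whether it succeeded.

```python
import math
import pickle
import threading


def is_picklable(obj):
    """Return True if pickle.dumps accepts `obj`, False if it refuses."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


if __name__ == "__main__":
    candidates = {
        "dict of lists": {"a": [1, 2, 3]},   # plain data pickles by value
        "built-in function": math.sqrt,      # functions pickle by importable name
        "lambda": lambda x: x * x,           # no importable name to look up
        "thread lock": threading.Lock(),     # wraps an OS resource
    }
    for label, obj in candidates.items():
        print(f"{label}: {is_picklable(obj)}")
```

Functions are pickled by reference to their module and name, which is why a lambda fails while a named module-level function does not.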

47.5 multiprocessing.shared_memory: Zero-Copy Shared Buffers

The multiprocessing.shared_memory module, introduced in Python 3.8, provides a high-level mechanism for creating and managing blocks of shared memory that can be accessed directly by multiple Python processes. This is fundamentally different from the queue- or pipe-based communication offered by other parts of the multiprocessing module. Instead of serializing data (“pickling” it), sending the bytes, and deserializing them on the other side, shared memory allows processes to read and write the same region of physical memory. The payload itself is never copied between processes, which can dramatically reduce overhead for large datasets such as numerical arrays, image buffers, or large matrices used in scientific computing and machine learning. Access to the block is not synchronized automatically, so concurrent writers still need a lock or a similar protocol.
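As a minimal sketch of the idea, the example below creates a block, lets a child process attach to it by name and write into it, and then reads the bytes back in the parent. It uses only the standard library; the helper name write_bytes is hypothetical.

```python
from multiprocessing import Process, shared_memory


def write_bytes(block_name, payload):
    """Attach to an existing block by name and write `payload` at offset 0."""
    shm = shared_memory.SharedMemory(name=block_name)
    try:
        shm.buf[:len(payload)] = payload
    finally:
        shm.close()  # detach this handle; the block itself stays alive


def main():
    payload = b"hello from the child"
    owner = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        # Only the block's name crosses the process boundary, not the data.
        child = Process(target=write_bytes, args=(owner.name, payload))
        child.start()
        child.join()
        print(bytes(owner.buf[:len(payload)]))
    finally:
        owner.close()
        owner.unlink()  # the creator releases the block exactly once


if __name__ == "__main__":
    main()
```

Every process calls close() on its own handle, while unlink() should be called once, typically by the process that created the block.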

47.4 Shared State: Value, Array, and Manager

When working with multiprocessing, processes do not share memory by default. Each process operates in its own memory space, which provides inherent isolation and prevents direct memory corruption between processes. However, many real-world problems require concurrent access to shared data. The multiprocessing module provides several high-level abstractions for this purpose: Value and Array for efficient ctypes-based sharing, and Manager for more flexible, albeit slower, object sharing. The choice between them represents a classic trade-off between performance and flexibility.
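The trade-off can be illustrated with a short sketch: Value and Array hold fixed C types and are updated under their built-in locks, while a Manager dictionary accepts arbitrary picklable values at the cost of a round trip to a server process. The function names add_hits and record are hypothetical.

```python
from multiprocessing import Array, Manager, Process, Value


def add_hits(counter, histogram, bucket, times):
    """Increment a shared counter and one histogram bucket `times` times."""
    for _ in range(times):
        # += is a read-modify-write, so it must happen under the lock.
        with counter.get_lock():
            counter.value += 1
        with histogram.get_lock():
            histogram[bucket] += 1


def record(shared_dict, key, value):
    """Store one entry in a dict-like object (for example a Manager dict)."""
    shared_dict[key] = value


def main():
    counter = Value("i", 0)      # one C int in shared memory
    histogram = Array("i", 3)    # three C ints, zero-initialised
    workers = [
        Process(target=add_hits, args=(counter, histogram, bucket, 1000))
        for bucket in range(3)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    print(counter.value, histogram[:])

    with Manager() as manager:
        status = manager.dict()  # proxy object; every access is an IPC call
        child = Process(target=record, args=(status, "state", "done"))
        child.start()
        child.join()
        print(status.copy())


if __name__ == "__main__":
    main()
```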

47.3 Pool: Map, Starmap, and Apply Async

The multiprocessing.Pool class provides a high-level interface for distributing work across multiple processes, abstracting away much of the complexity of manual process management. It creates a pool of worker processes, typically one per CPU core, which remain idle, ready to consume tasks from a job queue. This model is exceptionally efficient for embarrassingly parallel problems—where tasks are independent and require no communication between processes—as it avoids the overhead of repeatedly creating and destroying processes.
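The three calls named in this section's title can be sketched side by side. map passes one argument per task, starmap unpacks a tuple of arguments per task, and apply_async submits a single call and returns a handle whose get() blocks until the result is ready. The functions square, power, and run_batch are hypothetical.

```python
from multiprocessing import Pool


def square(x):
    return x * x


def power(base, exponent):
    return base ** exponent


def run_batch(pool):
    """Exercise map, starmap, and apply_async on any object with the Pool API."""
    squares = pool.map(square, range(5))                      # one argument per task
    powers = pool.starmap(power, [(2, 3), (3, 2), (10, 0)])   # tuple unpacked per task
    pending = pool.apply_async(power, (2, 10))                # returns an AsyncResult
    return squares, powers, pending.get(timeout=10)


if __name__ == "__main__":
    # Worker functions must be importable, so they live at module level.
    with Pool(processes=2) as pool:
        print(run_batch(pool))
```

Since run_batch only depends on the pool interface, it also works with multiprocessing.pool.ThreadPool, which offers the same methods on top of threads.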

47.2 The multiprocessing Module: Process, Queue, Pipe

The multiprocessing module is the cornerstone of process-based parallelism in Python. It sidesteps the Global Interpreter Lock (GIL) by spawning new operating system processes, each with its own private memory space and a dedicated Python interpreter. This allows for true parallel execution on multi-core systems. The Process class provides the fundamental building block for creating and managing these processes, while Queue and Pipe are the primary mechanisms for safe communication and data exchange between them.
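A minimal sketch of the three building blocks follows: a worker process that consumes tasks from one Queue and publishes results on another, and a second process that answers a single message over a Pipe. The names SENTINEL, square_worker, and echo_upper are hypothetical, and using None as the end-of-work marker is an assumption made for the example.

```python
import multiprocessing as mp

SENTINEL = None  # hypothetical "no more work" marker


def square_worker(tasks, results):
    """Read numbers from `tasks` until SENTINEL arrives; put squares on `results`."""
    while True:
        item = tasks.get()
        if item is SENTINEL:
            break
        results.put(item * item)


def echo_upper(conn):
    """Receive one string over a connection and reply with it upper-cased."""
    conn.send(conn.recv().upper())
    conn.close()


def main():
    tasks, results = mp.Queue(), mp.Queue()
    worker = mp.Process(target=square_worker, args=(tasks, results))
    worker.start()
    for number in (1, 2, 3):
        tasks.put(number)
    tasks.put(SENTINEL)
    # Drain the results before joining so the child is never blocked on a full pipe.
    print([results.get() for _ in range(3)])
    worker.join()

    parent_conn, child_conn = mp.Pipe()  # two connected endpoints
    responder = mp.Process(target=echo_upper, args=(child_conn,))
    responder.start()
    parent_conn.send("ping")
    print(parent_conn.recv())
    responder.join()


if __name__ == "__main__":
    main()
```

A Queue suits many producers and consumers, while a Pipe connects exactly two endpoints and has less overhead.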

47.1 Process vs Thread: When to Use Each

The fundamental distinction between a process and a thread lies in their relationship to system resources. A process is an independent instance of a running program, complete with its own private memory space, file handles, and system resources. It is an isolated unit of execution, managed by the operating system’s scheduler. In contrast, a thread is a lightweight unit of execution that exists within a process. All threads within a single process share the same memory space and resources, and they run as concurrent paths of execution that the operating system schedules preemptively. This core architectural difference dictates their optimal use cases, performance characteristics, and the complexity involved in their implementation.
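The difference in memory sharing can be observed directly. In the sketch below, the same function appends to a list from a thread and from a child process: the thread's change is visible to the caller, while the child process modifies only its own copy. The function names are hypothetical.

```python
import threading
from multiprocessing import Process


def append_marker(bucket, marker):
    """Mutate the list that was passed in."""
    bucket.append(marker)


def mutate_in_thread(bucket):
    """Run append_marker in a thread of this process and return the list."""
    thread = threading.Thread(target=append_marker, args=(bucket, "thread"))
    thread.start()
    thread.join()
    return bucket


def mutate_in_process(bucket):
    """Run append_marker in a child process and return the parent's list."""
    child = Process(target=append_marker, args=(bucket, "process"))
    child.start()
    child.join()
    return bucket


if __name__ == "__main__":
    print(mutate_in_thread([]))   # ['thread']: same memory, change is visible
    print(mutate_in_process([]))  # []: the child appended to its own copy
```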