The multiprocessing.shared_memory module, introduced in Python 3.8, provides a mechanism for creating and managing blocks of shared memory that multiple Python processes can access directly. This is fundamentally different from the queue- or pipe-based communication offered by other parts of the multiprocessing module. Instead of serializing data (“pickling”), sending it to another process, and deserializing it there, shared memory lets processes read and write the same region of physical memory. This avoids copying the payload between processes, which can dramatically reduce overhead for large datasets such as numerical arrays, image buffers, or large matrices used in scientific computing and machine learning.

The core principle behind this module is the allocation of a named block of shared memory that exists outside the confines of any single Python process’s heap. This memory is managed by the underlying operating system. On POSIX systems it persists until it is explicitly unlinked (destroyed); on Windows it is released once the last handle to it is closed. Any process that knows the block’s name can attach to it, map it into its own address space, and access its contents directly. Because each process runs its own interpreter with its own GIL (Global Interpreter Lock), work on the block in different processes can proceed in parallel. The GIL of one process does not serialize access by another, which also means it offers no protection against concurrent writes.

Creating and Using a SharedMemory Object

The primary class for this functionality is SharedMemory. To create a new block, instantiate this class with create=True, a size in bytes, and an optional name. If no name is provided, the module generates a unique one. To attach to an existing block, pass its name and leave create at its default of False.
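As a minimal sketch of the creation step, the function below allocates an unnamed block, writes a few bytes through its buf memoryview, and reports the generated name and the actual size. The helper name create_and_inspect and the 64-byte size are illustrative assumptions, not part of the module. The reported size may be larger than requested on some platforms.

```python
from multiprocessing import shared_memory


def create_and_inspect(size=64):
    """Create a block, write four bytes, and return (name, size, first bytes).

    Hypothetical helper for illustration only.
    """
    shm = shared_memory.SharedMemory(create=True, size=size)
    try:
        # buf is a memoryview over the raw bytes of the block
        shm.buf[:4] = b"\x01\x02\x03\x04"
        return shm.name, shm.size, bytes(shm.buf[:4])
    finally:
        # Release this process's mapping, then remove the block itself
        shm.close()
        shm.unlink()


if __name__ == "__main__":
    name, actual_size, head = create_and_inspect()
    print(name, actual_size, head)
```

The generated name is what another process would pass as name= to attach to the same block.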

from multiprocessing import shared_memory, Process
import numpy as np

def worker(shm_name, shape, dtype):
    # Attach to the existing shared memory block
    existing_shm = shared_memory.SharedMemory(name=shm_name)
    # Create a numpy array that uses the shared memory buffer
    shared_array = np.ndarray(shape, dtype=dtype, buffer=existing_shm.buf)
    # Perform an operation on the shared data
    shared_array *= 2
    # Drop the array first: it must not be used once the mapping is gone
    del shared_array
    # Important: Close access to the shared memory from this process
    existing_shm.close()

if __name__ == '__main__':
    # Create a new shared memory block large enough for a 100x100 float array
    size = 100 * 100 * np.dtype(float).itemsize
    shm = shared_memory.SharedMemory(create=True, size=size)
    # Create a numpy array backed by the shared memory
    arr = np.ndarray((100, 100), dtype=float, buffer=shm.buf)
    arr[:] = 1.0  # Initialize the array with ones

    # Start a process that will operate on the shared data
    p = Process(target=worker, args=(shm.name, arr.shape, arr.dtype))
    p.start()
    p.join()

    # Verify the change made by the worker process
    print(arr[0, 0])  # Output: 2.0

    # Cleanup: close and unlink the shared memory block
    shm.close()
    shm.unlink()

A common and critical point of confusion is the difference between the close() and unlink() methods. Understanding their roles is essential for preventing resource leaks and crashes.

  • close(): This method is called by each client process (including the creator) when it is done accessing the shared memory block. It severs the process’s own mapping to the shared memory, freeing process-specific resources. However, the underlying shared memory block continues to exist and can be accessed by other processes that are still attached to it. Every process that attaches to the block must call close().

  • unlink(): This method is typically called once by the original creator process (or a process designated for cleanup) when the shared memory block is no longer needed by any process. It requests that the operating system destroy the shared memory block itself and free the physical memory. Once unlinked, the name becomes invalid, and no new processes can attach to it, though processes that are already attached can continue to use it until they call close(). On POSIX systems, failing to call unlink() can leave the block in the system after the program exits: Python’s resource tracker may remove it at shutdown and emit a warning, but a block that escapes this persists until the next reboot. On Windows, unlink() has no effect; the block is released when the last handle to it is closed.
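The difference between the two methods can be sketched with two handles to one block. For brevity, both handles live in the same process here, which is an assumption of this sketch; in practice they would belong to different processes. The function name demo_close_vs_unlink is hypothetical.

```python
from multiprocessing import shared_memory


def demo_close_vs_unlink():
    """Return (bytes read after the creator closed, whether re-attach failed)."""
    creator = shared_memory.SharedMemory(create=True, size=16)
    name = creator.name
    creator.buf[:5] = b"hello"

    # A second, independent handle to the same block
    reader = shared_memory.SharedMemory(name=name)

    # close() removes only this handle's mapping; the block itself survives
    creator.close()
    still_readable = bytes(reader.buf[:5])

    # Release the last mapping, then ask the OS to remove the block
    reader.close()
    reader.unlink()

    # The name no longer refers to a block, so attaching should fail
    try:
        leftover = shared_memory.SharedMemory(name=name)
    except FileNotFoundError:
        attach_failed = True
    else:
        attach_failed = False
        leftover.close()
    return still_readable, attach_failed


if __name__ == "__main__":
    print(demo_close_vs_unlink())
```

On a POSIX system this is expected to report that the data stayed readable after the first close() and that attaching after unlink() failed. On Windows the block disappears because every handle was closed, not because of unlink().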

Best Practices and Common Pitfalls

  1. Synchronization is Mandatory: Shared memory provides no inherent synchronization. If one process writes to a location while another reads from or writes to it, a race condition occurs, and the result can be corrupted or inconsistent data. Use synchronization primitives such as multiprocessing.Lock to coordinate access, or design the algorithm so that each process works on its own non-overlapping segment of the block. The standard library does not expose atomic operations on a shared buffer, so an unguarded read-modify-write such as incrementing a counter is never safe.

  2. Robust Cleanup with try/finally: Always structure your code to ensure proper cleanup, even if an error occurs. The creator process should use a try/finally block to guarantee that unlink() is called.

    shm = shared_memory.SharedMemory(create=True, size=100)
    try:
        # ... use the shared memory ...
        pass
    finally:
        shm.close()
        shm.unlink()
    
  3. Data Representation: The shared memory block is a raw byte buffer. It is the programmer’s responsibility to impose structure on it, such as by using a numpy.ndarray or array.array. All processes must agree on the data type, shape, and byte order (endianness).

  4. Platform Compatibility: While the API is cross-platform, the implementation relies on OS-specific features (POSIX shm on Linux/macOS, named memory-mapped files on Windows). The automatically generated names will differ in format between platforms.
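The first three points can be combined in one small sketch: several processes increment a single counter stored in shared memory, with a multiprocessing.Lock guarding each read-modify-write, an explicit little-endian layout, and try/finally cleanup. The names COUNTER_FORMAT, add_to_counter, and run_counter_demo are hypothetical, and the choice of a single 64-bit integer is an assumption made for illustration.

```python
import struct
from multiprocessing import Lock, Process, shared_memory

# Assumed layout: one little-endian signed 64-bit integer at offset 0.
# Every process must use the same format string.
COUNTER_FORMAT = "<q"


def add_to_counter(shm_name, lock, repetitions):
    """Attach by name and increment the shared counter under the lock."""
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        for _ in range(repetitions):
            with lock:  # guard the whole read-modify-write
                (value,) = struct.unpack_from(COUNTER_FORMAT, shm.buf, 0)
                struct.pack_into(COUNTER_FORMAT, shm.buf, 0, value + 1)
    finally:
        shm.close()  # every attaching process closes its own mapping


def run_counter_demo(workers=4, repetitions=1000):
    """Create the block, run the workers, and return the final count."""
    lock = Lock()
    size = struct.calcsize(COUNTER_FORMAT)
    shm = shared_memory.SharedMemory(create=True, size=size)
    try:
        struct.pack_into(COUNTER_FORMAT, shm.buf, 0, 0)
        procs = [
            Process(target=add_to_counter, args=(shm.name, lock, repetitions))
            for _ in range(workers)
        ]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        (total,) = struct.unpack_from(COUNTER_FORMAT, shm.buf, 0)
        return total
    finally:
        shm.close()
        shm.unlink()  # only the creator removes the block


if __name__ == "__main__":
    print(run_counter_demo())
```

With the lock in place, the final count should equal workers * repetitions. Removing the with lock: line would allow increments from different processes to overwrite one another.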

Sharing Simple Lists with ShareableList

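For simpler data structures such as lists of integers or floats, the ShareableList class offers a more convenient abstraction. It packs a limited set of built-in types (int as a signed 64-bit value, float, bool, str, bytes, and None) into a shared memory block and provides a list-like interface. The list has a fixed length: items can be reassigned but not appended, inserted, or removed, and a str or bytes item cannot grow beyond the storage allocated for it when the list was created. Nested containers, including other ShareableList instances, are not supported.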

from multiprocessing.shared_memory import ShareableList
from multiprocessing import Process

def modify_list(sl):
    sl[0] += 10  # Modification happens in shared memory
    sl.shm.close()  # Release this process's mapping

if __name__ == '__main__':
    sl = ShareableList([1, 2.5, 'hello', None])
    try:
        p = Process(target=modify_list, args=(sl,))
        p.start()
        p.join()

        # repr(sl) also includes the block's generated name, so print the items
        print(list(sl))  # Output: [11, 2.5, 'hello', None]
    finally:
        # Cleanup
        sl.shm.close()
        sl.shm.unlink()
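The fixed-size nature of ShareableList can be illustrated with a short sketch. The function name shareable_list_limits is hypothetical, and the exact amount of spare storage per string item is an implementation detail, so the example assigns a string that is clearly too long for the slot.

```python
from multiprocessing.shared_memory import ShareableList


def shareable_list_limits():
    """Return (items after a valid update, whether a str grew, has append)."""
    sl = ShareableList([1, 2.5, "hello", None])
    try:
        # A shorter string fits in the storage reserved for 'hello'
        sl[2] = "hi"

        # A much longer string does not fit and is rejected
        try:
            sl[2] = "x" * 100
            grew = True
        except ValueError:
            grew = False

        # The length is fixed, so list methods such as append are absent
        has_append = hasattr(sl, "append")
        return list(sl), grew, has_append
    finally:
        sl.shm.close()
        sl.shm.unlink()


if __name__ == "__main__":
    print(shareable_list_limits())
```

If string items need room to grow, one option is to create the list with padded placeholder strings of the maximum expected length.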