47.4 Shared State: Value, Array, and Manager
When working with multiprocessing, processes do not share memory by default. Each process operates in its own memory space, which provides inherent isolation and prevents direct memory corruption between processes. However, many real-world problems require concurrent access to shared data. The multiprocessing module provides several high-level abstractions for this purpose: Value and Array for efficient ctypes-based sharing, and Manager for more flexible, albeit slower, object sharing. The choice between them represents a classic trade-off between performance and flexibility.
The Need for Synchronization
Before delving into the specific tools, it is crucial to understand why synchronizing access to shared state is non-negotiable. Without synchronization, race conditions arise: the final outcome depends on the unpredictable interleaving of operations across processes. Consider a simple counter increment: counter.value += 1. This operation is not atomic; it reads the value, adds one, and writes the result back. If two processes execute it at the same time, both may read the same initial value (e.g., 5), compute 6, and write back 6, so one update is lost. By default, Value and Array wrap their data in a lock, which makes each individual read or write process-safe, but a sequence of operations such as read-modify-write is not protected. Therefore, you must guard critical sections with a synchronization primitive such as Lock.
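The lost-update problem described above can be avoided with the lock that a Value carries by default. The following is a minimal sketch, not the only way to do it: the helper name add_many and the worker count are illustrative assumptions, while get_lock() is the standard accessor for the lock of a synchronized Value.

```python
from multiprocessing import Process, Value


def add_many(counter, n):
    """Increment the shared counter n times, holding its lock for each update."""
    for _ in range(n):
        # get_lock() returns the lock that Value creates by default (lock=True).
        # Holding it makes the read-modify-write below one critical section.
        with counter.get_lock():
            counter.value += 1


if __name__ == "__main__":
    counter = Value("i", 0)  # shared C int, starts at 0
    workers = [Process(target=add_many, args=(counter, 10_000)) for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(counter.value)  # 40000, because every increment held the lock
```

Using the built-in lock avoids passing a separate Lock object to every worker; a separate Lock is still the better fit when one critical section spans several shared objects.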
Using Value and Array
The Value and Array classes are the most efficient way to share data because they allocate ctypes objects directly from shared memory. This memory is mapped into the address space of all processes that need access to it, allowing for very fast reads and writes.
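A Value is a wrapper for a single shared scalar value, while an Array is a wrapper for a shared one-dimensional array. When creating them, you must specify the data type, either as a ctypes type or as a one-character typecode of the kind used by the array module. A common pitfall is using the wrong type specifier, which can lead to silent data corruption (ctypes integer types do no overflow checking, so an out-of-range value wraps around) or to runtime errors.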
from multiprocessing import Process, Value, Array, Lock
import ctypes

def increment_counter(lock, counter):
    for _ in range(100000):
        with lock:
            counter.value += 1

def square_elements(lock, arr):
    with lock:  # Lock ensures entire operation is atomic
        for i in range(len(arr)):
            arr[i] = arr[i] * arr[i]

if __name__ == '__main__':
    counter = Value(ctypes.c_int, 0)  # 'i' is also a valid typecode
    shared_array = Array(ctypes.c_double, [1.0, 2.0, 3.0, 4.0])  # 'd' for double
    lock = Lock()

    procs = [
        Process(target=increment_counter, args=(lock, counter)),
        Process(target=increment_counter, args=(lock, counter))
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(f"Final counter value: {counter.value}")  # Should be 200000

    # Demonstrate array usage
    square_proc = Process(target=square_elements, args=(lock, shared_array))
    square_proc.start()
    square_proc.join()
    print(f"Squared array: {list(shared_array)}")  # Output: [1.0, 4.0, 9.0, 16.0]
In this example, the Lock is essential to ensure that the entire increment operation is performed without interruption. Without it, the final counter value would likely be less than 200,000 due to race conditions.
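As a smaller sketch of the constructors themselves, the snippet below shows the two ways of naming a type and the two ways of sizing an Array. The variable names are illustrative assumptions; no worker processes are started, so it only demonstrates creation and reading.

```python
import ctypes
from multiprocessing import Array, Value

# Typecodes follow the array module: "i" is a signed C int, "d" a C double.
hits = Value("i", 0)
ratio = Value(ctypes.c_double, 0.5)  # a ctypes type works as well

# An integer size gives a zero-filled array; a sequence sets the initial contents.
zeros = Array("i", 4)
samples = Array("d", [1.5, 2.5, 3.5])

print(hits.value, ratio.value)
print(zeros[:], samples[:])  # slicing copies the contents into ordinary lists
print(len(samples))
```

Slicing is a convenient way to take a snapshot of a shared Array for printing or comparison, since the result is an ordinary list that no other process can modify.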
Using a Manager for Complex Objects
While Value and Array are fast, they are limited to simple ctypes data structures. The Manager provides a way to create shared Python objects like lists, dictionaries, queues, and even custom classes. A Manager controls a server process that holds the actual objects. All other processes access these objects via proxies, which communicate with the server process using RPC (Remote Procedure Calls). This indirection provides tremendous flexibility but comes with a significant performance cost compared to shared memory.
from multiprocessing import Process, Manager

def worker_function(shared_list, index):
    """Appends a value to a shared list."""
    shared_list.append(index * 10)
    # The append operation is atomic for a Manager list, so a lock isn't strictly needed here.
    # However, for multi-step operations (e.g., `if value in list: list.append(...)`), a lock is required.

if __name__ == '__main__':
    with Manager() as manager:
        shared_list = manager.list()  # Create a shared list
        processes = []
        for i in range(5):
            p = Process(target=worker_function, args=(shared_list, i))
            processes.append(p)
            p.start()
        for p in processes:
            p.join()
        print(f"Final shared list: {list(shared_list)}")
        # Output order is non-deterministic, e.g., [0, 10, 20, 30, 40]
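The comment in the example above notes that multi-step operations on a Manager list need a lock. One way this might look is sketched below; the function name add_unique and the choice of eight workers are assumptions made for illustration, while manager.Lock() is the standard way to obtain a lock proxy from a Manager.

```python
from multiprocessing import Manager, Process


def add_unique(shared_list, lock, item):
    """Append item only if it is absent; the lock covers the check and the append."""
    with lock:
        # Without the lock, two processes could both pass the membership test
        # before either one appends, leaving a duplicate in the list.
        if item not in shared_list:
            shared_list.append(item)


if __name__ == "__main__":
    with Manager() as manager:
        seen = manager.list()
        lock = manager.Lock()  # a lock proxy that can be passed to workers
        # Eight workers try to add only four distinct values.
        workers = [
            Process(target=add_unique, args=(seen, lock, i % 4)) for i in range(8)
        ]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        print(sorted(seen))  # [0, 1, 2, 3]
```

Each membership test and each append is a separate round trip to the server process, which is why the pair has to be wrapped in one critical section.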
Best Practices and Pitfalls
- Always Synchronize: Assume nothing is process-safe except single read or write operations on Value/Array. Use Lock for any operation involving a read-modify-write cycle or any multi-step logic on a Manager object.
- Mind the Performance Cost: Use Value and Array for high-frequency updates to numerical data. Resort to Manager only when you need to share complex structures like nested lists or dictionaries.
- Manager's Flexibility Trap: It's tempting to use a Manager for everything due to its simplicity. However, the serialization and IPC overhead can easily become a bottleneck. Profile your code if performance is critical.
- Namespace and Custom Objects: The Manager can also create a shared Namespace or allow you to register custom classes, enabling the sharing of more sophisticated state. This involves even more overhead and complexity.
- Memory Leaks: For Manager objects, ensure the server process is properly shut down. Using the with statement (as shown above) is the best practice to guarantee this.
- Type Safety: Be meticulous with ctypes type codes for Value and Array. Assigning a float to an element of Array(ctypes.c_int, ...) raises a TypeError rather than converting the value; convert explicitly, for example with int(), if truncation is what you want.