47.6 Pickleability Requirements for Multiprocessing

At the heart of Python’s multiprocessing module lies a fundamental constraint: data must be passed between the main process and its worker children. Because these processes share neither a global interpreter lock (GIL) nor, by default, a memory space, this inter-process communication (IPC) relies on serialization and deserialization. Python’s standard serialization mechanism is “pickling,” implemented by the pickle module. Consequently, the arguments you pass to a Pool method such as apply, map, or their asynchronous counterparts, the target function itself, and any value that function returns must all be pickleable. The same holds for the arguments of a Process whenever the child is started with the spawn or forkserver start method; under fork the child inherits them directly from the parent.

An object is considered pickleable if the pickle module can convert it into a byte stream (serialization) and then reconstruct the original object from that byte stream (deserialization) in a different process. This requirement is not arbitrary; it is a direct consequence of the process-based parallelism model, which prioritizes isolation and stability over shared state.
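The round trip described above can be tried in a single process with pickle.dumps() and pickle.loads(). The following is a minimal sketch; the payload dictionary is a made-up example, not something multiprocessing requires.

```python
import pickle

# Hypothetical payload, standing in for an argument sent to a worker.
payload = {"id": 7, "tags": ["a", "b"], "ratio": 0.5}

# Serialization: object -> bytes.
blob = pickle.dumps(payload)

# Deserialization: bytes -> a new, independent object.
restored = pickle.loads(blob)

print(type(blob).__name__)   # bytes
print(restored == payload)   # True: same contents
print(restored is payload)   # False: a separate copy

# Mutating the copy leaves the original untouched, just as a worker's
# changes to an argument are not visible in the parent process.
restored["tags"].append("c")
print(payload["tags"])       # ['a', 'b']
```

Because the worker always operates on a copy, any result it produces has to be returned explicitly (and pickled again) to reach the parent.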

What Makes an Object Pickleable?

The pickle protocol can serialize a wide range of Python objects by default. This includes:

All basic data types: integers, floats, strings, booleans, and None.
Collections of pickleable objects: lists, tuples, dictionaries, and sets.
Functions and classes defined at the top level of a module, so that they can be found again by name (definitions nested inside a function do not qualify).
Instances of classes, provided the class itself is pickleable and its __dict__ attribute is pickleable, or it defines custom __getstate__() and __setstate__() methods.
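The requirement for top-level definitions follows from how pickle handles functions and classes: it stores a reference (the module name plus the qualified name) rather than the code itself. A small sketch, using json.dumps purely as a convenient stand-in for any importable top-level function:

```python
import json
import pickle

blob = pickle.dumps(json.dumps)

# The byte stream holds only the names needed to find the function again.
print(b"json" in blob, b"dumps" in blob)  # True True

# Unpickling looks the name up and returns the very same function object.
print(pickle.loads(blob) is json.dumps)   # True
```

The receiving process must therefore be able to import the same module and find the same name there.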

The following code example demonstrates passing various inherently pickleable objects to a worker process without issue.

import multiprocessing

def worker_func(num, text, data_list, data_dict):
    """A function that operates on various pickleable data types."""
    result = f"Processed: {num}, {text}, {data_list[0]}, {data_dict['key']}"
    return result

if __name__ == '__main__':
    with multiprocessing.Pool(processes=1) as pool:
        # All arguments here are inherently pickleable
        result = pool.apply(worker_func, (
            42,
            'hello',
            [1, 2, 3],
            {'key': 'value'}
        ))
        print(result)  # Output: Processed: 42, hello, 1, value

Common Non-Pickleable Objects and Pitfalls

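Most pickling failures stem from objects that hold references to resources or entities that cannot be serialized. Depending on the object and the Python version, the failure surfaces as a pickle.PicklingError, an AttributeError, or a TypeError. The most common culprits are: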

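Lambda Functions and Nested Functions: pickle serializes a function by reference, storing only its module and qualified name. A lambda, or a function defined inside another function, cannot be looked up by that name in the receiving process, so pickling it fails.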
Threads, Sockets, and File Handles: These objects represent live operating system resources or complex internal state that cannot be meaningfully serialized into a byte stream. A file handle from Process A is meaningless in Process B.
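Classes and Functions Defined Inside the if __name__ == '__main__': Block: This is a very common pitfall when running a script directly. Under the spawn start method (the default on Windows and macOS), each worker is a fresh interpreter that re-imports the main module with the guard evaluating to false, so anything defined inside the guarded block does not exist in the worker and cannot be unpickled there. Under fork, the child inherits the parent’s memory and the same code may appear to work, which makes the bug easy to miss. Interactive environments such as Jupyter Notebook pose a related problem: definitions live in a __main__ that a spawned worker cannot re-import.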

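import multiprocessing

# This FAILS under the spawn start method (the default on Windows and macOS):
# workers re-import this file with the guard false, so neither MyCustomClass
# nor worker exists there. Under fork it happens to work.
if __name__ == '__main__':
    class MyCustomClass:
        def __init__(self, value):
            self.value = value

    def worker(obj):
        return obj.value

    obj = MyCustomClass(10)
    with multiprocessing.Pool() as pool:
        # Under spawn, the workers raise AttributeError while unpickling the
        # task, and this call typically hangs instead of returning.
        result = pool.apply(worker, (obj,))
        print(result)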
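The lambda, nested-function, and live-resource cases from the list above can be checked without starting any worker processes. The sketch below uses a hypothetical helper, try_pickle, which simply reports whether pickle.dumps() succeeds. The exact exception type varies between Python versions, so the helper catches all three that commonly appear.

```python
import pickle
import threading


def try_pickle(obj):
    """Return None on success, or the name of the exception raised."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError) as exc:
        return type(exc).__name__
    return None


def make_nested():
    # The inner function only exists inside this call, so it has no
    # importable top-level name.
    def nested(x):
        return x + 1
    return nested


candidates = {
    "list": [1, 2, 3],
    "lambda": lambda x: x + 1,
    "nested function": make_nested(),
    "lock": threading.Lock(),
}

for label, obj in candidates.items():
    outcome = try_pickle(obj)
    print(f"{label}: {'ok' if outcome is None else 'fails with ' + outcome}")
```

Only the list is expected to report ok; the other three entries fail in the parent process, before any worker is involved.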

Strategies for Handling Non-Pickleable Objects

You cannot magically make a socket pickleable, but you can design your code to work around this limitation.

Restructure Code: Move class and function definitions to the top level of a module so they can be imported by worker processes. This is the simplest and most robust solution.
Use multiprocessing’s Specific Tools: For resources that can be inherited, like file descriptors, you can sometimes rely on Unix fork behavior, but this is not portable or recommended. For more complex shared state, use the module’s built-in tools like multiprocessing.Manager to create proxy objects or multiprocessing.Array/multiprocessing.Value for shared memory.
Use __getstate__ and __setstate__: For custom classes, you can define how they are pickled. If your class has an unpickleable member (e.g., a lock or an open file handle), you can exclude it from serialization and reinitialize it upon deserialization.

import multiprocessing

# Define the class at the TOP LEVEL so it is importable
class MyCustomClass:
    def __init__(self, value):
        self.value = value
        # A non-pickleable attribute (simulated)
        self.thread = None

    def __getstate__(self):
        # This method is called when the object is pickled.
        # Create a copy of the object's __dict__, excluding the unpickleable thread.
        state = self.__dict__.copy()
        del state['thread']
        return state

    def __setstate__(self, state):
        # This method is called when the object is unpickled.
        # Restore the state and reinitialize the unpickleable attribute.
        self.__dict__.update(state)
        self.thread = None # Reinitialize the thread attribute

def worker(obj):
    return obj.value

if __name__ == '__main__':
    obj = MyCustomClass(10)
    with multiprocessing.Pool() as pool:
        result = pool.apply(worker, (obj,))
        print(result)  # Output: 10
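The example above simulates the unpickleable attribute with None. The same pattern can be checked with a real threading.Lock and plain pickle, without starting any processes. This is only a sketch; the GuardedCounter class is a hypothetical illustration.

```python
import pickle
import threading


class GuardedCounter:
    """Hypothetical class that owns a lock, which pickle cannot serialize."""

    def __init__(self, value):
        self.value = value
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self.value += 1

    def __getstate__(self):
        # Copy the instance dict and leave the lock out of the pickle.
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        # Restore the saved attributes, then create a fresh lock.
        self.__dict__.update(state)
        self._lock = threading.Lock()


counter = GuardedCounter(10)
clone = pickle.loads(pickle.dumps(counter))
clone.increment()

print(counter.value)  # 10
print(clone.value)    # 11
```

Note that the restored object gets its own new lock. It does not synchronize with the original, so this technique suits per-process resources rather than state that must be shared between processes.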

Debugging Pickling Errors

When pickling fails, the error message can be terse: it names the offending object (e.g., <class '__main__.MyCustomClass'>) but says little about which argument or attribute pulled it in. The most direct way to diagnose the problem is to isolate each argument and call pickle.dumps() on it in the main process, which reproduces the failure before multiple processes are involved. If the failure occurs inside a worker instead, calling multiprocessing.log_to_stderr() with a DEBUG level shows what the workers are doing. Note that pickletools can only disassemble a byte stream that was produced successfully, so it helps with inspecting what a pickle contains, not with explaining why an object refused to pickle. Often, the problem is a reference to an object defined in an inaccessible scope. Understanding pickleability is not just a technicality; it is essential for designing effective and error-free parallel applications in Python.
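The isolation step can be automated with a few lines. The sketch below assumes a hypothetical helper, find_unpicklable, that tries each argument separately and reports the ones that fail; the task_args tuple is likewise made up for illustration.

```python
import pickle
import threading


def find_unpicklable(args):
    """Return (index, exception) pairs for the arguments that fail to pickle."""
    failures = []
    for index, arg in enumerate(args):
        try:
            pickle.dumps(arg)
        except (pickle.PicklingError, AttributeError, TypeError) as exc:
            failures.append((index, exc))
    return failures


# Hypothetical argument tuple intended for pool.apply(worker, task_args).
task_args = (42, "hello", threading.Lock(), [1, 2, 3])

for index, exc in find_unpicklable(task_args):
    print(f"argument {index} cannot be pickled: {type(exc).__name__}: {exc}")
```

Here only the lock at index 2 is reported. For a failing custom object, the same loop can be run over the values of its __dict__ to narrow the search to a single attribute.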