53.7 Protocol Buffers with protobuf

Protocol Buffers (protobuf) is a language-neutral, platform-neutral, extensible mechanism for serializing structured data, developed by Google. Its binary encoding is typically much smaller and faster to parse than equivalent XML or JSON. It also provides a robust system for defining data schemas in .proto files, which serve as the single source of truth for the structure of your serialized data. This schema-driven approach enforces contracts between applications, ensuring data consistency and enabling backward and forward compatibility through explicitly defined rules.
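Real protobuf code starts from a .proto schema compiled by protoc into generated classes, which does not fit a self-contained snippet. As a minimal sketch of where the size savings come from, the hand-rolled helpers below (hypothetical names, not part of the protobuf library) implement two rules of the protobuf wire format. First, each field begins with a key equal to `(field_number << 3) | wire_type`. Second, integers are written as base-128 varints: seven bits per byte, with the high bit marking that more bytes follow.

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, high bit set on all but the last."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)         # final byte has the high bit clear
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (value, next_position) for the varint starting at data[pos]."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Hypothetical helper: encode one integer field as key varint + value varint."""
    WIRE_TYPE_VARINT = 0
    key = (field_number << 3) | WIRE_TYPE_VARINT
    return encode_varint(key) + encode_varint(value)


encoded = encode_varint_field(1, 150)
print(encoded.hex())                  # 089601
print(decode_varint(encoded, pos=1))  # (150, 3)
```

A field numbered 1 holding 150 takes three bytes, and the field's name never appears on the wire. Only its number does, which is why the schema, not the payload, is the source of truth.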

53.6 msgpack, cbor2, and Other Compact Binary Formats

While Python’s pickle module is convenient for Python-specific serialization, its lack of interoperability with other languages and its inherent security risks make it unsuitable for many applications. This is where compact, language-agnostic binary formats like MessagePack (msgpack) and Concise Binary Object Representation (CBOR) come in. Both encode a JSON-like data model (maps, arrays, strings, numbers, booleans, and null, plus raw binary data) without requiring a schema, and they offer a compelling blend of performance, small payload size, and cross-platform compatibility. That makes them well suited to network communication, data storage, and inter-process communication where Python is not the sole participant.
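In practice you would call `msgpack.packb()` or `cbor2.dumps()` from the third-party packages. To show where the compactness comes from without requiring those installs, here is a minimal sketch of a tiny MessagePack subset: nil, booleans, small non-negative integers, short strings, and small maps. The helper name `pack_subset` is hypothetical. The point is that small values carry their type and length in a single prefix byte instead of quotes, colons, and commas.

```python
import json


def pack_subset(obj) -> bytes:
    """Hypothetical helper: encode a tiny subset of MessagePack (not the msgpack API)."""
    if obj is None:
        return b"\xc0"                        # nil
    if obj is True:                           # check bools before ints:
        return b"\xc3"                        # bool is a subclass of int
    if obj is False:
        return b"\xc2"
    if isinstance(obj, int) and 0 <= obj <= 0x7F:
        return bytes([obj])                   # positive fixint: value is the byte
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        if len(raw) <= 31:
            return bytes([0xA0 | len(raw)]) + raw   # fixstr: 101xxxxx + bytes
    if isinstance(obj, dict) and len(obj) <= 15:
        out = bytearray([0x80 | len(obj)])          # fixmap: 1000xxxx
        for key, value in obj.items():
            out += pack_subset(key)
            out += pack_subset(value)
        return bytes(out)
    raise TypeError(f"outside this sketch's subset: {obj!r}")


doc = {"compact": True, "schema": 0}
packed = pack_subset(doc)
as_json = json.dumps(doc, separators=(",", ":")).encode("utf-8")
print(len(packed), len(as_json))  # 18 27
```

CBOR uses a similar idea (a major type and a short length packed into an initial byte), with different byte values.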

53.5 struct: Packing and Unpacking Binary Data

The struct module is Python’s primary tool for converting between Python values and C-style data structures represented as Python bytes objects. This is essential for interacting with binary files, network protocols, device drivers, or any system that uses a tightly packed binary data layout. Unlike pickle, which is Python-specific, struct reads and writes plain byte layouts described entirely by a format string (byte order, field types and sizes, and alignment), so programs written in other languages can produce and consume the same data.

Its core functions are pack() and unpack(). pack(fmt, v1, v2, ...) takes a format string and a series of values and returns a bytes object containing the packed values. unpack(fmt, buffer) takes a format string and a buffer (e.g., a bytes object) whose size must equal struct.calcsize(fmt), and returns a tuple of the unpacked values, even when the format describes only one value.
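A minimal sketch of a round trip through a fixed-size binary header. The layout (a 4-byte magic string, a 16-bit version, and a 32-bit payload length) and the helper names are hypothetical, chosen only to exercise the format characters `4s`, `H`, and `I` and the byte-order prefixes.

```python
import struct

# Hypothetical header layout: 4-byte magic, uint16 version, uint32 payload length.
# "<" selects little-endian byte order with standard sizes and no padding.
HEADER_FMT = "<4sHI"


def pack_header(magic: bytes, version: int, length: int) -> bytes:
    """Pack the three header fields into exactly calcsize(HEADER_FMT) bytes."""
    return struct.pack(HEADER_FMT, magic, version, length)


def unpack_header(buffer: bytes) -> tuple:
    """Unpack a header; raises struct.error if len(buffer) != 10."""
    return struct.unpack(HEADER_FMT, buffer)


header = pack_header(b"DEMO", 1, 1024)
print(struct.calcsize(HEADER_FMT))  # 10
print(header)                       # b'DEMO\x01\x00\x00\x04\x00\x00'
print(unpack_header(header))        # (b'DEMO', 1, 1024)

# ">" (or "!") selects big-endian "network byte order", which many protocols require.
print(struct.pack(">I", 1024))      # b'\x00\x00\x04\x00'
```

Always state the byte order explicitly for data that leaves the process. Without a prefix, struct uses native byte order, sizes, and alignment, so the same format string can produce different bytes on different platforms.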

53.4 shelve: Persistent Dictionary Backed by pickle

The shelve module provides a persistent, dictionary-like object that stores its data in a database file, typically using the dbm module for the underlying storage and the pickle module to serialize the values. This combination makes it a convenient tool for scenarios requiring a simple, persistent key-value store where the keys are strings and the values can be any object that pickle can handle. It bridges the gap between in-memory dictionaries and full-fledged databases by offering a familiar dictionary API for persistence.
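A minimal sketch of that dictionary API across separate "sessions". It uses a temporary directory so it cleans up after itself. The file name `settings` and the helper `shelf_demo` are illustrative only, and the dbm backend may add its own extensions to the file name. The sketch also shows a common pitfall. With the default `writeback=False`, each read unpickles a fresh copy, so mutating a retrieved value changes nothing on disk until you assign it back.

```python
import os
import shelve
import tempfile


def shelf_demo():
    """Hypothetical helper: write, reopen, and update a shelf in a temp directory."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "settings")  # dbm may add extensions to this name

        # Session 1: values are pickled when assigned.
        with shelve.open(path) as db:
            db["user"] = {"name": "ada", "langs": ["python"]}
            db["retries"] = 3

        # Session 2: mutate a retrieved value, then persist it by reassignment.
        with shelve.open(path) as db:
            user = db["user"]
            user["langs"].append("c")           # only changes the in-memory copy
            unsaved = db["user"]["langs"]       # fresh unpickle: still ["python"]
            db["user"] = user                   # reassign to write it back

        # Session 3: the reassigned value survived.
        with shelve.open(path) as db:
            return unsaved, db["user"]["langs"], db["retries"]


print(shelf_demo())  # (['python'], ['python', 'c'], 3)
```

Passing `writeback=True` to `shelve.open()` caches accessed entries and writes them back on `sync()` or `close()`. The trade-off is more memory use and a slower close.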

53.3 Customizing Pickling: __getstate__ and __setstate__

Python’s pickle module serializes most class instances automatically by saving their __dict__ attribute, but this default is not always appropriate. Some objects hold resources that should not or cannot be persisted, such as open file handles or database connections. Others reference non-picklable objects, or hold state that needs custom logic to save or rebuild efficiently. For these cases, a class can define __getstate__() and __setstate__(). Pickle saves whatever __getstate__() returns instead of __dict__, and on unpickling it passes that value to __setstate__(), which restores the object. Together they give precise control over what is saved and how it is restored.
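A minimal sketch, assuming a hypothetical `Counter` class that guards its data with a `threading.Lock`. Lock objects cannot be pickled, so without these methods `pickle.dumps()` would raise `TypeError`. `__getstate__()` drops the lock from a copy of the instance dict, and `__setstate__()` restores the saved attributes and creates a fresh lock. `__init__` is not called during unpickling, which is why the lock must be recreated there.

```python
import pickle
import threading


class Counter:
    """Hypothetical class holding an unpicklable lock alongside plain data."""

    def __init__(self, name: str):
        self.name = name
        self.count = 0
        self._lock = threading.Lock()  # lock objects cannot be pickled

    def increment(self) -> None:
        with self._lock:
            self.count += 1

    def __getstate__(self):
        # Copy the instance dict (don't mutate the live object) and drop the lock.
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        # __init__ is not called on unpickling: restore attributes, then rebuild the lock.
        self.__dict__.update(state)
        self._lock = threading.Lock()


c = Counter("hits")
c.increment()
c.increment()
restored = pickle.loads(pickle.dumps(c))
print(restored.name, restored.count)  # hits 2
restored.increment()                  # the recreated lock works as before
```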

53.2 Pickle Protocols and Security Warnings

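The pickle module is a powerful tool for serializing and deserializing Python object structures. However, that power carries a serious security implication: unpickling can execute arbitrary code, so data from an untrusted or unauthenticated source must never be unpickled. Understanding the protocols and the associated warnings is not optional; it is a critical part of using the module safely and effectively.

The Evolution of Pickle Protocols

A pickle protocol defines the rules and conventions for how Python objects are converted into a byte stream. The protocol has evolved over time, with each new version adding improvements such as more efficient encodings, support for very large objects, and (in protocol 5) out-of-band data buffers. The version is chosen when an object is pickled, through the protocol argument of dump() and dumps(). When loading, the unpickler detects the version from the stream automatically and can read every protocol up to the newest one the running Python supports. As a result, data written with a newer protocol may not load on an older Python.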
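A minimal sketch of both halves of this section. First, it checks that protocol 2 and later streams announce their version in the first two bytes, and that `loads()` needs no protocol argument. Second, it shows why unpickling is dangerous. The hypothetical `Payload` class uses `__reduce__` to tell the unpickler to call an arbitrary function on load. Here that function is a harmless `print`, but it could just as easily be something destructive. The `RestrictedUnpickler` is adapted from the "Restricting Globals" recipe in the pickle documentation. It narrows what a stream can load, but it is not a complete defense, so the rule of never unpickling untrusted data still stands.

```python
import builtins
import io
import pickle

data = {"ids": [1, 2, 3], "label": "batch"}

# Protocol 2+ streams start with the PROTO opcode (0x80) and the version number.
for proto in range(2, pickle.HIGHEST_PROTOCOL + 1):
    blob = pickle.dumps(data, protocol=proto)
    assert blob[:2] == bytes([0x80, proto])
    assert pickle.loads(blob) == data  # version is read from the stream itself


class Payload:
    """Hypothetical stand-in for a malicious object."""

    def __reduce__(self):
        # "To rebuild me, call print(...)": the unpickler will obey.
        return (print, ("this ran inside pickle.loads",))


evil = pickle.dumps(Payload())
result = pickle.loads(evil)  # prints the message; result is print()'s return value


# Allow-list only a few harmless builtins; refuse every other global.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(blob: bytes):
    """Like pickle.loads(), but rejects globals outside SAFE_BUILTINS."""
    return RestrictedUnpickler(io.BytesIO(blob)).load()


print(restricted_loads(pickle.dumps(data)))  # plain containers need no globals
try:
    restricted_loads(evil)
except pickle.UnpicklingError as exc:
    print("blocked:", exc)
```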

53.1 pickle: Serializing Arbitrary Python Objects

The pickle module is Python’s primary mechanism for object serialization and deserialization. Pickling converts a Python object hierarchy into a byte stream, and unpickling reconstructs the objects from that stream. This process is fundamental for tasks like saving program state to disk, distributing work across processes, or caching computations. Unlike text-based formats such as JSON, pickle can handle a vast range of Python types, including sets, dates, and instances of user-defined classes, by storing not just the data but the instructions needed to rebuild each object. Functions and classes themselves are pickled by reference (their module and qualified name) rather than by value, so they must be importable under the same name when the data is unpickled.
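A minimal sketch of the basic API: `dumps()`/`loads()` for bytes in memory, and `dump()`/`load()` for any binary file object (an `io.BytesIO` stands in for a real file here). The `Task` class and the sample data are hypothetical. `Task` is defined at module top level so that pickle can find it by name when loading. The sketch also shows that pickle preserves shared references: two keys pointing at the same list still point at one list after the round trip.

```python
import io
import json
import pickle
from datetime import date


class Task:
    """Hypothetical example class; its instances are pickled via __dict__."""

    def __init__(self, title: str, due: date):
        self.title = title
        self.due = due


shared_tags = ["urgent"]
state = {
    "tasks": [Task("write report", date(2024, 1, 31))],
    "seen_ids": {1, 2, 3},        # sets are not JSON-serializable
    "primary_tags": shared_tags,
    "backup_tags": shared_tags,   # the same list object referenced twice
}

# In-memory round trip: dumps() -> bytes, loads() -> brand-new objects.
restored = pickle.loads(pickle.dumps(state))
print(restored["tasks"][0].title, restored["tasks"][0].due)  # write report 2024-01-31
print(restored["primary_tags"] is restored["backup_tags"])   # True

# File-style round trip: dump()/load() accept any binary file object.
buffer = io.BytesIO()
pickle.dump(state, buffer)
buffer.seek(0)
from_file = pickle.load(buffer)

# JSON, by contrast, rejects custom instances, sets, and dates.
try:
    json.dumps(state)
except TypeError as exc:
    print("json failed:", exc)
```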


...