53.4 shelve: Persistent Dictionary Backed by pickle
The shelve module provides a persistent, dictionary-like object that stores its data in a database file, typically using the dbm module for the underlying storage and the pickle module to serialize the Python values. This combination makes it an exceptionally convenient tool for scenarios requiring a simple, persistent key-value store where the keys are strings and the values can be any object that pickle can handle. It effectively bridges the gap between in-memory dictionaries and full-fledged databases, offering a familiar dictionary API for persistence.
Opening a Shelf and Understanding its Nature
A shelf is opened using the shelve.open() function, which returns a shelf object. The writeback parameter is crucial for understanding the module’s behavior. When writeback=False (the default), the shelf operates like a traditional database: values are pickled and written to disk only when they are explicitly assigned to a key. If you mutate a mutable value (like a list or dictionary) that you’ve retrieved from the shelf, the change is not automatically persisted. The shelf has no way of knowing the retrieved object was modified.
import shelve
# Open a shelf. The flag 'c' means open for reading and writing, create if needed.
# Using the default writeback=False.
with shelve.open('mydata.db', flag='c') as db:
db['key1'] = ['initial', 'list'] # The list is pickled and written to disk.
# Later, in a new block or session
with shelve.open('mydata.db', flag='c') as db:
retrieved_list = db['key1'] # The list is unpickled into a new object in memory.
retrieved_list.append('new_item') # This modifies the in-memory copy.
# The shelf on disk is UNCHANGED at this point.
print(db['key1']) # Output: ['initial', 'list'] -- the change is lost!
# To persist the change, you must assign the modified object back to the key.
db['key1'] = retrieved_list # Now it's re-pickled and written.
print(db['key1']) # Output: ['initial', 'list', 'new_item']
The writeback Parameter and its Trade-offs
Setting writeback=True changes this behavior. The shelf will cache all objects it reads in an in-memory dictionary. When the shelf is closed or synchronized, every object in the cache is pickled and written back to disk, regardless of whether it was actually changed. This makes the shelf behave much more like a regular mutable dictionary, as changes to mutable objects are automatically persisted.
with shelve.open('mydata_wb.db', flag='c', writeback=True) as db:
db['key1'] = ['initial', 'list']
retrieved_list = db['key1']
retrieved_list.append('new_item') # Modification happens.
# No explicit reassignment is needed...
# ...but the change isn't on disk until sync() or close() is called.
db.sync() # Explicitly write all cached entries back to disk.
# The change is now persisted.
However, writeback=True comes with significant costs. It can consume large amounts of memory if the shelf contains many or large objects, as everything accessed is stored in the cache. It can also be inefficient on close, as every cached object must be serialized and written, even if only one was modified. This flag is best used for small shelves or when convenience drastically outweighs performance concerns.
Best Practices and Common Pitfalls
- Always Use a Context Manager (
withstatement): This ensures the shelf is properly closed, which is critical for thewritebackcache to be flushed and for the underlying database file to be synchronized. Failing to close a shelf can lead to data corruption. - Treat Shelf Objects as Context-Managed Resources: Never leave a shelf open for extended periods. Open it, perform your operations, and close it. This is a robust practice.
- Prefer String Keys: While some
dbmbackends support other key types, theshelvedocumentation specifies that keys should be strings. Using other types can lead to errors depending on the underlying database implementation. - Be Mindful of Mutability: If not using
writeback=True, remember that you must assign modified objects back to the key. A common pitfall is forgetting this step and losing data. - Understand Performance Characteristics:
shelveis not a high-performance database. It’s ideal for small to medium-sized datasets where the convenience of the dictionary API is paramount. For large-scale or concurrent applications, a proper database like SQLite or a key-value store likeredisis more appropriate. - Concurrency Limitations: The standard
dbmbackends are not thread-safe for writing, nor are they safe for concurrent access from multiple processes. If you need concurrent writes, you must implement your own locking mechanism or choose a different storage solution. - Platform Dependency: The specific
dbmformat chosen (e.g.,dbm.gnu,dbm.ndbm) can be platform-dependent. A shelf file created on one operating system might not be readable on another. For data portability, this is an important consideration.