30.6 collections.abc: Built-in ABCs for Containers and Iterables
The collections.abc module provides abstract base classes (ABCs) that define the core protocols for Python’s container and iterable types. They serve both as a formal specification of each interface and as targets for isinstance() and issubclass() checks. The simplest ABCs (such as Iterable, Sized, and Container) recognize any class that defines the required method, with no inheritance needed. The richer ones (such as Sequence, Mapping, and Set) recognize only classes that inherit from them or are registered with them. Either way, code can test for an interface rather than a concrete type, which fits Python’s preference for protocols over rigid type hierarchies.
The Core ABCs and Their Hierarchies
The ABCs in collections.abc form an inheritance hierarchy that mirrors the relationships between different container concepts. At the base are small ABCs that each center on a single method. Container, Iterable, and Sized are independent of one another, and Reversible builds on Iterable:
- Container: Requires __contains__, which supports the in operator.
- Iterable: Requires __iter__, the foundation for all objects that can be looped over.
- Sized: Requires __len__, for objects that have a meaningful length.
- Reversible: Requires __reversed__ in addition to the __iter__ it inherits from Iterable, for iterables that can be traversed backwards.
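As a minimal sketch of how these single-method ABCs behave, the example below checks a class that inherits from none of them. The class name Countdown is hypothetical and exists only for illustration.

```python
from collections.abc import Container, Iterable, Reversible, Sized


class Countdown:
    """Hypothetical class that inherits from no ABC at all."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Yields start, start - 1, ..., 1.
        return iter(range(self.start, 0, -1))

    def __len__(self):
        return self.start


countdown = Countdown(3)

# Iterable and Sized look for __iter__ and __len__ on the class itself.
is_iterable = isinstance(countdown, Iterable)
is_sized = isinstance(countdown, Sized)

# Neither __contains__ nor __reversed__ is defined, so these checks fail,
# even though the in operator still works through the __iter__ fallback.
is_container = isinstance(countdown, Container)
is_reversible = isinstance(countdown, Reversible)

print(is_iterable, is_sized, is_container, is_reversible)
# Output: True True False False
```

The last two results show that these checks report only which methods a class defines, not everything an object can do at runtime.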
The Iterable branch is the most extensive. Collection is a crucial ABC that combines Sized, Iterable, and Container, representing the most basic complete collection. From Collection, we get:
- Sequence: Requires __getitem__ and __len__, representing the read-only interface of ordered containers (e.g., tuple, str, range; list qualifies as well).
- MutableSequence: Adds __setitem__, __delitem__, and insert to Sequence (e.g., list).
- Set: Requires __contains__, __iter__, and __len__. The comparison and operator methods needed for mathematical set operations (e.g., __le__, __and__) are supplied as mixins (e.g., frozenset).
- MutableSet: Adds add and discard to Set (e.g., set).
- Mapping: Requires __getitem__, __iter__, and __len__ (e.g., dict).
- MutableMapping: Adds __setitem__ and __delitem__ to Mapping (e.g., dict).
This hierarchy allows for precise type checking. Instead of checking isinstance(obj, list), which is overly specific, you can check isinstance(obj, MutableSequence), which is True for list and for any class that inherits from MutableSequence or is registered with it. Unlike Iterable or Sized, MutableSequence does not inspect an object's methods, so implementing the right methods is not enough on its own. The class below therefore inherits from the ABC.

from collections.abc import MutableSequence

class MyCustomList(MutableSequence):
    """A list-like container that delegates storage to a real list."""

    def __init__(self, *args):
        self._data = list(args)

    # The five abstract methods required by MutableSequence.
    def __getitem__(self, index):
        return self._data[index]

    def __setitem__(self, index, value):
        self._data[index] = value

    def __delitem__(self, index):
        del self._data[index]

    def __len__(self):
        return len(self._data)

    def insert(self, index, value):
        self._data.insert(index, value)

    def __repr__(self):
        return repr(self._data)

custom_list = MyCustomList(1, 2, 3)
# This returns True because MyCustomList inherits from MutableSequence.
print(isinstance(custom_list, MutableSequence))  # Output: True
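Whether a class is recognized without inheriting depends on the ABC. As a minimal sketch (DuckList is a hypothetical name), the following compares a structural check against Sized with a check against MutableSequence, then uses register() to declare a virtual subclass:

```python
from collections.abc import MutableSequence, Sized


class DuckList:
    """Hypothetical list-like class with the five required methods but no ABC base."""

    def __init__(self, *items):
        self._data = list(items)

    def __getitem__(self, index):
        return self._data[index]

    def __setitem__(self, index, value):
        self._data[index] = value

    def __delitem__(self, index):
        del self._data[index]

    def __len__(self):
        return len(self._data)

    def insert(self, index, value):
        self._data.insert(index, value)


duck = DuckList(1, 2, 3)

# Sized checks only for __len__, so it recognizes the class structurally.
sized_check = isinstance(duck, Sized)

# MutableSequence performs no structural check, so this is False.
before_register = isinstance(duck, MutableSequence)

# Registration makes DuckList a virtual subclass. It adds no mixin methods
# and does not verify that the protocol is actually implemented.
MutableSequence.register(DuckList)
after_register = isinstance(duck, MutableSequence)

print(sized_check, before_register, after_register)  # Output: True False True
print(hasattr(duck, "append"))                       # Output: False
```

Registration only affects isinstance() and issubclass(). Inheriting from the ABC is what supplies mixin methods such as append and enforces the abstract methods at instantiation time.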
Why Use collections.abc for Type Checking
Checking against an ABC is usually more robust than checking for a concrete type because it decouples your code from specific implementations. A function that accepts any MutableSequence will work with a list, a custom class such as MyCustomList, or any other object whose class subclasses the ABC or is registered with it. This promotes code reusability and keeps the function dependent only on the interface it actually uses. It moves the question from “Are you a list?” to “Do you behave like a mutable sequence?”.
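A minimal sketch of such a function follows. The helper name rotate_left is hypothetical. It relies only on indexing, deletion, and append, so it works for both list and collections.deque, which the standard library registers as a MutableSequence.

```python
from collections import deque
from collections.abc import MutableSequence


def rotate_left(items):
    """Move the first element to the end, in place (hypothetical helper)."""
    if not isinstance(items, MutableSequence):
        raise TypeError(f"expected a mutable sequence, got {type(items).__name__}")
    if len(items) > 1:
        # Uses only __getitem__, __delitem__, and append.
        first = items[0]
        del items[0]
        items.append(first)
    return items


as_list = rotate_left([1, 2, 3])
as_deque = rotate_left(deque([1, 2, 3]))
print(as_list)   # Output: [2, 3, 1]
print(as_deque)  # Output: deque([2, 3, 1])

# A tuple is a Sequence but not a MutableSequence, so it is rejected.
try:
    rotate_left((1, 2, 3))
except TypeError as exc:
    error_message = str(exc)
    print(error_message)  # Output: expected a mutable sequence, got tuple
```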
Using ABCs as Base Classes
Beyond isinstance() checks, you can also use these ABCs as formal base classes for your own custom containers. Subclassing has two advantages. First, the ABC acts as a blueprint: instantiating a subclass that is missing a required abstract method raises TypeError. Second, the ABC often provides free mixin implementations of related methods, built on the ones you write.
from collections.abc import Set

class CaseInsensitiveSet(Set):
    """A set that treats 'Hello' and 'hello' as the same element."""

    def __init__(self, iterable):
        # Store every element as a lowercase string.
        self._data = {str(item).lower() for item in iterable}

    def __contains__(self, item):
        return str(item).lower() in self._data

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

# The Set ABC automatically provides methods like .isdisjoint()
# because it can implement them using __contains__, __iter__, and __len__.
cis = CaseInsensitiveSet(['Hello', 'world'])
print('hello' in cis)  # Output: True
print('WORLD' in cis)  # Output: True
print(cis.isdisjoint(['HELLO']))  # Output: False (they are not disjoint)

# The following line would raise AttributeError:
# 'CaseInsensitiveSet' object has no attribute 'add'
# cis.add('new')
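In this example, by inheriting from Set, we only had to implement __contains__, __iter__, and __len__. The Set ABC provided concrete implementations of isdisjoint, the comparison operators such as __le__ (<=), and the combining operators __and__ (&), __or__ (|), __sub__ (-), and __xor__ (^), all built on these three core methods. Named methods such as union() and intersection() are not part of the ABC. Note that our set is immutable; attempting to use add() fails because we didn’t inherit from MutableSet or implement it ourselves, so the class simply has no such method.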
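The mixin operators can be seen at work in a short sketch. The class ListBackedSet is hypothetical. Its constructor accepts a single iterable, which is what the default _from_iterable hook of the Set ABC expects when an operator builds its result.

```python
from collections.abc import Set


class ListBackedSet(Set):
    """Hypothetical immutable set that stores unique items in a list."""

    def __init__(self, iterable=()):
        self._items = []
        for item in iterable:
            if item not in self._items:
                self._items.append(item)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def __repr__(self):
        return f"ListBackedSet({self._items!r})"


a = ListBackedSet([1, 2, 3])
b = ListBackedSet([2, 3, 4])

common = a & b            # __and__ mixin, result built via _from_iterable
merged = a | b            # __or__ mixin
is_subset = common <= a   # __le__ mixin
same = common == {2, 3}   # __eq__ mixin compares against any Set

print(common)              # Output: ListBackedSet([2, 3])
print(merged)              # Output: ListBackedSet([1, 2, 3, 4])
print(is_subset, same)     # Output: True True
print(hasattr(a, "union")) # Output: False
```

If a subclass has a constructor with a different signature, it would need to override _from_iterable so that the operators can still construct new instances.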
Common Pitfalls and Best Practices
- Precise ABC Selection: Choose the least demanding ABC that covers what your function actually uses. If your function only needs to iterate over an object, check for Iterable, not Sequence. If it only needs the length, check for Sized. This makes your function maximally flexible.
- The Danger of Iterable: An Iterable may support only a single pass; a generator, for example, is exhausted after one iteration. If your function needs to iterate over the data multiple times, it should materialize the input first (for example with list() or tuple()) or require a Collection. Checking for Sized or Collection also rules out unbounded generators before you start.
- Immutability Contracts: Be aware of the mutability implied by the ABC. If your function should not modify its input, accept a Sequence or Mapping, not a MutableSequence or MutableMapping. This communicates intent to the caller and documents that the function relies only on the read-only interface.
- Registering Built-in Types: The built-in types (list, tuple, dict, set, etc.) are already registered with their corresponding ABCs. You do not need to, and should not, register them yourself.
- Performance: isinstance checks against ABCs are cached, but they are still a runtime operation and are slower than checks against concrete classes. In performance-critical inner loops, perform the check once outside the loop, or cache its result or the object’s methods in local names.
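The single-pass pitfall can be handled with a small guard, sketched below. The helper name mean_and_spread is hypothetical, and the sketch assumes that anything passing the Collection check can safely be iterated more than once, which holds for the built-in containers but is a convention rather than a guarantee.

```python
from collections.abc import Collection, Iterable


def mean_and_spread(values):
    """Return (mean, max - min); needs several passes (hypothetical helper)."""
    if not isinstance(values, Iterable):
        raise TypeError("values must be iterable")
    # Assumption: a Collection can be traversed repeatedly. Anything else
    # (generators, map or zip objects) is copied into a tuple first.
    if not isinstance(values, Collection):
        values = tuple(values)
    if not values:
        raise ValueError("values must not be empty")
    mean = sum(values) / len(values)     # first pass
    spread = max(values) - min(values)   # two more passes
    return mean, spread


from_list = mean_and_spread([2, 4, 6])
from_generator = mean_and_spread(x * 2 for x in (1, 2, 3))
print(from_list, from_generator)  # Output: (4.0, 4) (4.0, 4)

# Without the guard, a generator is exhausted after the first pass.
gen = (x for x in (1, 2, 3))
first_pass = sum(gen)
second_pass = sum(gen)
print(first_pass, second_pass)    # Output: 6 0
```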