45.3 defaultdict: Dictionaries with Default Values
collections.defaultdict is a dict subclass that supplies a default value when a missing key is looked up, removing the need for repetitive key-existence checks. It reduces boilerplate and avoids KeyError on first access, which makes it well suited to grouping, counting, and accumulating data.
Understanding the Default Factory
At the heart of a defaultdict is its “default factory,” a callable passed as the first argument to the constructor. When you look up a missing key with subscript syntax (d[key]), the defaultdict does not raise a KeyError. Instead, it calls the default factory with no arguments, stores the result under the requested key, and returns it. Other lookups, such as d.get(key) and key in d, never call the factory.
The default factory can be any callable that takes no arguments. Most commonly it is a type such as list, int, str, or dict; a lambda or a named function works for more specific defaults. If the factory is omitted or set to None, missing keys raise KeyError just as in a plain dict.
from collections import defaultdict
# A defaultdict that defaults to an empty list
list_dict = defaultdict(list)
list_dict['fruits'].append('apple')
print(list_dict['fruits']) # Output: ['apple']
print(list_dict['vegetables']) # Output: [] (key created automatically)
# A defaultdict that defaults to zero
int_dict = defaultdict(int)
int_dict['count'] += 1
print(int_dict['count']) # Output: 1
print(int_dict['unknown']) # Output: 0
# Using a lambda for a custom default
custom_dict = defaultdict(lambda: 'N/A')
print(custom_dict['name']) # Output: N/A
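The factory is consulted only by subscript lookup, which dict routes through its __missing__ hook; get() behaves exactly as it does on a plain dict. The following is a minimal sketch of that difference, with illustrative names (scores, 'alice') that are not part of the original examples:

```python
from collections import defaultdict

scores = defaultdict(int)

# get() does not call the factory and does not insert the key.
missing = scores.get('alice')
keys_after_get = list(scores)

# Subscript access calls int(), stores 0 under the key, and returns it.
first_read = scores['alice']
keys_after_index = list(scores)

# The factory itself is exposed as an ordinary attribute.
factory = scores.default_factory

print(missing, keys_after_get)       # None []
print(first_read, keys_after_index)  # 0 ['alice']
print(factory)                       # <class 'int'>
```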
Common Use Cases and Examples
The primary strength of defaultdict lies in simplifying aggregation patterns. Consider the classic task of grouping a sequence of items by a common key.
Without defaultdict, the code is clunky:
data = [('a', 1), ('b', 2), ('a', 3), ('c', 4)]
grouped = {}
for key, value in data:
    if key not in grouped:
        grouped[key] = [] # Explicit check for key existence
    grouped[key].append(value)
print(grouped) # Output: {'a': [1, 3], 'b': [2], 'c': [4]}
With defaultdict, the logic becomes elegant and focused:
from collections import defaultdict
data = [('a', 1), ('b', 2), ('a', 3), ('c', 4)]
grouped = defaultdict(list) # The default factory is list
for key, value in data:
    grouped[key].append(value) # No if-check needed
print(dict(grouped)) # Output: {'a': [1, 3], 'b': [2], 'c': [4]}
Similarly, it excels at counting:
word_counts = defaultdict(int)
for word in ['apple', 'banana', 'apple', 'cherry']:
    word_counts[word] += 1 # First access for 'banana' creates key with value 0, then adds 1
print(dict(word_counts)) # Output: {'apple': 2, 'banana': 1, 'cherry': 1}
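The same pattern extends to running totals and to nested structures. The sketch below uses made-up sales records and hypothetical names (sales, totals, by_region); it shows one possible arrangement rather than a recommended design.

```python
from collections import defaultdict

# Hypothetical (region, product, amount) records, for illustration only.
sales = [
    ('north', 'apple', 3),
    ('south', 'apple', 5),
    ('north', 'pear', 2),
    ('north', 'apple', 4),
]

# Running total per product.
totals = defaultdict(int)
# Two-level grouping: the outer factory builds an inner defaultdict(int).
by_region = defaultdict(lambda: defaultdict(int))

for region, product, amount in sales:
    totals[product] += amount
    by_region[region][product] += amount

print(dict(totals))
# {'apple': 12, 'pear': 2}
print({region: dict(inner) for region, inner in by_region.items()})
# {'north': {'apple': 7, 'pear': 2}, 'south': {'apple': 5}}
```

The lambda is needed for the outer dictionary because the factory is called with no arguments, so defaultdict(int) has to be wrapped in a zero-argument callable.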
Important Behaviors and Pitfalls
A crucial behavior to understand is that reading a missing key with d[key] creates it. This becomes a pitfall when code reads keys it only means to inspect, for example in a truthiness test or while probing for keys inside a loop.
dd = defaultdict(list)
# Checking if a key exists...
if 'new_key' in dd: # This is False, no problem.
    pass
# ...but this innocent-looking line creates the key!
value = dd['new_key'] # This calls list(), creates the key with value [], and returns [].
print('new_key' in dd) # Now this is True!
print(dd) # defaultdict(<class 'list'>, {'new_key': []})
This behavior can silently inflate your dictionary if you write a statement like if my_dict[key]: for keys that do not exist. A membership test (key in my_dict) and my_dict.get(key) are safe; subscripting a missing key in any expression is not.
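To make the size growth concrete, here is a small sketch contrasting a subscript read with two reads that do not insert anything. The inventory dictionary and its keys are invented for the example.

```python
from collections import defaultdict

inventory = defaultdict(list)
inventory['fruits'].append('apple')

# Truthiness test via subscript: inserts 'tools' as a side effect.
if inventory['tools']:
    pass
size_after_subscript = len(inventory)

# Neither a membership test nor get() inserts a key.
has_toys = 'toys' in inventory
toys = inventory.get('toys', [])
size_after_safe_reads = len(inventory)

print(size_after_subscript)   # 2
print(size_after_safe_reads)  # 2
print(has_toys, toys)         # False []
```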
Mutable default factories such as list or dict are safe to use. The factory is called anew for each missing key, so every new key gets its own list or dict rather than a shared one. This sidesteps the pitfall associated with mutable default arguments in function definitions.
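A short sketch can confirm that each key receives its own object. For contrast, it also shows a factory written to return one pre-built object, which does share state across keys; that second case is an illustrative counterexample, not something the text above recommends.

```python
from collections import defaultdict

# Factory is the list type: every missing key gets its own new list.
fresh = defaultdict(list)
fresh['a'].append(1)
print(fresh['b'])                  # []
print(fresh['a'] is fresh['b'])    # False

# Contrast: a factory that returns one pre-built object shares it across keys.
shared_list = []
shared = defaultdict(lambda: shared_list)
shared['a'].append(1)
print(shared['b'])                 # [1]
print(shared['a'] is shared['b'])  # True
```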
Best Practices and When to Use It
- Intention-Revealing Code: Use defaultdict when your code’s logic inherently involves setting a default value upon first access. It makes the programmer’s intent clearer than using dict.get(key, default) or try/except blocks for this specific pattern.
- Prefer in for Membership Checks: If you only need to check whether a key exists without creating it, always use the key in my_defaultdict syntax.
- Replicating Behavior: To get the standard dictionary behavior of raising a KeyError for missing keys, you can use dict(my_defaultdict) to create a standard dict copy.
- Alternatives: For simpler cases where you just want a one-time default value on a get operation, dict.get(key, default) might be sufficient and less “magical.” defaultdict is most powerful when you are building a data structure through multiple accesses.
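Once a structure has been built, there are two ways to restore KeyError for missing keys: copy it into a plain dict, or set the default_factory attribute to None on the existing object. The sketch below shows both; the groups data is invented for illustration.

```python
from collections import defaultdict

groups = defaultdict(list)
for key, value in [('a', 1), ('b', 2), ('a', 3)]:
    groups[key].append(value)

# Option 1: copy into a plain dict (a shallow copy; the lists are shared).
plain = dict(groups)

# Option 2: disable the factory in place; missing keys now raise KeyError.
groups.default_factory = None
try:
    groups['missing']
    raised = False
except KeyError:
    raised = True

print(plain)                # {'a': [1, 3], 'b': [2]}
print(raised)               # True
print('missing' in groups)  # False
```

Setting default_factory to None avoids the copy, which may matter for large structures, while dict() leaves the original defaultdict free to keep accumulating.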