45.2 Counter: Counting Hashable Objects
The collections.Counter class is a dictionary subclass designed for counting hashable objects. Elements are stored as dictionary keys and their counts as dictionary values. Since Python 3.7 a Counter, like any dict, remembers insertion order. Unlike a standard dictionary, Counter returns a count of zero for a missing key instead of raising a KeyError, which makes it well suited to tallying and frequency analysis.
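The missing-key behavior described above can be checked with a short sketch. The variable names here are illustrative only; the point is that the lookup returns zero and does not insert the key.

```python
from collections import Counter

letters = Counter("hello")

# Looking up a key that was never counted returns 0 rather than raising KeyError.
missing = letters["z"]
print(missing)  # 0

# The lookup does not insert the key into the counter.
print("z" in letters)  # False

# A plain dict raises KeyError for the same lookup.
plain = dict(letters)
try:
    plain["z"]
except KeyError:
    print("KeyError from plain dict")
```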
Initialization and Basic Usage
A Counter can be initialized in several ways: from an iterable of items, from a mapping of keys to counts, from keyword arguments, or from another Counter object. The most common approach is to pass an iterable, which the Counter tallies element by element.
from collections import Counter
# Initialize from an iterable
inventory = Counter(['apple', 'orange', 'apple', 'banana', 'orange', 'apple'])
print(inventory)
# Output: Counter({'apple': 3, 'orange': 2, 'banana': 1})
# Initialize from a dictionary
starting_counts = {'widgets': 100, 'gadgets': 50}
warehouse = Counter(starting_counts)
print(warehouse['widgets']) # Output: 100
# Initialize with keyword arguments
vote_tally = Counter(Alice=12, Bob=9, Charlie=15)
print(vote_tally)
# Output: Counter({'Charlie': 15, 'Alice': 12, 'Bob': 9})
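The fourth option mentioned above, initializing from another Counter, is not shown in the examples. A minimal sketch, using made-up names, suggests that the result is an independent copy:

```python
from collections import Counter

original = Counter(red=2, blue=1)

# Passing a Counter to the constructor builds a new, independent Counter.
duplicate = Counter(original)
duplicate["red"] += 5

print(original["red"])   # 2
print(duplicate["red"])  # 7
```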
Key Methods and Operations
Counter adds several methods to the standard dictionary interface. The .most_common(n) method returns the n most frequent items as a list of (element, count) tuples, ordered from most to least common. If n is omitted or None, it returns all items. Elements with equal counts appear in the order they were first encountered.
word_freq = Counter('abracadabra')
print(word_freq.most_common(3))
# Output: [('a', 5), ('b', 2), ('r', 2)]
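As a further sketch, calling most_common() with no argument returns the full ranking, and slicing the end of that list is one way to get the least common entries. The tie order shown assumes Python 3.7 or later, where ties follow first-encountered order.

```python
from collections import Counter

word_freq = Counter("abracadabra")

# With no argument, most_common() returns every element, highest count first.
everything = word_freq.most_common()
print(everything)
# [('a', 5), ('b', 2), ('r', 2), ('c', 1), ('d', 1)]

# The two least common entries, taken from the end of that list.
least_two = everything[-2:]
print(least_two)
# [('c', 1), ('d', 1)]
```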
Counters also support arithmetic and set-like operators, each of which returns a new Counter. Addition (+) sums the counts for each key and subtraction (-) takes their difference. The union (|) keeps the maximum of each key's count, and the intersection (&) keeps the minimum. All four operators drop any entry whose resulting count is zero or negative.
sales_q1 = Counter(jan=10, feb=5, mar=12)
sales_q2 = Counter(mar=8, apr=15, may=20)
# Combine sales
total_jan_to_may = sales_q1 + sales_q2
print(total_jan_to_may) # Output: Counter({'mar': 20, 'may': 20, 'apr': 15, 'jan': 10, 'feb': 5})
# Subtract Q2 from Q1; only positive differences are kept
decreased_sales = sales_q1 - sales_q2
print(decreased_sales) # Output: Counter({'jan': 10, 'feb': 5, 'mar': 4})
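The union and intersection operators are described above but not demonstrated. A small sketch with hypothetical shelf counts illustrates both:

```python
from collections import Counter

shelf_a = Counter(apples=3, pears=1)
shelf_b = Counter(apples=1, pears=4, plums=2)

# Union keeps the larger count for each key.
largest = shelf_a | shelf_b

# Intersection keeps the smaller count; a key missing from either side drops out.
smallest = shelf_a & shelf_b

print(largest == Counter(apples=3, pears=4, plums=2))  # True
print(smallest == Counter(apples=1, pears=1))          # True
```

Here plums disappears from the intersection because its minimum count across the two counters is zero.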
Updating Counts and Subtracting
The .update() and .subtract() methods modify the counter in place. Unlike dict.update(), which replaces values, Counter.update() adds the incoming counts to the existing ones; it accepts an iterable of elements or a mapping of counts. .subtract() is the in-place counterpart for subtraction. Crucially, unlike the - operator, .subtract() keeps counts that become zero or negative.
tally = Counter(cats=4, dogs=8)
tally.update(['cats', 'dogs', 'hamsters'])
print(tally)
# Output: Counter({'dogs': 9, 'cats': 5, 'hamsters': 1})
tally.subtract({'cats': 2, 'dogs': 1})
print(tally)
# Output: Counter({'dogs': 8, 'cats': 3, 'hamsters': 1})
tally.subtract({'cats': 5}) # This will make the 'cats' count negative
print(tally['cats']) # Output: -2
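To make the contrast between the - operator and .subtract() concrete, the following sketch applies both to the same hypothetical data. It also shows the unary + operator, which returns a copy with zero and negative counts removed.

```python
from collections import Counter

stock = Counter(bolts=3, nuts=1)
order = Counter(bolts=1, nuts=4)

# The - operator returns a new Counter and drops non-positive results.
remaining = stock - order
print(remaining)  # Counter({'bolts': 2})

# subtract() mutates in place and keeps the negative count.
ledger = Counter(stock)
ledger.subtract(order)
print(ledger)  # Counter({'bolts': 2, 'nuts': -3})

# Unary + returns a copy with zero and negative counts removed.
positive_only = +ledger
print(positive_only)  # Counter({'bolts': 2})
```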
Pitfalls and Best Practices
A common pitfall is assuming that a zero count means the element is absent. Counter returns zero for missing keys, but setting a count to zero does not remove the key: it stays in the counter with a value of zero and still shows up in the repr, in iteration, and in membership tests. To remove an element entirely, use the del statement.
c = Counter(a=4, b=2, c=0, d=-2)
print(c) # Output: Counter({'a': 4, 'b': 2, 'c': 0, 'd': -2})
# Setting to zero doesn't remove it
c['b'] = 0
print(c) # Output: Counter({'a': 4, 'b': 0, 'c': 0, 'd': -2})
# You must use 'del' to remove an entry
del c['b']
print(c) # Output: Counter({'a': 4, 'c': 0, 'd': -2})
Another best practice is to use Counter for its intended purpose: counting. Although it can store any integer value, including zeros and negatives, using it as a general-purpose dictionary of integers can lead to confusing code. For general key-to-integer storage, a defaultdict(int) is often semantically clearer.
The elements() method returns an iterator that repeats each element as many times as its count. It yields only elements with a positive count and skips zeros and negatives, which matters when using it to rebuild a list from its counts.
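The elements() behavior can be seen in a short sketch. The counter below is illustrative; the output order assumes Python 3.7 or later, where elements are yielded in insertion order.

```python
from collections import Counter

c = Counter(a=2, b=1, c=0, d=-2)

# elements() repeats each key by its count and skips counts below one.
rebuilt = list(c.elements())
print(rebuilt)  # ['a', 'a', 'b']

# Round trip: counting the rebuilt list loses the zero and negative entries.
recounted = Counter(rebuilt)
print(recounted)  # Counter({'a': 2, 'b': 1})
```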