35.7 itertools Recipes from the Documentation
The itertools documentation itself includes a “recipes” section—a collection of useful functions built from the toolkit’s own primitives. These recipes are not part of the module itself but are provided as patterns for users to copy and adapt. They represent canonical, high-performance solutions to common iteration problems, embodying the functional and efficient spirit of the module. Understanding these recipes is crucial for mastering itertools, as they demonstrate how to effectively combine its components.
The accumulate Recipe for Non-Numeric Sequences
While itertools.accumulate is famously used for running sums or products, its true power lies in its generality. It can apply any two-argument function sequentially. A common recipe extends it to non-numeric data, such as finding the running maximum or minimum.
import itertools
# Running maximum using accumulate
data = [3, 1, 4, 1, 5, 9, 2, 6]
running_max = itertools.accumulate(data, max)
print(list(running_max)) # Output: [3, 3, 4, 4, 5, 9, 9, 9]
# Running minimum
running_min = itertools.accumulate(data, min)
print(list(running_min)) # Output: [3, 1, 1, 1, 1, 1, 1, 1]
Why it works: accumulate(iterable, func) defaults to addition but can use any function func(x, y). For max, it compares the current maximum (the accumulated value) with the next element (y), yielding the new maximum at each step. This is far more efficient than repeatedly calling max() on an ever-growing slice of the list, which would be an O(n²) operation. This approach is O(n).
The takewhile and dropwhile Combo for Partitioning
A frequent task is splitting an iterable into two parts: one that satisfies a predicate and one that doesn’t, starting from the beginning. The takewhile and dropwhile functions are a natural pair for this.
import itertools
def partition(predicate, iterable):
"""Use takewhile and dropwhile to split an iterable at the first false predicate."""
t = itertools.takewhile(predicate, iterable)
f = itertools.dropwhile(predicate, iterable)
return list(t), list(f)
numbers = [1, 3, 5, 8, 9, 11, 13]
is_odd = lambda x: x % 2 == 1
first_odd_group, the_rest = partition(is_odd, numbers)
print(f"Leading odds: {first_odd_group}") # Output: [1, 3, 5]
print(f"The rest: {the_rest}") # Output: [8, 9, 11, 13]
Why it works: takewhile yields items from iterable as long as predicate(item) is true. It stops at the first false value. Conversely, dropwhile discards items from iterable as long as the predicate is true, then yields all remaining items without re-checking the predicate. This makes them a perfect complement for partitioning a sequence from its head. A crucial pitfall to avoid is assuming this partitions the entire iterable based on the predicate; it only partitions based on the first run of matches.
Efficient n-th Element Extraction with islice
The recipe for grabbing the n-th element from an iterator is a perfect use case for itertools.islice, as it handles the iterator protocol correctly without converting the entire iterable to a list.
import itertools
def nth(iterable, n, default=None):
"""Returns the nth item or a default value. Consumes the iterator up to n."""
return next(itertools.islice(iterable, n, None), default)
# Get the 4th element (index 3) from a large range iterator
large_data = iter(range(100_000_000))
result = nth(large_data, 3)
print(result) # Output: 3
# The iterator's state is advanced
print(next(large_data)) # Output: 4
# Using the default for an out-of-bounds request
short_data = iter([10, 20])
result = nth(short_data, 5, default="Not Found")
print(result) # Output: Not Found
Why it works: islice(iterable, start, stop, step) is the iterator-safe version of slicing. islice(iterable, n, None) effectively skips the first n elements and then tries to return the next one (the n+1-th element in zero-based indexing, which is the n-th element in 1-based counting). Wrapping this in next() with a default value provides a clean, safe, and memory-efficient way to index into any iterable, especially those that are too large to fit in memory.
The unique_everseen Recipe for De-duplication
This recipe creates a generator that returns unique elements, preserving order. It is more flexible than using a set for deduplication because it works with unhashable types and allows a custom key function.
import itertools
def unique_everseen(iterable, key=None):
"""List unique elements, preserving order. Remember all elements ever seen."""
seen = set()
seen_add = seen.add
if key is None:
for element in itertools.filterfalse(seen.__contains__, iterable):
seen_add(element)
yield element
else:
for element in iterable:
k = key(element)
if k not in seen:
seen_add(k)
yield element
# Deduplicate a list, preserving order
data = ['a', 'b', 'a', 'c', 'd', 'b', 'e']
print(list(unique_everseen(data))) # Output: ['a', 'b', 'c', 'd', 'e']
# Deduplicate using a key function (case-insensitive)
data_mixed_case = ['apple', 'Orange', 'APPLE', 'banana']
print(list(unique_everseen(data_mixed_case, key=str.lower)))
# Output: ['apple', 'Orange', 'banana']
Why it works: The recipe uses a set (seen) to track which elements (or keys) have already been encountered. The filterfalse(seen.__contains__, iterable) function is the clever part: it filters the iterable, only yielding items for which item in seen is False. As each new item is yielded, it is immediately added to the seen set. The assignment seen_add = seen.add is a micro-optimization that avoids the attribute lookup overhead in the tight loop. The key function version generalizes this by applying the transformation before the membership test. This is superior to a naive approach of checking a list for membership, which would be O(n) per check, making the whole algorithm O(n²). Using a set for membership testing keeps it at O(n).