35.5 compress, starmap, filterfalse, zip_longest

Understanding compress()

The compress() function creates an iterator that filters an input iterable according to a parallel iterable of selectors. Its signature is itertools.compress(data, selectors). It yields each item from data whose corresponding item in selectors is truthy. This is conceptually similar to a conditional filter, but the decision comes from a precomputed selector iterable rather than from a predicate function applied to each item.

A crucial aspect of compress() is its behavior when the data and selectors iterables are of different lengths. The iteration stops as soon as either of the two input iterables is exhausted. This means you will not get an error for mismatched lengths, but you may get unexpected results if you assume the shorter iterable will be padded.
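The truncation behavior described above can be seen in a minimal sketch. The sample lists and variable names are illustrative assumptions, not part of any API.

```python
import itertools

data = ['A', 'B', 'C', 'D', 'E']

# Fewer selectors than data items: 'D' and 'E' are never examined.
short_selectors = [1, 0, 1]
kept_short = list(itertools.compress(data, short_selectors))
print(kept_short)  # ['A', 'C']

# More selectors than data items: the surplus selectors are ignored.
long_selectors = [0, 1, 1, 1, 1, 1, 1]
kept_long = list(itertools.compress(data, long_selectors))
print(kept_long)  # ['B', 'C', 'D', 'E']
```

In neither case is an exception raised, so a length mismatch has to be checked explicitly if it would indicate a bug in the calling code.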

import itertools

data = ['A', 'B', 'C', 'D', 'E']
selectors = [True, False, 1, 0, 1] # Truthy and Falsy values are evaluated

result = itertools.compress(data, selectors)
print(list(result)) # Output: ['A', 'C', 'E']

Common Pitfall: The function uses truth value testing, not strict equality to True. Values such as 1, non-empty strings, and non-empty lists count as true, while 0, 0.0, False, None, and empty containers count as false. Make sure your selector iterable contains the intended Boolean values, or be certain how each selector will be evaluated in a Boolean context.
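One way this pitfall shows up is when data values double as their own selectors. The following is a small sketch with made-up sensor readings (the names readings and is_present are hypothetical), comparing that shortcut with explicitly built Boolean selectors.

```python
import itertools

# Hypothetical sensor readings; None marks a missing measurement.
readings = [12.5, 0.0, 7.1, None, 3.3]

# Pitfall: using the readings as their own selectors also drops the
# valid reading 0.0, because 0.0 is falsy just like None.
naive = list(itertools.compress(readings, readings))
print(naive)  # [12.5, 7.1, 3.3]

# Explicit Boolean selectors state the intent: keep everything
# that is not None.
is_present = [r is not None for r in readings]
explicit = list(itertools.compress(readings, is_present))
print(explicit)  # [12.5, 0.0, 7.1, 3.3]
```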

Mastering starmap()

The starmap() function is a variant of the built-in map() for situations where the function takes several positional arguments but the data is already grouped into tuples or other iterables. Its signature is itertools.starmap(function, iterable). For each item in the iterable, it unpacks the item and passes its elements as the arguments to function. In other words, it computes function(*item) for every item in the iterable.

This is immensely useful when working with data that is naturally structured in tuples, such as rows from a CSV reader or coordinates from a list of points.

import itertools

def calculate_volume(length, width, height):
    return length * width * height

# A list of tuples, where each tuple represents (length, width, height)
dimensions = [(2, 3, 4), (1, 1, 5), (10, 2, 1)]

# The map() equivalent needs a lambda to unpack each tuple:
#     map(lambda d: calculate_volume(*d), dimensions)
# starmap() does the unpacking itself.
volumes = itertools.starmap(calculate_volume, dimensions)
print(list(volumes)) # Output: [24, 5, 20]

Why it matters: You could achieve the same result with map() and a lambda that unpacks the tuple (lambda args: func(*args)), but starmap() expresses this common operation more readably and declaratively, and it typically avoids the overhead of an extra Python-level function call per item.
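The equivalence between these spellings can be checked directly. The sketch below uses the built-in pow() with illustrative (base, exponent) pairs; the variable names are assumptions chosen for the example.

```python
import itertools

# Illustrative (base, exponent) pairs; pow() is the built-in.
pairs = [(2, 5), (3, 2), (10, 3)]

via_starmap = list(itertools.starmap(pow, pairs))
via_map_lambda = list(map(lambda args: pow(*args), pairs))
via_comprehension = [pow(*args) for args in pairs]
print(via_starmap)  # [32, 9, 1000]

# When the arguments live in separate iterables, plain map() already
# accepts them side by side and no unpacking is needed.
bases = [2, 3, 10]
exponents = [5, 2, 3]
via_parallel_map = list(map(pow, bases, exponents))
print(via_parallel_map)  # [32, 9, 1000]
```

A rough rule of thumb: use map() when the arguments arrive as parallel iterables, and starmap() when they arrive already grouped per call.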

Leveraging filterfalse()

The filterfalse() function is the logical inverse of the built-in filter() function. Its signature is itertools.filterfalse(predicate, iterable). It returns an iterator that yields all elements from the input iterable for which the predicate function returns False or a falsy value. If predicate is None, it returns the items that are falsy themselves.

This function is useful for expressing negative conditions clearly. Instead of writing filter(lambda x: not condition(x), iterable), you can write filterfalse(condition, iterable), which is more direct and avoids wrapping the predicate in a negating lambda.

import itertools

numbers = [0, 1, 2, 3, 4, 5, 0, 11]

# Filter out even numbers (predicate returns True for even, so filterfalse keeps odds)
odds = itertools.filterfalse(lambda x: x % 2 == 0, numbers)
print(list(odds)) # Output: [1, 3, 5, 11]

# With predicate=None, it filters out all truthy values, leaving only falsy ones.
falsy_values = itertools.filterfalse(None, numbers)
print(list(falsy_values)) # Output: [0, 0]

Best Practice: Use filterfalse() to improve code clarity whenever your filtering logic is naturally expressed as a condition for excluding items. It makes the intention of the code more explicit than the equivalent filter() with a negated condition.
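Because filterfalse() and filter() are complements, together they can split one iterable into the items a predicate rejects and the items it accepts. The sketch below defines a hypothetical helper named partition() for this purpose; it uses itertools.tee() so that a one-shot iterator can feed both filters.

```python
import itertools

def partition(predicate, iterable):
    """Return (rejected, accepted) iterators for the given predicate.

    Hypothetical helper for illustration. tee() duplicates the input so
    that it also works when iterable is a one-shot iterator.
    """
    first, second = itertools.tee(iterable)
    return itertools.filterfalse(predicate, first), filter(predicate, second)

def is_even(n):
    return n % 2 == 0

odds, evens = partition(is_even, iter(range(10)))
odd_list = list(odds)
even_list = list(evens)
print(odd_list)   # [1, 3, 5, 7, 9]
print(even_list)  # [0, 2, 4, 6, 8]
```

Note that tee() buffers items one branch has consumed and the other has not, so fully draining one result before touching the other holds the whole input in memory.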

Utilizing zip_longest()

The zip_longest() function extends the behavior of the built-in zip(). While zip() stops at the length of the shortest iterable, zip_longest() continues until the longest iterable is exhausted. Its signature is itertools.zip_longest(*iterables, fillvalue=None). Missing values from shorter iterables are substituted with the fillvalue.

This is essential for processing multiple data sources of unknown or unequal length in lockstep, where you need to handle missing data explicitly rather than truncating the result.

import itertools

headers = ['Name', 'Score', 'Grade']
row_one = ['Alice', 95]
row_two = ['Bob', 87, 'A']
row_three = ['Charlie']

# Regular zip would truncate to the shortest list, losing data.
# print(list(zip(headers, row_one))) # [('Name', 'Alice'), ('Score', 95)]

# zip_longest uses None to fill missing values by default.
zipped = itertools.zip_longest(headers, row_one, row_two, row_three)
for item in zipped:
    print(item)
# Output:
# ('Name', 'Alice', 'Bob', 'Charlie')
# ('Score', 95, 87, None)
# ('Grade', None, 'A', None)

# A custom fillvalue, such as 'N/A' or an empty string, is common in data processing.
zipped_with_fill = itertools.zip_longest(headers, row_one, row_two, row_three, fillvalue='N/A')
print(list(zipped_with_fill))
# Output: [('Name', 'Alice', 'Bob', 'Charlie'), ('Score', 95, 87, 'N/A'), ('Grade', 'N/A', 'A', 'N/A')]

Key Consideration: The choice of fillvalue is critical. Use None to clearly indicate missing data, or use a domain-specific default (like 0, '', or 'N/A') if the data will be processed further and requires a value of a specific type. Always be aware that the fillvalue is a single object used for all missing positions; for mutable defaults like [], this can lead to unintended shared references if the results are modified.
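The shared-reference issue can be reproduced in a few lines. The following sketch uses made-up names and tags, and shows one possible workaround (a sentinel object replaced by a fresh list per gap); the names shared and MISSING are assumptions for illustration.

```python
import itertools

names = ['Alice', 'Bob', 'Charlie']
tags = [['admin']]  # only Alice has a tag list

shared = []  # one list object reused for every missing position
rows = list(itertools.zip_longest(names, tags, fillvalue=shared))

rows[1][1].append('guest')       # intended for Bob only
print(rows[2][1])                # ['guest'], Charlie sees it too
print(rows[1][1] is rows[2][1])  # True

# One workaround: fill with a sentinel, then create a fresh list per gap.
MISSING = object()
safe_rows = [
    (name, [] if tag_list is MISSING else tag_list)
    for name, tag_list in itertools.zip_longest(names, tags, fillvalue=MISSING)
]
safe_rows[1][1].append('guest')
print(safe_rows[2][1])  # []
```

Immutable fill values such as None, 0, '' or 'N/A' do not have this problem, since they cannot be modified in place.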