34.3 filter(): Selecting Elements by Predicate

The filter() function constructs an iterator from those elements of an iterable for which a provided function returns a true value. It is Python's built-in tool for selectively keeping items from a sequence based on a logical condition, filtering out the rest. Its appeal lies in its declarative nature: you specify what you want to keep (the condition), not how to iterate and check each item.

Its syntax is filter(function, iterable). The function argument is often called a predicate: a function that takes one element and returns a value interpreted as true or false. The iterable is any object capable of returning its elements one at a time, such as a list, tuple, or string.
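As a small illustration of the signature, the sketch below passes a list, a tuple, and a string as the iterable. The predicate is_long and all sample values are made up for this example; str.isupper is the standard string method, used here as a ready-made predicate.

```python
def is_long(word):
    """Hypothetical predicate: True for words longer than three characters."""
    return len(word) > 3

# The same predicate works on any iterable of strings
from_list = list(filter(is_long, ["map", "filter", "zip", "reduce"]))
from_tuple = list(filter(is_long, ("any", "sorted", "all")))

# A string is iterated one character at a time
from_string = list(filter(str.isupper, "Hello World"))

print(from_list)    # Output: ['filter', 'reduce']
print(from_tuple)   # Output: ['sorted']
print(from_string)  # Output: ['H', 'W']
```

Whatever the input type, the result is always a filter object, which is why each call is wrapped in list() before printing.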

How filter() Works Under the Hood

Conceptually, filter() implements a simple loop. You can think of it as performing the following operation, though its actual implementation in C is far more efficient:

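def eager_filter(function, iterable):
    # Simplified model only: collects matches into a list (the built-in is lazy)
    result = []
    for item in iterable:
        if function(item):   # keep the item when the predicate is truthy
            result.append(item)
    return result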

However, a crucial distinction is that the real filter() returns a lazy iterator (a filter object), not a list. Strictly speaking it is not a generator, although it behaves much like one. This is a memory-efficient design choice: no element is tested and no new list is allocated until you start consuming the iterator (e.g., by calling list() on it or using it in a for loop). This lazy evaluation is vital for working with large or even infinite sequences.
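The lazy behavior can be observed directly. The sketch below is a minimal demonstration, assuming a predicate that records each value it is asked about; lazy_filter, is_multiple_of_seven, and calls are names invented for this example. The generator function lazy_filter is only a pure-Python approximation of the built-in, not its actual implementation.

```python
from itertools import count, islice

def lazy_filter(function, iterable):
    """Generator approximation of filter(); a teaching sketch only."""
    for item in iterable:
        if function(item):
            yield item

calls = []

def is_multiple_of_seven(n):
    calls.append(n)              # record every value the predicate tests
    return n % 7 == 0

# count(1) is infinite, yet creating the filter object returns immediately
multiples = filter(is_multiple_of_seven, count(1))
print(calls)                     # Output: [] (nothing has been tested yet)

# islice pulls items only until three matches have been produced
first_three = list(islice(multiples, 3))
print(first_three)               # Output: [7, 14, 21]
print(len(calls))                # Output: 21

# The generator sketch yields the same values on this input
sketch_three = list(islice(lazy_filter(is_multiple_of_seven, count(1)), 3))
print(sketch_three)              # Output: [7, 14, 21]
```

Calling list() directly on a filter over an infinite iterable would never finish, which is why the example bounds consumption with islice.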

Using filter() with a Named Function

The most explicit way to use filter() is with a predefined named function. This is often best when the predicate logic is complex or reusable.

def is_positive(n):
    return n > 0

numbers = [-5, 2, -12, 8, 0, 14]
positive_numbers = filter(is_positive, numbers)

print(list(positive_numbers))  # Output: [2, 8, 14]

Here, is_positive is applied to each element in numbers. Only the elements for which it returns True are included in the resulting filter object, which we then convert to a list to see all results.
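A named predicate pays off most when the rule has several parts or is needed in more than one place. The following sketch assumes a hypothetical username rule (3 to 12 characters, alphanumeric, not starting with a digit); the rule and the sample data are illustrative only.

```python
def is_valid_username(name):
    """Hypothetical rule: 3 to 12 alphanumeric characters, not starting with a digit."""
    return 3 <= len(name) <= 12 and name.isalnum() and not name[0].isdigit()

signups = ["ada", "x", "grace_hopper", "9lives", "linus"]
imported = ("guido", "", "tim2", "way_too_long_username")

# The same predicate is reused on two different iterables
valid_signups = list(filter(is_valid_username, signups))
valid_imported = list(filter(is_valid_username, imported))

print(valid_signups)   # Output: ['ada', 'linus']
print(valid_imported)  # Output: ['guido', 'tim2']
```

Because the length check comes first and the conditions short-circuit, the empty string is rejected before name[0] is ever evaluated.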

Using filter() with a Lambda Function

For simple, one-off conditions, a lambda function is the most common and concise companion to filter(). It allows you to define the predicate inline without the ceremony of a full def statement.

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
even_numbers = filter(lambda x: x % 2 == 0, numbers)

print(f"Even numbers: {list(even_numbers)}")
# Output: Even numbers: [2, 4, 6, 8, 10]
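A lambda can also refer to variables from the surrounding scope. The sketch below, with made-up words and a hypothetical min_length threshold, shows one consequence worth knowing: the lambda looks the variable up when it is called, and because filter() is lazy, that happens at consumption time rather than at creation time.

```python
words = ["kiwi", "banana", "fig", "cherry", "plum"]

# Inline predicate: keep words of at least five characters
long_words = list(filter(lambda w: len(w) >= 5, words))
print(long_words)   # Output: ['banana', 'cherry']

# The lambda reads min_length when it runs, not when it is defined
min_length = 5
pending = filter(lambda w: len(w) >= min_length, words)
min_length = 4      # changed before the iterator is consumed
late_result = list(pending)
print(late_result)  # Output: ['kiwi', 'banana', 'cherry', 'plum']
```

If the threshold must be fixed at creation time, one option is to consume the iterator immediately with list().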

Using filter() with None as the Function

A special case, sometimes surprising, occurs when None is passed as the first argument. In this case, filter() uses the identity function as the predicate, so each element is tested by its own truth value, which gives the same result as passing bool. All “falsy” elements (e.g., 0, False, None, '', [], {}) are filtered out.

mixed_data = [0, 1, False, True, '', 'hello', [], [1, 2], None, 42]
truthy_values = filter(None, mixed_data)

print(list(truthy_values))
# Output: [1, True, 'hello', [1, 2], 42]

This idiom is a very concise way to remove all falsy values from a list but should be used with clear intent, as it might be less obvious to readers than a list comprehension like [x for x in mixed_data if x].
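One case where clear intent matters is data in which a falsy value is legitimate. The sketch below assumes hypothetical sensor readings where 0 is a valid measurement and None marks a missing one; it contrasts filter(None, ...) with an explicit "is not None" predicate.

```python
# Hypothetical readings: 0 is a real measurement, None means "missing"
readings = [3, 0, None, 7, None, 0, 12]

truthy_only = list(filter(None, readings))
not_none = list(filter(lambda r: r is not None, readings))

print(truthy_only)  # Output: [3, 7, 12] (the zeros were dropped too)
print(not_none)     # Output: [3, 0, 7, 0, 12]
```

When only missing values should be removed, the explicit predicate states that intent and keeps the zeros.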

Common Pitfalls and Best Practices

A frequent mistake is forgetting that filter() returns an iterator. Beginners often expect it to be a list and might check its length or print it directly, leading to confusion.

result = filter(lambda x: x > 5, [1, 10, 3, 8])
print(result)        # Output: <filter object at 0x...> (not a list!)

try:
    print(len(result))
except TypeError as exc:
    print(exc)       # Output: object of type 'filter' has no len()

# Correct approach: consume the iterator
result_list = list(result)
print(result_list)   # Output: [10, 8]
print(len(result_list))  # Output: 2
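A related pitfall is that a filter object can be consumed only once. The short sketch below, using illustrative variable names, shows a second pass over the same iterator coming back empty.

```python
evens = filter(lambda x: x % 2 == 0, range(10))

first_pass = list(evens)
second_pass = list(evens)   # the iterator is already exhausted

print(first_pass)   # Output: [0, 2, 4, 6, 8]
print(second_pass)  # Output: []

# Materialize once if the results are needed more than once
evens_list = list(filter(lambda x: x % 2 == 0, range(10)))
print(len(evens_list), sum(evens_list))  # Output: 5 20
```

Storing the results in a list trades the memory savings of laziness for the ability to iterate repeatedly, index, and call len().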

Another consideration is the choice between filter() and list comprehensions. For simple filters, a list comprehension is often more Pythonic and readable.

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Using filter
result_filter = list(filter(lambda x: x % 2 == 0, numbers))

# Using a list comprehension
result_comp = [x for x in numbers if x % 2 == 0]

# Both results are identical: [2, 4, 6, 8, 10]

The list comprehension is generally preferred for its clarity. However, filter() can be more readable when using a well-named predicate function, as it reads almost like English: filter(is_prime, numbers). The choice often comes down to the complexity of the condition and stylistic preference. For chaining multiple operations (e.g., map after filter), the functional style can sometimes offer a more linear and clear flow of data transformation.
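To make the comparison concrete, here is a minimal sketch of both styles with a named predicate and a map step chained after the filter. The is_prime function below is a simple trial-division test written for this example; it is adequate for small inputs and is not meant as an efficient primality check.

```python
def is_prime(n):
    """Trial-division primality test; fine for small illustrative inputs."""
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True

numbers = range(1, 21)

# Functional chain: keep the primes, then square each one
squared_primes = list(map(lambda p: p * p, filter(is_prime, numbers)))

# Equivalent list comprehension
squared_primes_comp = [p * p for p in numbers if is_prime(p)]

print(squared_primes)                         # Output: [4, 9, 25, 49, 121, 169, 289, 361]
print(squared_primes == squared_primes_comp)  # Output: True
```

In the functional version, nothing is computed until list() consumes the chain, so each number flows through filter and then map one at a time.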