49.4 as_completed() and wait()

The concurrent.futures module provides two functions for synchronizing with multiple Future objects: as_completed() and wait(). Both operate on collections of futures, but they serve different purposes: as_completed() hands you futures one at a time as they finish, while wait() blocks until the group as a whole reaches a chosen condition. Knowing which one fits a given situation helps you write robust and efficient concurrent code.

The as_completed() Function

The as_completed() function returns an iterator that yields futures as they complete, regardless of the order in which they were originally submitted. Each step of the iteration blocks only until the next future finishes (or raises TimeoutError if the optional timeout expires first), so you can process results as soon as they become available rather than waiting for the whole batch. This is the ideal pattern for handling tasks with highly variable execution times: you can begin post-processing a fast task’s result without being blocked by a slower one.

The following example demonstrates how as_completed() returns results out of submission order. Notice how the output reflects completion order, not the order of the numbers list.

import concurrent.futures
import time
import random

def simulate_task(n):
    # Simulate variable workload
    sleep_time = random.uniform(0.1, 1.5)
    time.sleep(sleep_time)
    return f"Processed {n} (slept for {sleep_time:.2f}s)"

numbers = [5, 2, 8, 1, 3]

with concurrent.futures.ThreadPoolExecutor() as executor:
    # Submit all tasks, mapping each future back to its input number
    future_to_number = {
        executor.submit(simulate_task, num): num for num in numbers
    }

    # Process results as they become available
    for future in concurrent.futures.as_completed(future_to_number):
        original_number = future_to_number[future]
        try:
            result = future.result()
            print(f"Completed: {result}")
        except Exception as exc:
            print(f'Task for {original_number} generated an exception: {exc}')

A useful practice exemplified here is keeping a mapping (like future_to_number) from each Future object to the input that created it. Since as_completed() yields futures in completion order, a future on its own does not tell you which input it belongs to. The mapping is a simple way to recover that context; another is to have the task return its input alongside the result. A common pitfall is submitting tasks in a loop without storing this context, leaving you with results that cannot be tied back to their inputs.
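As a minimal sketch of how the mapping can be used, the following collects results in completion order and then rebuilds them in input order. The names square_slowly and run_in_input_order are hypothetical, and the approach assumes the inputs are hashable and unique.

```python
import concurrent.futures
import time

def square_slowly(n):
    # Hypothetical task: larger inputs finish sooner, so completion
    # order differs from submission order.
    time.sleep(0.05 * (5 - n))
    return n * n

def run_in_input_order(inputs):
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(inputs)) as executor:
        future_to_input = {executor.submit(square_slowly, n): n for n in inputs}
        # Futures arrive in completion order; the mapping recovers each input.
        for future in concurrent.futures.as_completed(future_to_input):
            results[future_to_input[future]] = future.result()
    # Rebuild the output in the order of the original inputs.
    return [results[n] for n in inputs]

ordered = run_in_input_order([1, 2, 3, 4])
print(ordered)
```

If input order is all you need and you do not need early access to results, executor.map() already returns results in submission order.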

The wait() Function

In contrast to the iterative as_completed(), the wait() function is a blocking call that allows you to wait for a group of futures to reach a specific state. It provides fine-grained control over the waiting condition through its return_when parameter, which accepts one of three constants from the module:

FIRST_COMPLETED: Returns when any future completes or is cancelled.
FIRST_EXCEPTION: Returns when any future completes by raising an exception (or when all complete).
ALL_COMPLETED: Returns when all futures complete or are cancelled (this is the default).

The function returns a named tuple containing two sets: done and not_done. This is particularly powerful for implementing patterns like progress tracking, early termination upon failure, or chunked processing.

import concurrent.futures

def task_that_might_fail(n):
    if n == 13:
        raise ValueError("Unlucky number!")
    return n ** 2

futures = []
with concurrent.futures.ThreadPoolExecutor() as executor:
    # The range must include 13, otherwise no task ever fails
    for i in range(10, 16):
        futures.append(executor.submit(task_that_might_fail, i))

    # Return as soon as any task raises, or once all tasks have finished
    done_set, not_done_set = concurrent.futures.wait(
        futures, return_when=concurrent.futures.FIRST_EXCEPTION
    )

    # Check the done_set for any exceptions
    for future in done_set:
        if future.exception() is not None:
            print(f"Early termination due to: {future.exception()}")
            # Cancel the remaining futures; cancel() only succeeds for
            # futures that have not started running yet
            for f in not_done_set:
                f.cancel()
            break
    else:
        # This else corresponds to the for-loop (runs if no break occurred)
        print("All tasks completed successfully or were cancelled.")
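For comparison, here is a rough sketch of FIRST_COMPLETED that reads the returned named tuple through its done and not_done attributes instead of unpacking it. The worker fetch_from_mirror and its delays are hypothetical stand-ins for real I/O. Because running futures cannot be cancelled, the sketch uses a threading.Event to tell the remaining workers to stop.

```python
import concurrent.futures
import threading

def fetch_from_mirror(name, delay, stop_event):
    # Hypothetical worker: waits up to `delay` seconds unless told to stop.
    stop_event.wait(delay)
    return name

stop = threading.Event()
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    futures = [
        executor.submit(fetch_from_mirror, "fast", 0.05, stop),
        executor.submit(fetch_from_mirror, "medium", 5.0, stop),
        executor.submit(fetch_from_mirror, "slow", 10.0, stop),
    ]
    outcome = concurrent.futures.wait(
        futures, return_when=concurrent.futures.FIRST_COMPLETED
    )
    # outcome.done holds the finished futures, outcome.not_done the rest.
    winner = next(iter(outcome.done)).result()
    pending_count = len(outcome.not_done)
    # Already-running futures ignore cancel(), so signal them to stop instead.
    stop.set()

print(f"Winner: {winner}, still pending at that moment: {pending_count}")
```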

Key Differences and When to Use Each

The choice between as_completed() and wait() hinges on your processing requirements. Use as_completed() for result-driven processing where you want to handle each result the moment it’s ready, maximizing responsiveness and throughput for heterogeneous tasks. Because it yields one future at a time, you can consume and discard each result as you go instead of accumulating them all, although every future must still be submitted before iteration begins.

Use wait() for synchronization and state monitoring. It is the correct tool when you need to know when a specific milestone is reached among a group of tasks, such as waiting for the first task to fail (FIRST_EXCEPTION) or simply blocking until an entire batch is done (ALL_COMPLETED) before performing a collective operation on all results. wait() has no built-in option for “the first N futures,” but you can wait for a certain number of tasks to finish by calling it repeatedly with FIRST_COMPLETED on the futures that remain.
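One possible way to wait for a given number of tasks is to call wait() in a loop with FIRST_COMPLETED, feeding the not-done set back in each time. The helper wait_for_n below is a hypothetical name, not part of concurrent.futures, and the sleep-based work function only simulates tasks of different lengths.

```python
import concurrent.futures
import time

def wait_for_n(futures, n):
    """Block until at least n futures are done; return (done, pending)."""
    done, pending = set(), set(futures)
    while pending and len(done) < n:
        # Each call returns as soon as at least one more future finishes.
        newly_done, pending = concurrent.futures.wait(
            pending, return_when=concurrent.futures.FIRST_COMPLETED
        )
        done |= newly_done
    return done, pending

def work(delay):
    time.sleep(delay)
    return delay

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(work, d) for d in (0.05, 0.1, 0.6, 0.7)]
    done, pending = wait_for_n(futures, 2)
    finished_early = sorted(f.result() for f in done)

print(f"Finished first: {finished_early}")
```

Note that a single wait() call may report several futures at once, so the helper can return more than n finished futures.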

A critical edge case to consider with both functions is the handling of cancelled futures. Both as_completed() and wait() treat a cancelled future as “done,” whichever return_when value is used. Calling future.result() or future.exception() on a cancelled future raises CancelledError. It is therefore good practice to check future.cancelled() before asking for the result, or to wrap result() in a try-except block so that cancellation and task exceptions are handled gracefully.
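The behavior can be illustrated with a small sketch. It assumes a single-worker pool so that the second submission is still pending, and therefore cancellable, while the first one runs. The function blocked_task and the outcomes dictionary are illustrative only.

```python
import concurrent.futures
import threading

release = threading.Event()

def blocked_task():
    # Occupies the only worker until the event is set.
    release.wait()
    return "finished"

with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
    running = executor.submit(blocked_task)
    queued = executor.submit(blocked_task)  # waits behind `running`
    was_cancelled = queued.cancel()         # pending futures can be cancelled
    release.set()

    outcomes = {}
    # as_completed() yields the cancelled future as well as the finished one.
    for future in concurrent.futures.as_completed([running, queued]):
        label = "running" if future is running else "queued"
        if future.cancelled():
            outcomes[label] = "cancelled"
        else:
            outcomes[label] = future.result()

    # Asking a cancelled future for its result raises CancelledError.
    try:
        queued.result()
    except concurrent.futures.CancelledError:
        outcomes["result() on cancelled"] = "CancelledError"

print(outcomes)
```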