The heapq module provides an implementation of the heap queue algorithm, also known as the priority queue algorithm. Heaps are binary trees for which every parent node has a value less than or equal to any of its children. This implementation uses arrays for which heap[k] <= heap[2*k+1] and heap[k] <= heap[2*k+2] for all k, counting elements from zero. The interesting property of this structure is that the smallest element is always at the root, heap[0]. This makes it exceptionally efficient for implementing priority queues where the smallest (or highest priority) item needs to be accessed repeatedly.

Unlike other tree structures that require custom node classes with pointers, heapq represents the heap as a simple list. This memory-efficient representation leverages mathematical relationships between a node’s index and the indices of its children and parent. For a node at index i, its left child is at index 2*i + 1, its right child at 2*i + 2, and its parent is at index (i-1) // 2. This allows all heap operations to be performed by list indexing and swaps, avoiding the overhead of object pointers.

Core Heap Operations: heapify, heappush, heappop

The module’s fundamental functions manipulate a list to maintain the heap invariant. heapq.heapify(x) transforms the list x into a heap, in-place, in linear time. It does not create a new list but rearranges the elements of the existing one. This is followed by heapq.heappush(heap, item) to push a new value onto the heap, and heapq.heappop(heap) to pop and return the smallest item from the heap. Each push and pop operation maintains the heap invariant and has O(log n) time complexity.

import heapq

# Start with an unsorted list
data = [3, 1, 4, 1, 5, 9, 2, 6]
print("Original list:", data)

# Transform the list into a valid heap
heapq.heapify(data)
print("Heapified list:", data)  # Output will be a heap, e.g., [1, 1, 2, 3, 5, 9, 4, 6]

# Push a new item onto the heap
heapq.heappush(data, -1)
print("After push(-1):", data)  # The new smallest item is at the root

# Repeatedly pop the smallest item
sorted_list = []
while data:
    smallest = heapq.heappop(data)
    sorted_list.append(smallest)
print("Sorted by popping:", sorted_list)

Accessing Without Removing: heap[0]

A common need is to examine the next item in the queue without removing it. Because the heap invariant guarantees the smallest item is at index 0, this can be done in constant time by simply accessing heap[0]. It is crucial to understand that this is the only safe way to peek at the heap; other elements are not fully sorted and should not be relied upon for ordered access.

import heapq

tasks = []
heapq.heappush(tasks, (30, "Write report"))
heapq.heappush(tasks, (10, "Respond to email"))
heapq.heappush(tasks, (20, "Code review"))

next_priority, next_task = tasks[0]  # Peek at the highest priority task
print(f"Next task: '{next_task}' with priority {next_priority}")
# Output: Next task: 'Respond to email' with priority 10

Advanced Operations: heapreplace and nlargest/nsmallest

heapq.heapreplace(heap, item) is a more efficient alternative to a heappop() followed by a heappush(). It pops and returns the smallest item and then pushes the new item. The value returned can be larger than the item added, so this operation is useful in simulation contexts, like processing events where you handle the current event and then add the next one. In contrast, heappushpop(heap, item) pushes the new item first then pops the smallest, which is more efficient if the new item is smaller than the current root.

Furthermore, heapq provides nlargest(n, iterable) and nsmallest(n, iterable) which are useful for finding a limited number of extreme values without fully sorting the entire dataset. For large inputs, these functions are more memory-efficient than sorting the entire collection and then slicing.

import heapq

# Example of heapreplace
scores = [10, 20, 30]
heapq.heapify(scores)
old_min = heapq.heapreplace(scores, 40)
print(f"Popped old minimum: {old_min}, new heap: {scores}")
# Output: Popped old minimum: 10, new heap: [20, 40, 30]

# Example of nlargest
all_scores = [84, 92, 11, 99, 78, 100, 87]
top_3 = heapq.nlargest(3, all_scores)
print(f"Top 3 scores: {top_3}")  # Output: Top 3 scores: [100, 99, 92]

Common Pitfalls and Best Practices

A frequent mistake is assuming that a heapified list is fully sorted. It is not. The heap invariant only guarantees the relationship between parent and child nodes, not a linear ordering. The only way to get a sorted list is to repeatedly heappop.

Another critical pitfall involves storing mutable objects. Since the heap structure is maintained by comparing items, if the priority of an object inside the heap is modified after insertion, the heap invariant will be broken, and the behavior of subsequent operations becomes undefined. This is a common issue when using custom objects where an external reference is used to change a field used for comparison.

The standard solution for complex priorities is the “priority, task” tuple pattern. Tuples are compared lexicographically (element-by-element), so (priority, task) pairs will be ordered by priority. If priorities are equal, the task object (which must itself be comparable) will be used to break the tie. To avoid comparing the task, a triple like (priority, counter, task) can be used, where an ever-increasing counter ensures that later-inserted tasks with the same priority are ordered consistently.

import heapq
import itertools

# Safe pattern for mutable tasks or equal priorities
priority_queue = []
counter = itertools.count()  # Unique sequence counter

def add_task(priority, task):
    count = next(counter)
    heapq.heappush(priority_queue, (priority, count, task))

add_task(2, "Task A")
add_task(1, "Task B")
add_task(2, "Task C")  # Same priority as Task A

# Task B (pri 1) will pop first. Then, between A and C (both pri 2),
# the one with the lower count (Task A) will pop next.
while priority_queue:
    priority, count, task = heapq.heappop(priority_queue)
    print(f"Priority: {priority}, Task: {task}")