45.9 bisect: Maintaining a Sorted List

The bisect module in Python provides an elegant and efficient solution for maintaining a sorted list without the overhead of re-sorting after every insertion. Its core functionality revolves around the bisection algorithm (a variant of binary search), which efficiently locates the insertion point for a new element to keep the list in sorted order. This approach is far more performant than the naive method of appending and then re-sorting (list.append() followed by list.sort()), which has an average time complexity of O(n log n) for each insertion. In contrast, bisect finds the insertion point in O(log n) time, though the subsequent insertion with list.insert() remains an O(n) operation due to the need to shift subsequent elements in the list.

Core Functions: bisect() and insort()

The module’s two most important functions are bisect.bisect_left() (often aliased as bisect.bisect()) and bisect.insort_left() (often aliased as bisect.insort()).

bisect_left(a, x, lo=0, hi=len(a)) returns the index in the sorted list a where the element x should be inserted to maintain sorted order. If x is already present in the list, the returned index is the position of the first occurrence of x. The “left” in the name signifies that it biases the search to the left in the case of duplicates.
bisect_right(a, x, lo=0, hi=len(a)) is similar but returns the insertion point after any existing entries of x. The index returned is the position where x would be inserted to come after all existing x values. This is useful for dividing a list into ranges.

The insort functions are the practical application of the bisect functions; they perform the insertion for you.

insort_left(a, x, lo=0, hi=len(a)) inserts x into a at the position returned by bisect_left(a, x).
insort_right(a, x, lo=0, hi=len(a)), aliased as insort(), inserts x at the position returned by bisect_right(a, x).

import bisect

# Demonstrating bisect_left vs bisect_right
sorted_list = [1, 3, 5, 5, 5, 7, 9]

# Find the leftmost point where 5 belongs (index of first 5)
left_index = bisect.bisect_left(sorted_list, 5)
print(f"bisect_left index for 5: {left_index}")  # Output: 2

# Find the rightmost point where 5 belongs (index after last 5)
right_index = bisect.bisect_right(sorted_list, 5)
print(f"bisect_right index for 5: {right_index}") # Output: 5

# Using insort to maintain sorted order
data = [10, 20, 30, 40]
bisect.insort(data, 25)
bisect.insort(data, 15)
bisect.insort(data, 30)  # Inserting a duplicate
print(f"List after insort operations: {data}") # Output: [10, 15, 20, 25, 30, 30, 40]

Practical Applications and Use Cases

The bisect module excels in scenarios requiring efficient lookups and maintenance of ordered data. A classic use case is mapping numerical scores to letter grades. Instead of using a long series of if/elif statements, you can use bisect for a clean, table-driven approach.

def grade_calculator(score, breakpoints=[60, 70, 80, 90], grades='FDCBA'):
    """Maps a numerical score to a letter grade using bisection."""
    index = bisect.bisect(breakpoints, score)
    return grades[index]

print(grade_calculator(95))  # Output: A
print(grade_calculator(82))  # Output: B
print(grade_calculator(65))  # Output: D
print(grade_calculator(55))  # Output: F

Other common applications include finding the closest value in a list to a given number, implementing a simple priority queue (though the heapq module is better for this), and performing range queries on sorted data.

Common Pitfalls and Best Practices

The List Must Be Pre-Sorted: The most critical requirement for bisect is that the input list must already be sorted in ascending order. The functions will not check this precondition, and using bisect on an unsorted list will produce incorrect, unpredictable results without raising an error. It is the developer’s responsibility to ensure the list is sorted before the first bisect operation and remains sorted by only using insort or other careful methods for insertion.
Performance of Insertion: While finding the insertion point is fast (O(log n)), the actual insertion into a list using list.insert() is an O(n) operation because it may require shifting all elements after the insertion point. For very large lists where insertion frequency is high, this can become a bottleneck. In such cases, a data structure like a balanced binary search tree (e.g., from the sortedcontainers third-party module) or a skip list might be more appropriate, though they are more complex.
Working with Complex Objects: To use bisect with lists of custom objects or tuples, you must ensure they support comparison operations (<, ==, >) in a way that is consistent with your desired sort order. A powerful pattern is to use a separate list of keys. This is more efficient than using the key function with sort() because bisect allows you to search without creating a shadow list of keys for the entire dataset.

from dataclasses import dataclass

@dataclass
class Employee:
    name: str
    id: int

# List of objects sorted by 'id'
employees_sorted_by_id = [Employee('Alice', 2), Employee('Bob', 5), Employee('Charlie', 8)]

# We can bisect by a key by creating a list of just the keys we care about.
keys = [e.id for e in employees_sorted_by_id]

# Find the index for the employee with id <= 4
index = bisect.bisect_left(keys, 4)
if index < len(employees_sorted_by_id):
    found_employee = employees_sorted_by_id[index]
    print(found_employee)  # Output: Employee(name='Bob', id=5)

Memory Considerations: The bisect module operates on in-memory lists. For datasets too large to fit comfortably in memory, a database with indexed columns is the proper tool, as it performs a similar binary search on disk.