45.9 bisect: Maintaining a Sorted List
The bisect module in Python provides an elegant and efficient solution for maintaining a sorted list without the overhead of re-sorting after every insertion. Its core functionality revolves around the bisection algorithm (a variant of binary search), which efficiently locates the insertion point for a new element to keep the list in sorted order. This approach is far more performant than the naive method of appending and then re-sorting (list.append() followed by list.sort()), which has an average time complexity of O(n log n) for each insertion. In contrast, bisect finds the insertion point in O(log n) time, though the subsequent insertion with list.insert() remains an O(n) operation due to the need to shift subsequent elements in the list.
Core Functions: bisect() and insort()
The module’s two most important functions are bisect.bisect_left() (often aliased as bisect.bisect()) and bisect.insort_left() (often aliased as bisect.insort()).
bisect_left(a, x, lo=0, hi=len(a))returns the index in the sorted listawhere the elementxshould be inserted to maintain sorted order. Ifxis already present in the list, the returned index is the position of the first occurrence ofx. The “left” in the name signifies that it biases the search to the left in the case of duplicates.bisect_right(a, x, lo=0, hi=len(a))is similar but returns the insertion point after any existing entries ofx. The index returned is the position wherexwould be inserted to come after all existingxvalues. This is useful for dividing a list into ranges.
The insort functions are the practical application of the bisect functions; they perform the insertion for you.
insort_left(a, x, lo=0, hi=len(a))insertsxintoaat the position returned bybisect_left(a, x).insort_right(a, x, lo=0, hi=len(a)), aliased asinsort(), insertsxat the position returned bybisect_right(a, x).
import bisect
# Demonstrating bisect_left vs bisect_right
sorted_list = [1, 3, 5, 5, 5, 7, 9]
# Find the leftmost point where 5 belongs (index of first 5)
left_index = bisect.bisect_left(sorted_list, 5)
print(f"bisect_left index for 5: {left_index}") # Output: 2
# Find the rightmost point where 5 belongs (index after last 5)
right_index = bisect.bisect_right(sorted_list, 5)
print(f"bisect_right index for 5: {right_index}") # Output: 5
# Using insort to maintain sorted order
data = [10, 20, 30, 40]
bisect.insort(data, 25)
bisect.insort(data, 15)
bisect.insort(data, 30) # Inserting a duplicate
print(f"List after insort operations: {data}") # Output: [10, 15, 20, 25, 30, 30, 40]
Practical Applications and Use Cases
The bisect module excels in scenarios requiring efficient lookups and maintenance of ordered data. A classic use case is mapping numerical scores to letter grades. Instead of using a long series of if/elif statements, you can use bisect for a clean, table-driven approach.
def grade_calculator(score, breakpoints=[60, 70, 80, 90], grades='FDCBA'):
"""Maps a numerical score to a letter grade using bisection."""
index = bisect.bisect(breakpoints, score)
return grades[index]
print(grade_calculator(95)) # Output: A
print(grade_calculator(82)) # Output: B
print(grade_calculator(65)) # Output: D
print(grade_calculator(55)) # Output: F
Other common applications include finding the closest value in a list to a given number, implementing a simple priority queue (though the heapq module is better for this), and performing range queries on sorted data.
Common Pitfalls and Best Practices
The List Must Be Pre-Sorted: The most critical requirement for
bisectis that the input list must already be sorted in ascending order. The functions will not check this precondition, and usingbisecton an unsorted list will produce incorrect, unpredictable results without raising an error. It is the developer’s responsibility to ensure the list is sorted before the firstbisectoperation and remains sorted by only usinginsortor other careful methods for insertion.Performance of Insertion: While finding the insertion point is fast (O(log n)), the actual insertion into a list using
list.insert()is an O(n) operation because it may require shifting all elements after the insertion point. For very large lists where insertion frequency is high, this can become a bottleneck. In such cases, a data structure like a balanced binary search tree (e.g., from thesortedcontainersthird-party module) or a skip list might be more appropriate, though they are more complex.Working with Complex Objects: To use
bisectwith lists of custom objects or tuples, you must ensure they support comparison operations (<,==,>) in a way that is consistent with your desired sort order. A powerful pattern is to use a separate list of keys. This is more efficient than using thekeyfunction withsort()becausebisectallows you to search without creating a shadow list of keys for the entire dataset.
from dataclasses import dataclass
@dataclass
class Employee:
name: str
id: int
# List of objects sorted by 'id'
employees_sorted_by_id = [Employee('Alice', 2), Employee('Bob', 5), Employee('Charlie', 8)]
# We can bisect by a key by creating a list of just the keys we care about.
keys = [e.id for e in employees_sorted_by_id]
# Find the index for the employee with id <= 4
index = bisect.bisect_left(keys, 4)
if index < len(employees_sorted_by_id):
found_employee = employees_sorted_by_id[index]
print(found_employee) # Output: Employee(name='Bob', id=5)
- Memory Considerations: The
bisectmodule operates on in-memory lists. For datasets too large to fit comfortably in memory, a database with indexed columns is the proper tool, as it performs a similar binary search on disk.