11.4 Sorting: sort() vs sorted(), key Functions, and Stability
In Python, sorting a list is a fundamental operation performed using two primary tools: the list.sort() method and the sorted() built-in function. While both achieve a similar end result, their differences in application and behavior are critical to understand.
The Fundamental Distinction: sort() vs sorted()
The most crucial distinction lies in how they operate on the original list. The list.sort() method is an in-place operation; it modifies the original list directly and returns None. This means the list’s identity in memory remains the same, but its contents are rearranged. Conversely, the sorted() function is an out-of-place operation; it creates a completely new, sorted list from the elements of the iterable passed to it, leaving the original iterable unmodified.
# Demonstrating the difference between sort() and sorted()
original_list = [3, 1, 4, 1, 5, 9, 2]
# Using sorted() - creates a new list
new_sorted_list = sorted(original_list)
print("Original list:", original_list) # Output: [3, 1, 4, 1, 5, 9, 2]
print("New sorted list:", new_sorted_list) # Output: [1, 1, 2, 3, 4, 5, 9]
# Using sort() - modifies the original list in-place
return_value = original_list.sort()
print("Original list after .sort():", original_list) # Output: [1, 1, 2, 3, 4, 5, 9]
print("Return value of .sort():", return_value) # Output: None
This distinction dictates their usage. Use list.sort() when you want to mutate the existing list and do not need the original order. Use sorted() when you need to preserve the original data and have a sorted version, or when you need to sort an iterable that isn’t a list (like a tuple or dictionary view).
Customizing Sort Order with the key Function
Both sort() and sorted() accept a key parameter, which is arguably their most powerful feature. The key parameter expects a function that is called once for each item in the list. The return value of this function is used as the “proxy” value for sorting purposes, allowing you to sort complex objects based on a specific attribute or computed value.
# Sorting with a key function
students = [
{'name': 'Alice', 'grade': 92, 'age': 20},
{'name': 'Bob', 'grade': 85, 'age': 22},
{'name': 'Charlie', 'grade': 92, 'age': 19}
]
# Sort by grade (ascending)
students_sorted_by_grade = sorted(students, key=lambda s: s['grade'])
print("By grade:", students_sorted_by_grade)
# Sort by age (descending)
students.sort(key=lambda s: s['age'], reverse=True)
print("By age (descending):", students)
# More complex key: sort by grade descending, then by age ascending
# The key returns a tuple. Sorting is done based on the first element,
# and if equal, the second element is used.
def multi_sort_key(student):
return (-student['grade'], student['age'])
students.sort(key=multi_sort_key)
print("By grade desc, age asc:", students)
The key function transforms the sorting problem from “sort these complex objects” to “sort these simple proxy values,” which is a much simpler task for the underlying algorithm. Using the operator module’s itemgetter() and attrgetter() can often make key functions more efficient and readable than a lambda.
The Importance of Stability in Sorting
Python’s sorting algorithm, Timsort, is stable. This means that when multiple records have the same key value, their original relative order is preserved in the sorted output. Stability is not just a theoretical property; it is the cornerstone of performing complex, multi-pass sorts.
# Demonstrating sort stability
data = [
('apple', 2),
('banana', 1),
('orange', 2),
('grape', 1)
]
# First, sort by the fruit name (the second element)
data.sort(key=lambda x: x[0])
print("After sorting by fruit name:", data)
# Output: [('apple', 2), ('banana', 1), ('grape', 1), ('orange', 2)]
# Now, sort by the numeric value. The original order for items with the same number is preserved.
data.sort(key=lambda x: x[1])
print("After stable sort by number:", data)
# The two '1's are 'banana' and 'grape'. Their order from the previous sort is maintained.
# The two '2's are 'apple' and 'orange'. Their order is also maintained.
# Output: [('banana', 1), ('grape', 1), ('apple', 2), ('orange', 2)]
This property allows you to achieve a sort on multiple criteria by performing a series of sorts in reverse order of importance. To sort primarily by criterion ‘A’ and secondarily by criterion ‘B’, you would first sort by ‘B’ (the secondary key) and then sort by ‘A’ (the primary key). The stability of the second sort ensures that the ordering from the first (secondary) sort is maintained for all items that share the same primary key.
Common Pitfalls and Best Practices
A frequent pitfall is incorrectly assuming list.sort() returns the sorted list, leading to None being assigned to a variable. Always remember: .sort() returns None.
When using a key function that involves method calls or attribute access (e.g., key=str.lower), be aware that it is called O(n log n) times. For very large lists, if the key function is expensive to compute, consider precomputing the keys (the Decorate-Sort-Undecorate idiom, which sorted() effectively does internally) to avoid the overhead of repeated function calls.
For descending order, use the reverse=True parameter rather than sorting and then reversing the list with .reverse(), as the former is more efficient and semantically clear.
Finally, be cautious when sorting lists containing incompatible types. While Python 3 can sort a list like [1, 'a'] due to its defined ordering for different types, the results are often not meaningful in the context of the application’s logic and can raise a TypeError for more complex comparisons. It is a best practice to ensure lists are homogeneous or that the objects have well-defined comparison methods.