14.7 Real-World Uses: Deduplication and Relationship Queries
Sets are mutable, unordered collections of unique elements; frozensets are their immutable counterparts. Both are well suited to two common real-world tasks: deduplicating data and answering relationship queries efficiently. Their hash-based implementation gives membership tests (the in operator) average-case O(1) time complexity, which is the source of their performance advantage in both areas.

The Power of Hashing for Deduplication

The most immediate and common use of a set is to remove duplicates from a sequence. This works because a set, by definition, cannot contain duplicate elements. When you create a set from an iterable, each element is hashed. If an equal element (one with the same hash that also compares equal) is already in the set, the new addition is ignored. This is far more efficient than the manual alternative of iterating and checking each element against a list of items seen so far: each list membership test is O(n), so the whole operation is O(n²), whereas building a set is O(n) on average.
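The contrast between the two approaches can be sketched as follows. This is a minimal illustration, not a benchmark: the function names dedupe_with_list and dedupe_with_set and the sample email data are hypothetical, chosen only for this example. It assumes every element is hashable, which a set requires.

```python
def dedupe_with_list(items):
    """Quadratic approach: every `in` check scans the `seen` list."""
    seen = []
    for item in items:
        if item not in seen:      # O(n) scan per element
            seen.append(item)
    return seen


def dedupe_with_set(items):
    """Linear on average: every `in` check is a hash lookup.

    A separate list records first occurrences, because a set
    does not preserve insertion order.
    """
    seen = set()
    unique = []
    for item in items:
        if item not in seen:      # average-case O(1) lookup
            seen.add(item)
            unique.append(item)
    return unique


# Hypothetical sample data containing repeated entries.
emails = [
    "ann@example.com",
    "bob@example.com",
    "ann@example.com",
    "cy@example.com",
    "bob@example.com",
]

# Shortest form: order of the result is not guaranteed.
unique_unordered = set(emails)

# Order-preserving form built on set membership tests.
unique_ordered = dedupe_with_set(emails)
```

When the original order does not matter, set(emails) alone is enough. The helper with a separate list is only needed when the first-seen order of the elements must be kept.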