14.5 Hash Requirements for Set Members
The fundamental requirement for an object to be a member of a set or frozenset in Python is that it must be hashable. Hashability is not an inherent property of an object but rather a contract it fulfills by implementing two specific dunder methods: __hash__() and __eq__().
The Hash Contract
An object is considered hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() method). The critical rule is that for two objects that compare as equal (a == b is True), their hash values must also be identical (hash(a) == hash(b)). The reverse is not required; two objects with the same hash value do not have to be equal (this is a hash collision, which the set handles internally).
This contract is the absolute bedrock of how sets work. A set is implemented using a hash table. When you add an element, Python first computes its hash value. This hash value is used to determine a “slot” or “bucket” in the underlying table where the value should be stored. To check for membership (x in my_set), Python computes hash(x), finds the corresponding bucket, and then checks if x is equal to any of the objects in that specific bucket. This is why the equality comparison must be consistent with the hash. If two equal objects had different hashes, the lookup for one might check the wrong bucket and never find its equal counterpart. Conversely, if two unequal objects had the same hash, they would end up in the same bucket, and the set would use __eq__ to distinguish between them.
Built-in Hashable and Unhashable Types
Most of Python’s immutable built-in types are hashable. This includes:
- Numbers (
int,float,complex) - Strings (
str) - Tuples (but only if all their elements are also hashable)
frozensetbytes
Conversely, mutable container types are typically unhashable because their value, and therefore their hash, can change. This includes:
listsetdict
Attempting to use an unhashable type as a set member results in a clear TypeError.
# Hashable types work fine
number_set = {1, 2, 3.14}
string_set = {'apple', 'banana'}
frozen_set = {frozenset([1, 2]), frozenset([3, 4])}
# Unhashable types cause an error
try:
bad_set = {[1, 2, 3]} # Using a list
except TypeError as e:
print(f"Error: {e}") # Error: unhashable type: 'list'
The Special Case of Tuples
Tuples are immutable, but their hashability is conditional. A tuple’s hash value is computed based on the hashes of its contents. Therefore, if a tuple contains any unhashable element, the tuple itself becomes unhashable.
# Tuple with hashable elements is itself hashable
hashable_tuple = (1, "hello", 3.14)
valid_set = {hashable_tuple}
print(valid_set) # Output: {(1, 'hello', 3.14)}
# Tuple with an unhashable element (a list) becomes unhashable
unhashable_tuple = (1, "hello", [3, 4])
try:
invalid_set = {unhashable_tuple}
except TypeError as e:
print(f"Error: {e}") # Error: unhashable type: 'list'
Custom Classes and Hashing
By default, instances of user-defined classes are hashable. Their hash is derived from their id() (their memory address), and their default equality comparison (==) is also based on id(). This satisfies the hash contract: if two objects are the same instance (a is b), they have the same id, so they have the same hash and are equal.
However, if you customize the __eq__ method to make objects equal based on their internal state (e.g., attribute values), you must also customize the __hash__ method to be based on that same state. Failing to do so breaks the hash contract.
class Person:
def __init__(self, name, ssn):
self.name = name
self.ssn = ssn # Social Security Number (assumed unique)
def __eq__(self, other):
# Define equality based on SSN
if isinstance(other, Person):
return self.ssn == other.ssn
return False
# This is WRONG. It uses the default id()-based hash.
# Two equal Person objects (same SSN) will likely have different hashes.
# Correct implementation:
class CorrectPerson:
def __init__(self, name, ssn):
self.name = name
self.ssn = ssn
def __eq__(self, other):
if isinstance(other, CorrectPerson):
return self.ssn == other.ssn
return False
def __hash__(self):
# The hash must be derived from the SSN, the same attribute used in __eq__
return hash(self.ssn)
# Usage
p1_wrong = Person('Alice', '123-45-6789')
p2_wrong = Person('A. Smith', '123-45-6789')
print(p1_wrong == p2_wrong) # True (they are equal based on SSN)
print(hash(p1_wrong) == hash(p2_wrong)) # False! This breaks the contract.
p1_correct = CorrectPerson('Alice', '123-45-6789')
p2_correct = CorrectPerson('A. Smith', '123-45-6789')
print(p1_correct == p2_correct) # True
print(hash(p1_correct) == hash(p2_correct)) # True. Contract is upheld.
# Now it can be safely used in a set
person_set = {p1_correct, p2_correct}
print(len(person_set)) # Output: 1 (only one unique SSN, so only one element)
Making a Custom Class Unhashable
If your class is mutable and you have defined a custom __eq__ method, you should explicitly mark it as unhashable to prevent its use in sets and avoid logical errors. This is done by setting __hash__ = None in the class definition.
class MutableEmployee:
def __init__(self, name, salary):
self.name = name
self.salary = salary # This can change!
def __eq__(self, other):
if isinstance(other, MutableEmployee):
return self.name == other.name
return False
__hash__ = None # Explicitly make it unhashable
e1 = MutableEmployee('Bob', 50000)
try:
employee_set = {e1}
except TypeError as e:
print(f"Error: {e}") # Error: unhashable type: 'MutableEmployee'
Best Practices and Common Pitfalls
- Immutability is Key: The safest way to ensure an object remains hashable is to make it immutable. If its internal state cannot change, its hash value is guaranteed to remain constant. This is why
frozensetexists whensetdoes not. - Never Use Mutable Values in Hash Calculation: If your custom
__hash__method relies on a mutable attribute, changing that attribute will change the object’s hash. If the object is already stored in a set or dictionary, its hash value used for storage is now invalid, making the object impossible to find and corrupting the data structure. This is a severe error. - Hash Should Be an Integer: The
__hash__method must return an integer. This integer is used to calculate the bucket index in the hash table. - Consistency is Paramount: Always ensure that the attributes used in
__eq__are the exact same ones used to calculate the hash in__hash__. Any inconsistency will violate the contract and lead to unpredictable and difficult-to-debug behavior in sets and dictionaries.