In Python, the __hash__ method is a fundamental part of the language’s data model: it lets an object be used as a key in a dictionary or as a member of a set. These data structures are implemented as hash tables, which need a fast way to compute an integer, the hash value, for each object. Hash values are not required to be unique; distinct objects may share one. The hash value determines where in the table the search for the object starts, which gives lookups, insertions, and deletions O(1) average-case complexity.
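As a quick illustration of what hash() returns for built-in values, the sketch below relies only on documented behavior: numeric values that compare equal also hash equally, and a dict lookup uses the hash together with equality. The variable names are illustrative.

```python
# Numeric values that compare equal hash equally, even across types.
same_hash = hash(1) == hash(1.0) == hash(True)
print(same_hash)  # True

# Because 1 == 1.0 and their hashes match, both address the same dict entry.
lookup = {1: "one"}
print(lookup[1.0])  # one

# A hash value is a plain integer; nothing guarantees it is unique.
is_int = isinstance(hash("spam"), int)
print(is_int)  # True
```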

The Immutable Contract and __hash__ Implementation

A critical rule governs the implementation of __hash__: a class that defines __eq__ to customize equality comparison should also define __hash__ only if its instances are intended to be immutable. When a class overrides __eq__ without defining __hash__, Python sets its __hash__ to None and the instances become unhashable. Mutable objects should generally not be hashable, because the hash value of an object must not change while it is stored as a key in a dictionary or as a member of a set. If it does, the entry stays in the slot chosen by the old hash while later lookups search according to the new hash, so the key can usually no longer be found. For this reason, built-in mutable types like list, dict, and set are unhashable and cannot be used as keys.
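To make the danger concrete, here is a minimal sketch of a hashable but mutable key. MutableKey is a hypothetical class written only for this demonstration; it is an anti-pattern, not a recommendation. The outputs shown are what CPython produces for these small integer values.

```python
class MutableKey:
    """Hypothetical class that is hashable yet mutable (an anti-pattern)."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, MutableKey):
            return NotImplemented
        return self.value == other.value

    def __hash__(self):
        return hash(self.value)


key = MutableKey(1)
table = {key: "payload"}

key.value = 2  # the hash changes while the key sits inside the dict

# Lookup with the mutated key searches under the new hash and misses.
found_after_mutation = key in table
# Lookup with the old value reaches the entry, but equality now fails.
found_by_old_value = MutableKey(1) in table
# The entry is still stored; it has just become unreachable by lookup.
still_stored = len(table) == 1

print(found_after_mutation, found_by_old_value, still_stored)  # False False True

# Built-in mutable types avoid the problem by refusing to hash at all.
try:
    hash([1, 2, 3])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False
print(list_is_hashable)  # False
```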

To define a hash function for a custom immutable class, you typically combine the hash values of the object’s components that are part of its equality comparison. The recommended way is to build a tuple of these attributes and compute the hash of that tuple. Tuple hashing already mixes the element hashes in an order-sensitive way, so there is no need to combine them by hand.

class ImmutablePoint:
    def __init__(self, x, y):
        self._x = x
        self._y = y

    @property
    def x(self):
        return self._x

    @property
    def y(self):
        return self._y

    def __eq__(self, other):
        if not isinstance(other, ImmutablePoint):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # The hash is derived from the same attributes used in __eq__
        return hash((self.x, self.y))

    def __repr__(self):
        return f"ImmutablePoint({self.x}, {self.y})"

# Usage
p1 = ImmutablePoint(1, 2)
p2 = ImmutablePoint(1, 2)
p3 = ImmutablePoint(3, 4)

print(hash(p1)) # An integer; the exact value depends on the Python version and build
print(hash(p2)) # The same integer as hash(p1), because p1 == p2
print(hash(p3)) # A different integer (barring a collision)

# These can be used as keys because they are hashable and immutable.
points_dict = {p1: 'A', p3: 'B'}
print(points_dict[p2]) # Successfully outputs: 'A'
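As an alternative to writing both methods by hand, the standard library's dataclasses module can generate them. The sketch below assumes a hypothetical FrozenPoint class; with frozen=True and the default eq=True, the decorator generates __eq__ and a matching __hash__ from the declared fields. This is one possible approach, not the one used in the listing above.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class FrozenPoint:
    """Hypothetical dataclass counterpart of a hand-written immutable point."""
    x: int
    y: int


f1 = FrozenPoint(1, 2)
f2 = FrozenPoint(1, 2)

print(f1 == f2)              # True
print(hash(f1) == hash(f2))  # True
print({f1: "A"}[f2])         # A

# Assigning to a field of a frozen dataclass instance raises an error.
try:
    f1.x = 10
    assignment_blocked = False
except FrozenInstanceError:
    assignment_blocked = True
print(assignment_blocked)    # True
```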

The relationship between __hash__ and __eq__ is not just a recommendation; it is a requirement of the Python data model, although the interpreter does not check it for you. The core principle is: if two objects are considered equal by __eq__, they must have the same hash value. The reverse is not true: a hash collision, where two unequal objects share the same hash, is acceptable and is handled by the dictionary’s collision resolution logic. Violating the primary rule (equal objects having different hashes), however, breaks the contract of hash-based collections: a set can end up holding several objects that compare equal, and a dictionary lookup with an equal key can fail.

class BrokenPoint:
    def __init__(self, x, y, label):
        self.x = x
        self.y = y
        self.label = label

    def __eq__(self, other):
        if not isinstance(other, BrokenPoint):
            return NotImplemented
        # Equality compares only x and y; label is ignored.
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Incorrect: the hash includes label, which __eq__ ignores.
        return hash((self.x, self.y, self.label))

# This creates a critical flaw: equal objects can hash differently.
b1 = BrokenPoint(1, 10, "first")
b2 = BrokenPoint(1, 10, "second")

print(b1 == b2) # True, they are equal
print(hash(b1) == hash(b2)) # False (barring an unlikely collision)

# With differing hashes, a set keeps both objects even though they are equal...
my_set = {b1, b2}
print(len(my_set)) # Outputs 2, although b1 == b2

# ...and a dictionary cannot find an entry through an equal key.
my_dict = {}
my_dict[b1] = "Object b1"
print(b2 in my_dict) # False, although b1 == b2 and b1 is a key
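A plain hash collision, by contrast, is harmless for correctness. The sketch below uses a hypothetical CollidingPoint class whose instances all return the same hash; because a dict compares keys with __eq__ after the hashes match, lookups still give the right answers, only more slowly as the number of colliding keys grows.

```python
class CollidingPoint:
    """Hypothetical class whose instances all share one hash value."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, CollidingPoint):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Legal but slow: equal objects still hash equally, so the contract holds.
        return 42


c1 = CollidingPoint(1, 10)
c2 = CollidingPoint(1, 20)

labels = {c1: "first", c2: "second"}
print(len(labels))                     # 2
print(labels[CollidingPoint(1, 20)])   # second
print(CollidingPoint(9, 9) in labels)  # False
```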

Best Practices and Common Pitfalls

  1. Immutability is Key: Only define __hash__ for objects whose significant attributes cannot change after creation. If you must make a mutable object hashable (strongly discouraged), ensure its hash value is computed from attributes that will never be modified.

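  2. Use the Same Attributes: The hash calculation should be based on the same attributes that the __eq__ method compares. Hashing an attribute that __eq__ ignores breaks the contract, because equal objects can then hash differently. Leaving out an attribute that __eq__ compares keeps the contract intact but causes avoidable collisions.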

  3. Leverage the hash() Function: Use the built-in hash() function on a tuple of your attributes. This is efficient and properly combines the individual hash values of the elements. Avoid crafting your own algorithm to combine hashes.

  4. Return an Integer: The __hash__ method must return an integer. Using hash() on a tuple guarantees this.

  5. Don’t Define __hash__ for Mutable Classes: If your class defines __eq__ and is mutable, leave it unhashable. Python already sets __hash__ to None for a class that overrides __eq__ without defining __hash__; writing __hash__ = None explicitly documents the intent. Either way, instances cannot be used in sets or as dict keys, which protects you from the bugs that mutable keys would cause. Built-in mutable types such as list, dict, and set are unhashable for the same reason.

class MutableButUnhashable:
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, MutableButUnhashable):
            return NotImplemented
        return self.value == other.value

    # Python would set this implicitly because __eq__ is defined;
    # stating it explicitly documents the intent.
    __hash__ = None

m1 = MutableButUnhashable(42)
try:
    my_set = {m1}
except TypeError as e:
    print(e) # e.g. unhashable type: 'MutableButUnhashable' (exact wording varies by Python version)
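The implicit behavior can be checked directly. In this sketch, OnlyEq and Plain are hypothetical classes: the first overrides __eq__ and says nothing about __hash__, the second overrides neither method and therefore inherits the identity-based default from object.

```python
class OnlyEq:
    """Hypothetical class that overrides __eq__ and never mentions __hash__."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, OnlyEq):
            return NotImplemented
        return self.value == other.value


class Plain:
    """Hypothetical class that overrides neither __eq__ nor __hash__."""


# Overriding __eq__ alone makes Python set __hash__ to None on the class.
print(OnlyEq.__hash__ is None)  # True

try:
    hash(OnlyEq(1))
    only_eq_hashable = True
except TypeError:
    only_eq_hashable = False
print(only_eq_hashable)  # False

# Without a custom __eq__, the default hash is still available.
plain_hash_is_int = isinstance(hash(Plain()), int)
print(plain_hash_is_int)  # True
```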