31.1 The Descriptor Protocol: __get__, __set__, __delete__, __set_name__

Descriptors are a core mechanism in Python’s object system: they underlie property, bound methods, classmethod, and staticmethod, features we often take for granted. They provide a protocol for overriding default attribute access behavior (getting, setting, and deleting) on a per-attribute basis. Any object whose class defines at least one of the methods __get__, __set__, or __delete__ is a descriptor; the optional __set_name__ hook does not by itself make an object a descriptor.

The Three Core Methods

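The descriptor protocol consists of three primary methods. Python’s attribute machinery invokes them automatically when a descriptor stored as a class attribute is reached through attribute access on an instance (or, for __get__, on the class itself). A descriptor placed in an instance’s __dict__ is not invoked; it is returned like any other object.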

__get__(self, instance, owner=None): This method is called when the descriptor’s value is retrieved (e.g., obj.attr). The instance parameter is the object through which the attribute is being accessed. The owner parameter is the class through which the access is made: type(instance) for instance access, or the class itself for class access. If the attribute is accessed on the class itself (e.g., Class.attr), the instance argument is None, and a common convention is to return the descriptor itself in that case. This distinction is crucial for defining behavior that differs between instance and class-level access.

__set__(self, instance, value): This method is called when the descriptor’s value is assigned to (e.g., obj.attr = value). It does not return a value. A descriptor that defines __set__ (or __delete__) is called a data descriptor. Data descriptors take precedence over the instance’s __dict__ during the attribute lookup process.

__delete__(self, instance): This method is called when the descriptor’s value is deleted via the del statement (e.g., del obj.attr). Like __set__, it does not return a value.

class VerboseDescriptor:
    """A simple descriptor that prints every access."""
    def __init__(self, initial_value=None):
        self.value = initial_value

    def __get__(self, instance, owner):
        print(f"__get__ called. Instance: {instance}, Owner: {owner}")
        return self.value

    def __set__(self, instance, value):
        print(f"__set__ called. Instance: {instance}, Value: {value}")
        self.value = value

    def __delete__(self, instance):
        print(f"__delete__ called. Instance: {instance}")
        del self.value

class MyClass:
    attr = VerboseDescriptor("initial")

# Instance access invokes __get__
obj = MyClass()
x = obj.attr  # Output: __get__ called. Instance: <__main__.MyClass object...>, Owner: <class '__main__.MyClass'>

# Assignment invokes __set__
obj.attr = "new value"  # Output: __set__ called. Instance: <__main__.MyClass object...>, Value: new value

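# Class access also invokes __get__, with instance=None
y = MyClass.attr  # Output: __get__ called. Instance: None, Owner: <class '__main__.MyClass'>

# Deletion invokes __delete__
del obj.attr  # Output: __delete__ called. Instance: <__main__.MyClass object...>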
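
The correspondence between attribute syntax and the three methods can be made explicit. The following is a minimal sketch, assuming a hypothetical Recorder descriptor and Widget class (neither appears in the original text); it logs every protocol call and then checks that ordinary syntax and hand-written calls produce the same log. It approximates what the interpreter does and is not a description of its internals.

```python
class Recorder:
    """Hypothetical descriptor that records each protocol call it receives."""

    def __init__(self):
        self.calls = []

    def __get__(self, instance, owner=None):
        self.calls.append(("get", instance, owner))
        return "value"

    def __set__(self, instance, value):
        self.calls.append(("set", instance, value))

    def __delete__(self, instance):
        self.calls.append(("delete", instance))


class Widget:
    attr = Recorder()


w = Widget()
# Reading the class __dict__ fetches the descriptor without triggering __get__.
descriptor = Widget.__dict__["attr"]

# Ordinary attribute syntax ...
w.attr
w.attr = 1
del w.attr
Widget.attr

# ... and the explicit calls it roughly corresponds to.
descriptor.__get__(w, type(w))
descriptor.__set__(w, 1)
descriptor.__delete__(w)
descriptor.__get__(None, Widget)

implicit, explicit = descriptor.calls[:4], descriptor.calls[4:]
print(implicit == explicit)  # True
```

Fetching the descriptor through Widget.__dict__ matters here: writing Widget.attr would itself call __get__ and add an entry to the log.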

Data vs. Non-Data Descriptors

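A critical distinction in the descriptor protocol is between data and non-data descriptors. This distinction controls which wins during attribute lookup: the descriptor on the class or an entry of the same name in the instance’s __dict__. That precedence is implemented by object.__getattribute__. (The C3 algorithm plays a different role: it computes the method resolution order in which the classes themselves are searched.)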
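
The precedence rules can be sketched in pure Python. The lookup function below is a simplified, hypothetical emulation of instance attribute lookup, written for illustration only: it ignores __getattr__, metaclasses, and other details of the real implementation. The Data, NonData, and Demo classes are likewise assumptions made for the example.

```python
_MISSING = object()


def lookup(obj, name):
    """Simplified emulation of instance attribute lookup (illustrative only)."""
    cls = type(obj)

    # Search the classes in method resolution order for a class-level attribute.
    class_attr = _MISSING
    for klass in cls.__mro__:
        if name in vars(klass):
            class_attr = vars(klass)[name]
            break

    found = class_attr is not _MISSING
    attr_type = type(class_attr)
    has_get = found and hasattr(attr_type, "__get__")
    is_data = found and (
        hasattr(attr_type, "__set__") or hasattr(attr_type, "__delete__")
    )

    # 1. A data descriptor on the class takes precedence over the instance.
    if has_get and is_data:
        return attr_type.__get__(class_attr, obj, cls)

    # 2. Otherwise the instance __dict__ is consulted.
    instance_dict = getattr(obj, "__dict__", {})
    if name in instance_dict:
        return instance_dict[name]

    # 3. Then non-data descriptors on the class.
    if has_get:
        return attr_type.__get__(class_attr, obj, cls)

    # 4. Finally, plain class attributes.
    if found:
        return class_attr
    raise AttributeError(name)


class Data:
    def __get__(self, instance, owner=None):
        return "from data descriptor"

    def __set__(self, instance, value):
        raise AttributeError("read-only")


class NonData:
    def __get__(self, instance, owner=None):
        return "from non-data descriptor"


class Demo:
    d = Data()
    n = NonData()
    plain = "from class"


demo = Demo()
# Write straight into the instance __dict__, bypassing Data.__set__.
demo.__dict__.update(d="from instance", n="from instance", plain="from instance")

for attr_name in ("d", "n", "plain"):
    print(attr_name, "->", lookup(demo, attr_name))
# d -> from data descriptor
# n -> from instance
# plain -> from instance
```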

A data descriptor defines __set__ and/or __delete__. Because it can control writing and deletion, it is given the highest priority in the lookup order. If a data descriptor exists on a class, it will be invoked even if the instance has an attribute of the same name in its __dict__. This is why a property (@property) can effectively override an instance variable.

A non-data descriptor defines only __get__. Plain functions defined in a class body are the most common non-data descriptors; their __get__ is what produces bound methods. Non-data descriptors have a lower lookup priority: if an instance has an attribute of the same name in its __dict__, that instance attribute shadows the non-data descriptor. This is why you can override a method on a specific instance by simply assigning to it.

class DataDescriptor:
    def __get__(self, instance, owner):
        return "Data descriptor value"

    def __set__(self, instance, value):
        pass  # Even an empty __set__ makes this a data descriptor

class NonDataDescriptor:
    def __get__(self, instance, owner):
        return "Non-data descriptor value"

class TestClass:
    data_attr = DataDescriptor()
    nondata_attr = NonDataDescriptor()

obj = TestClass()

# Write directly into the instance __dict__, bypassing any __set__
obj.__dict__['data_attr'] = 'instance value'
obj.__dict__['nondata_attr'] = 'instance value'

print(obj.data_attr)    # Output: Data descriptor value (data descriptor wins)
print(obj.nondata_attr) # Output: instance value (instance __dict__ wins)
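
The same rules apply to built-in descriptors. As a brief sketch, assuming a hypothetical Greeter class, the following shows that a function is a non-data descriptor that an instance attribute can shadow, while a property is a data descriptor that it cannot.

```python
class Greeter:
    def greet(self):
        return "hello from the class"

    @property
    def label(self):
        return "label from the property"


g = Greeter()

# Functions define __get__ but not __set__, so they are non-data descriptors.
func = Greeter.__dict__["greet"]
print(hasattr(func, "__get__"), hasattr(func, "__set__"))  # True False

# An instance attribute of the same name therefore shadows the method.
g.greet = lambda: "hello from the instance"
print(g.greet())  # hello from the instance

# property defines __set__, so it is a data descriptor and is not shadowed.
g.__dict__["label"] = "label from the instance"
print(g.label)  # label from the property

# Removing the instance attribute makes the class-level method visible again.
del g.greet
print(g.greet())  # hello from the class
```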

The __set_name__ Method

Introduced in Python 3.6, __set_name__(self, owner, name) is called automatically when the class that contains the descriptor (the owner) is created, once for each descriptor assigned in the class body. The name argument is the attribute name the descriptor was bound to. A descriptor attached to a class after the class has been created does not receive this call automatically. This hook solves a major pitfall of earlier descriptors: the attribute name had to be passed to the descriptor explicitly, duplicating what the class body already stated.

Before __set_name__, a descriptor often had to be told its own name during __init__, which was error-prone and less elegant. Now, the descriptor can know its name and use it, for example, to store data in the instance’s __dict__ under a unique, private key, preventing naming collisions.

class ValidatedAttribute:
    def __init__(self):
        # No name parameter needed: __set_name__ fills these in later
        self.public_name = None
        self.private_name = None

    def __set_name__(self, owner, name):
        # This is called when the class 'owner' is created
        self.public_name = name
        self.private_name = '_' + name  # Create a private storage name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return getattr(instance, self.private_name, None)

    def __set__(self, instance, value):
        # Example validation
        if not isinstance(value, str):
            raise TypeError(f"{self.public_name} must be a string")
        setattr(instance, self.private_name, value)

class Person:
    name = ValidatedAttribute()  # __set_name__ runs with name="name" once the class is created

p = Person()
p.name = "Alice"  # Works
print(p.name)     # Output: Alice

# p.name = 123    # Would raise TypeError: name must be a string
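
Two consequences of this hook are worth seeing side by side: one descriptor class can serve several attributes without being told their names, and a descriptor attached after class creation has to be named by hand. The sketch below assumes a hypothetical Typed descriptor and Point class, introduced only for illustration.

```python
class Typed:
    """Hypothetical type-checking descriptor; all names are illustrative."""

    def __init__(self, expected_type):
        self.expected_type = expected_type

    def __set_name__(self, owner, name):
        self.public_name = name
        self.private_name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return getattr(instance, self.private_name)

    def __set__(self, instance, value):
        if not isinstance(value, self.expected_type):
            raise TypeError(
                f"{self.public_name} must be {self.expected_type.__name__}"
            )
        setattr(instance, self.private_name, value)


class Point:
    # Each descriptor learns its own name when Point is created.
    x = Typed(int)
    y = Typed(int)


p = Point()
p.x, p.y = 1, 2
print(vars(p))              # {'_x': 1, '_y': 2}
print(Point.x.public_name)  # x

# Attached after class creation: __set_name__ is not called automatically.
late = Typed(str)
Point.label = late
print(hasattr(late, "public_name"))  # False

late.__set_name__(Point, "label")  # call the hook by hand
p.label = "origin"
print(p.label)  # origin
```

Without the manual call, the first access through p.label would fail inside the descriptor, because private_name would never have been set.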

Common Pitfalls and Best Practices

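Infinite Recursion: The most common pitfall is accidentally causing infinite recursion inside a descriptor method. This happens when the method reads or writes the managed attribute on the instance under its public name using ordinary attribute access (instance.attr, getattr, or setattr), because that access is routed straight back to the descriptor.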

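# WRONG: Causes infinite recursion (assumes the descriptor is bound to 'attr')
class BadDescriptor:
    def __get__(self, instance, owner=None):
        return instance.attr  # This lookup of 'attr' triggers __get__ again!

# RIGHT: Learn the name via __set_name__ and use the instance __dict__ directly
class GoodDescriptor:
    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # Accessed on the class
        try:
            return instance.__dict__[self.name]  # Bypass the descriptor protocol
        except KeyError:
            raise AttributeError(self.name) from None

    def __set__(self, instance, value):
        # Defining __set__ makes this a data descriptor, so __get__ always runs
        instance.__dict__[self.name] = value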

Storage in the Descriptor: In the VerboseDescriptor example, the value is stored on the descriptor instance itself (self.value). Because one descriptor object serves the whole class, that value is shared by every instance of the owning class: assigning through one instance changes what all the others read. This is rarely the desired behavior. The __set_name__ pattern, which stores data in each instance’s __dict__, is the standard solution for instance-specific storage.

Descriptor Lifetime: The descriptor object itself is a class attribute. It is created when the class body executes and lives as long as the class does. The instances it manages are separate objects with their own lifetimes. This is why a plain attribute on the descriptor cannot hold per-instance data; any state kept on the descriptor must be keyed by instance, ideally through weak references so that it does not keep instances alive.

Use __slots__ Carefully: A class that defines __slots__ without listing '__dict__', and whose base classes do the same, has no per-instance __dict__, so a descriptor that stores its values there fails with AttributeError. Slots are themselves implemented as descriptors on the class, so a slot cannot share a name with the descriptor’s public attribute. Either declare the descriptor’s private storage name (for example, _name) as a slot, or have the descriptor use an alternative storage strategy.
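
One alternative storage strategy is to keep the values on the descriptor, keyed weakly by instance. The following is a minimal sketch under stated assumptions: SlotFriendly and Pixel are hypothetical names, instances must be hashable, and the class must list "__weakref__" in __slots__ so that its instances can be weakly referenced.

```python
from weakref import WeakKeyDictionary


class SlotFriendly:
    """Hypothetical descriptor that keeps per-instance values in a weak mapping."""

    def __init__(self):
        # Keyed by instance; an entry disappears once its instance is collected.
        self._values = WeakKeyDictionary()

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        try:
            return self._values[instance]
        except KeyError:
            raise AttributeError(self.name) from None

    def __set__(self, instance, value):
        self._values[instance] = value


class Pixel:
    # No "__dict__" slot, so instances have no per-instance dictionary.
    # "__weakref__" is listed so that instances can be weakly referenced.
    __slots__ = ("__weakref__",)
    color = SlotFriendly()


a, b = Pixel(), Pixel()
a.color = "red"
b.color = "blue"
print(a.color, b.color)        # red blue
print(hasattr(a, "__dict__"))  # False
```

Each instance reads back its own value even though the data lives on the single shared descriptor, which also avoids the shared-state problem described above. Declaring a private slot such as _color and storing there with setattr is a simpler option when the class author controls __slots__.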
