In Python, descriptors are the mechanism that underpins properties, bound methods, classmethod, and staticmethod. The distinction between data descriptors and non-data descriptors determines attribute lookup precedence, so understanding it is essential for designing classes that behave predictably.

The Defining Difference: __set__ or __delete__

The official Python documentation defines a descriptor as any object that implements at least one of the three special methods: __get__(), __set__(), or __delete__(). This single criterion creates the primary classification:

  • A data descriptor implements __set__() or __delete__() (and often, but not necessarily, __get__()).
  • A non-data descriptor implements only __get__().

This distinction is not merely academic; the Python interpreter’s attribute lookup logic changes its behavior drastically based on it.
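The classification can be checked directly. The sketch below uses a hypothetical helper, classify(), that inspects an object's type for the three methods; the helper name and the sample objects are illustrative assumptions, not a standard API. The standard library's inspect.isdatadescriptor() offers a comparable check for the data-descriptor case.

```python
def classify(obj):
    """Return 'data', 'non-data', or 'not a descriptor' for obj."""
    tp = type(obj)  # special methods are looked up on the type, not the instance
    if hasattr(tp, "__set__") or hasattr(tp, "__delete__"):
        return "data"
    if hasattr(tp, "__get__"):
        return "non-data"
    return "not a descriptor"


def plain_function(self):
    """Hypothetical function used only as a sample object."""
    return None


print(classify(property()))                    # data
print(classify(plain_function))                # non-data
print(classify(staticmethod(plain_function)))  # non-data
print(classify(42))                            # not a descriptor
```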

How Attribute Lookup Precedence Works

Attribute lookup on an instance is governed by the __getattribute__() method. When you write obj.attr, the default implementation, object.__getattribute__(obj, 'attr'), follows this simplified sequence:

  1. Look up attr in the __dict__ of type(obj) and of its base classes, in MRO order.
  2. If the class attribute found is a data descriptor, its __get__() method is invoked and its result is returned. This step overrides any instance dictionary entry.
  3. Otherwise, check the object’s instance __dict__ for the attribute and return the value if it is there.
  4. If not found in the instance and the class attribute from step 1 is a non-data descriptor, invoke its __get__() and return the result.
  5. If the class attribute from step 1 is a regular (non-descriptor) attribute, return it as-is.
  6. If nothing was found, AttributeError is raised, and Python falls back to the __getattr__() method (if defined).

The critical takeaway is that data descriptors have higher precedence than the instance dictionary, while non-data descriptors have lower precedence. This is why a property (a data descriptor) can override an instance variable, while a method (a non-data descriptor) can be shadowed by one.
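The sequence above can be sketched in pure Python. This is a simplified emulation, not CPython's actual implementation: the names find_in_mro, simulated_getattr, and Demo are hypothetical, and the sketch ignores edge cases such as custom __getattribute__ overrides and metaclass lookups.

```python
_MISSING = object()  # sentinel meaning "no class attribute found"


def find_in_mro(cls, name):
    """Return the first class-level attribute named `name` along the MRO."""
    for base in cls.__mro__:
        if name in vars(base):
            return vars(base)[name]
    return _MISSING


def simulated_getattr(obj, name):
    """Emulate the default lookup order for obj.<name> (simplified)."""
    cls = type(obj)
    cls_attr = find_in_mro(cls, name)
    attr_type = type(cls_attr)
    found = cls_attr is not _MISSING
    has_get = found and hasattr(attr_type, "__get__")
    is_data = found and (
        hasattr(attr_type, "__set__") or hasattr(attr_type, "__delete__")
    )

    if has_get and is_data:                    # data descriptor wins
        return attr_type.__get__(cls_attr, obj, cls)
    instance_dict = getattr(obj, "__dict__", {})
    if name in instance_dict:                  # then the instance dict
        return instance_dict[name]
    if has_get:                                # then non-data descriptors
        return attr_type.__get__(cls_attr, obj, cls)
    if found:                                  # then plain class attributes
        return cls_attr
    if hasattr(cls, "__getattr__"):            # last resort
        return cls.__getattr__(obj, name)
    raise AttributeError(name)


class Demo:
    plain = "class value"

    @property
    def managed(self):
        return "from property"

    def method(self):
        return "from method"

    def __getattr__(self, name):
        return f"fallback for {name}"


d = Demo()
# Write to the instance dict directly; the property has no setter.
d.__dict__["managed"] = "instance value"
d.__dict__["method"] = "instance value"

print(simulated_getattr(d, "managed"))  # from property
print(simulated_getattr(d, "method"))   # instance value
print(simulated_getattr(d, "plain"))    # class value
print(simulated_getattr(d, "missing"))  # fallback for missing
```

In each of the four cases the emulation returns the same value as the built-in getattr(d, name), which is a quick way to sanity-check the sketch.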

Code Example: Illustrating the Precedence

Let’s create two simple descriptors and observe this behavior in action.

# A Data Descriptor (implements __set__)
class DataDescriptor:
    def __get__(self, obj, obj_type=None):
        print("DataDescriptor __get__")
        return "value from data descriptor"

    def __set__(self, obj, value):
        print(f"DataDescriptor __set__ with {value}")
        # Typically, you would store the value somewhere, e.g., on the instance

# A Non-Data Descriptor (implements only __get__)
class NonDataDescriptor:
    def __get__(self, obj, obj_type=None):
        print("NonDataDescriptor __get__")
        return "value from non-data descriptor"

class MyClass:
    data_attr = DataDescriptor()   # Data descriptor in class dict
    non_data_attr = NonDataDescriptor() # Non-data descriptor in class dict

# Create an instance and test precedence
obj = MyClass()

print("--- Setting an instance attribute that shadows a data descriptor ---")
obj.data_attr = "instance value"  # This triggers DataDescriptor.__set__
print(f"obj.__dict__: {obj.__dict__}")
print(f"obj.data_attr: {obj.data_attr}") # Data descriptor wins, instance dict is ignored

print("\n--- Setting an instance attribute that shadows a non-data descriptor ---")
obj.non_data_attr = "instance value" # This goes to the instance dict
print(f"obj.__dict__: {obj.__dict__}")
print(f"obj.non_data_attr: {obj.non_data_attr}") # Instance dict wins over non-data descriptor

del obj.non_data_attr # Delete the instance attribute
print(f"\nAfter deletion: {obj.non_data_attr}") # Now the non-data descriptor is accessible again

Output:

--- Setting an instance attribute that shadows a data descriptor ---
DataDescriptor __set__ with instance value
obj.__dict__: {}
DataDescriptor __get__
obj.data_attr: value from data descriptor

--- Setting an instance attribute that shadows a non-data descriptor ---
obj.__dict__: {'non_data_attr': 'instance value'}
obj.non_data_attr: instance value
NonDataDescriptor __get__

After deletion: value from non-data descriptor

Common Pitfalls and Best Practices

A common pitfall occurs when creating a non-data descriptor that is meant to store state. Since it can be easily overridden by an assignment to the instance dictionary, its state becomes unreliable.

class CachedNonDataDescriptor:
    """A non-data descriptor that caches its result. Flawed."""
    def __init__(self, func):
        self.func = func
        self.cache = {}  # This cache is at the descriptor level

    def __get__(self, obj, obj_type=None):
        if obj is None:
            return self
        # Caveat: ids can be reused once an object is garbage-collected
        key = id(obj)
        if key not in self.cache:
            print("Calculating expensive value...")
            self.cache[key] = self.func(obj)  # Compute once, then reuse
        return self.cache[key]

class MyClass:
    @CachedNonDataDescriptor
    def expensive_method(self):
        return 42

obj = MyClass()
print(obj.expensive_method) # Calculates and caches
print(obj.expensive_method) # Returns from cache

obj.expensive_method = 100  # This assignment shadows the descriptor!
print(obj.expensive_method) # Simply returns 100 from the instance dict; cache is bypassed.

Output:

Calculating expensive value...
42
42
100

Best Practice: If your descriptor needs to manage or store state for an instance, it should almost always be a data descriptor. This protects its internal state from being overridden by accidental assignment to the instance dictionary. The property decorator is the most common example of this pattern. To fix the cache example, you would implement __set__ to prevent assignment or to manage the cache correctly.
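One possible shape for that fix is sketched below. It assumes that rejecting assignment is the desired behavior and that instances have a writable __dict__; the names ProtectedCached and Report are hypothetical. Because the descriptor defines __set__, it is a data descriptor, so the value it stores in the instance dictionary under the same name never shadows it.

```python
class ProtectedCached:
    """A caching data descriptor (illustrative sketch)."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __set_name__(self, owner, name):
        self.name = name  # attribute name the descriptor is bound to

    def __get__(self, obj, obj_type=None):
        if obj is None:
            return self
        if self.name not in obj.__dict__:
            # Store per instance; no id()-keyed cache on the descriptor
            obj.__dict__[self.name] = self.func(obj)
        return obj.__dict__[self.name]

    def __set__(self, obj, value):
        # Defining __set__ makes this a data descriptor and blocks shadowing
        raise AttributeError(f"{self.name!r} is read-only")

    def __delete__(self, obj):
        obj.__dict__.pop(self.name, None)  # invalidate the cached value


class Report:
    def __init__(self):
        self.calls = 0  # counts how often total is actually computed

    @ProtectedCached
    def total(self):
        self.calls += 1
        return 42


r = Report()
print(r.total, r.total, r.calls)  # 42 42 1

try:
    r.total = 100
except AttributeError as exc:
    print(exc)  # 'total' is read-only

del r.total               # explicit invalidation
print(r.total, r.calls)   # 42 2
```

The trade-off is that every access now goes through __get__, whereas a non-data descriptor that writes into the instance dictionary is bypassed entirely after the first access.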

Real-World Examples

  • property: The built-in property is a data descriptor. It implements __get__, __set__, and __delete__, allowing it to fully control access to an attribute and trump any instance variable with the same name.
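  • Methods: Plain functions defined in a class body, as well as classmethod and staticmethod objects, are non-data descriptors. A function’s __get__ binds it to the instance when accessed through one, classmethod’s __get__ binds to the class, and staticmethod’s __get__ returns the underlying function unbound. Because none of them defines __set__ or __delete__, you can overwrite a method on a single instance (obj.method = lambda: ...) without affecting other instances of the same class. Note that a function stored in the instance dictionary is returned as-is, so it does not receive self.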
  • functools.cached_property (Python 3.8+): This is deliberately a non-data descriptor. On first access, its __get__ computes the value and stores it on the instance itself (obj.__dict__[attrname] = value). From then on, the instance dict entry takes precedence over the descriptor, so subsequent accesses are ordinary attribute reads that never call __get__. If you delete the cached value from the instance (del obj.attr), the descriptor’s __get__ runs again on the next access to recompute it. The same mechanism means the instance needs a writable __dict__, and an explicit assignment (obj.attr = value) simply replaces the cached value.
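The method and cached_property behaviors described above can be observed with a short experiment. The Circle class, its computations counter, and the approximate value of pi are assumptions made purely for illustration.

```python
from functools import cached_property


class Circle:
    def __init__(self, radius):
        self.radius = radius
        self.computations = 0  # counts how often area is actually computed

    def describe(self):
        return f"circle of radius {self.radius}"

    @cached_property
    def area(self):
        self.computations += 1
        return 3.14159 * self.radius ** 2


a, b = Circle(1), Circle(2)

# Methods are non-data descriptors, so a single instance can shadow them.
a.describe = lambda: "patched"  # stored in a.__dict__; receives no self
print(a.describe())  # patched
print(b.describe())  # circle of radius 2

# cached_property writes its result into the instance __dict__.
print("area" in b.__dict__)  # False
b.area
b.area
print("area" in b.__dict__)  # True
print(b.computations)        # 1

del b.area                   # drop the cached value
b.area                       # __get__ runs again and recomputes
print(b.computations)        # 2
```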