36.7 functools.cached_property: Lazy Computed Attributes

The functools.cached_property decorator, introduced in Python 3.8, provides a powerful and elegant mechanism for creating lazy, computed attributes on classes. It is designed for instance attributes that are expensive to compute and should be cached for the lifetime of the instance. Unlike @property, which recalculates its value on every access, a @cached_property computes its value only once, on first access, and then stores the result in the instance’s __dict__. Subsequent accesses return the cached value as an ordinary attribute lookup, with minimal overhead.
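A quick way to see this difference is to count how often each getter runs. The following sketch uses a hypothetical Report class with one @property and one @cached_property that compute the same sum; the counter attributes exist only for illustration.

```python
from functools import cached_property


class Report:
    """Hypothetical class comparing @property with @cached_property."""

    def __init__(self, values):
        self.values = values
        self.property_calls = 0
        self.cached_calls = 0

    @property
    def total_property(self):
        self.property_calls += 1  # Runs on every access
        return sum(self.values)

    @cached_property
    def total_cached(self):
        self.cached_calls += 1  # Runs only on the first access
        return sum(self.values)


r = Report([1, 2, 3])
for _ in range(3):
    r.total_property
    r.total_cached
print(r.property_calls, r.cached_calls)  # 3 1
```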

The core mechanism of cached_property relies on a subtle but clever trick: it is a non-data descriptor. In Python’s descriptor protocol, a non-data descriptor defines __get__ but neither __set__ nor __delete__, and attribute lookup on an instance checks the instance’s __dict__ before falling back to a non-data descriptor found on the class. On first access, the instance’s __dict__ has no entry under the property’s name, so Python calls the descriptor’s __get__ method. That method runs the decorated function, writes the result directly into the instance’s __dict__ under the same name as the property (it does not go through setattr), and returns the value. Crucially, this entry “shadows” the descriptor: all subsequent accesses find the value in the instance’s __dict__ and never reach the descriptor at all. This is why the cache is so efficient after the first access. It is also why cached_property requires instances to have a __dict__; it does not work on classes that define __slots__ without one.
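The mechanism can be illustrated with a simplified, hypothetical descriptor named simple_cached_property, applied to a hypothetical Square class. This is a sketch of the idea, assuming instances have a __dict__; it is not CPython’s actual implementation, which performs extra checks (for example, raising TypeError when an instance has no __dict__ to cache into).

```python
class simple_cached_property:
    """Hypothetical, simplified stand-in for functools.cached_property."""

    def __init__(self, func):
        self.func = func
        self.attrname = None

    def __set_name__(self, owner, name):
        # Called once when the owning class is created; remembers the attribute name.
        self.attrname = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # Accessed on the class itself: hand back the descriptor
        value = self.func(instance)
        # Write straight into the instance dict. Because this class defines no
        # __set__ or __delete__, the dict entry now wins on every later lookup.
        instance.__dict__[self.attrname] = value
        return value


class Square:
    def __init__(self, side):
        self.side = side
        self.calls = 0

    @simple_cached_property
    def area(self):
        self.calls += 1
        return self.side * self.side


sq = Square(3)
print(sq.area, sq.area, sq.calls)  # 9 9 1
```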

Basic Usage and Mechanics

The most common use case is for attributes derived from other instance data. Consider a Dataset class that holds raw data; computing statistics on that data may be expensive, which makes it a prime candidate for caching.

from functools import cached_property

class Dataset:
    def __init__(self, data):
        self._raw_data = data  # Assume this is a large, expensive-to-process list

    @cached_property
    def stats(self):
        print("Computing statistics (this only happens once)...")
        # Simulate an expensive computation
        computed_min = min(self._raw_data)
        computed_max = max(self._raw_data)
        computed_mean = sum(self._raw_data) / len(self._raw_data)
        return {
            'min': computed_min,
            'max': computed_max,
            'mean': computed_mean
        }

# Usage
data_instance = Dataset([1, 2, 3, 4, 5])
print(data_instance.stats)  # Prints message and returns computed dict
print(data_instance.stats)  # No message; returns cached dict instantly
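To confirm where the cached value ends up, you can inspect the instance’s __dict__ (via vars()) before and after the first access. This trimmed-down version of Dataset keeps only the min and max computation. Accessing the attribute on the class, rather than on an instance, returns the cached_property descriptor itself.

```python
from functools import cached_property


class Dataset:
    def __init__(self, data):
        self._raw_data = data

    @cached_property
    def stats(self):
        return {'min': min(self._raw_data), 'max': max(self._raw_data)}


d = Dataset([3, 1, 2])
before = 'stats' in vars(d)  # Nothing cached yet
d.stats                      # First access runs the method
after = 'stats' in vars(d)   # The result now lives in the instance dict
print(before, after)                 # False True
print(type(Dataset.stats).__name__)  # cached_property
```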

Interaction with Mutability and Instance Deletion

Because the cached value is stored directly on the instance, it behaves like any other instance attribute. This has important implications. If you mutate the object that the cached_property returns, the changes will persist, as you are now working with the stored object, not a new computation.

# Continuing from the previous example
my_stats = data_instance.stats
my_stats['new_key'] = 'modified'  # This mutates the cached dictionary!
print(data_instance.stats)  # Output will include 'new_key': 'modified'
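If callers should not be able to alter the cached result, one option (a design choice, not something cached_property does for you) is to cache an immutable value. This sketch, using a hypothetical FrozenStatsDataset class, wraps the dictionary in types.MappingProxyType, a read-only view, so an attempted item assignment raises TypeError instead of silently changing the cache.

```python
from functools import cached_property
from types import MappingProxyType


class FrozenStatsDataset:
    """Hypothetical variant of Dataset whose cached stats are read-only."""

    def __init__(self, data):
        self._raw_data = data

    @cached_property
    def stats(self):
        computed = {'min': min(self._raw_data), 'max': max(self._raw_data)}
        # Wrap in a read-only view; no other reference to `computed` escapes,
        # so the cached mapping cannot be changed through this attribute.
        return MappingProxyType(computed)


d = FrozenStatsDataset([1, 2, 3])
try:
    d.stats['new_key'] = 'modified'
except TypeError as exc:
    print("Rejected:", exc)
print(dict(d.stats))  # {'min': 1, 'max': 3}
```

Callers who need a mutable version can still make a copy with dict(d.stats), which leaves the cached view untouched.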

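Furthermore, if you delete the cached entry from the instance’s __dict__, the cache is effectively cleared: the descriptor is no longer shadowed, so the next access calls its __get__ method again and recomputes the value. The functools documentation describes the simpler form, del data_instance.stats, which removes the instance attribute (and raises AttributeError if the value has not been computed yet); deleting the key from __dict__ directly, as below, has the same effect.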

del data_instance.__dict__['stats']  # Manually clear the cache
print(data_instance.stats)  # Prints "Computing statistics..." again
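A common use of this behavior is invalidating the cache when the underlying data changes. In the sketch below, the MutableDataset class and its append method are hypothetical and purely illustrative. append clears the cached entry with dict.pop and a default, which, unlike del self.stats, does not fail when the value was never computed.

```python
from functools import cached_property


class MutableDataset:
    """Hypothetical Dataset variant that invalidates its cache on change."""

    def __init__(self, data):
        self._raw_data = list(data)
        self.computations = 0

    @cached_property
    def stats(self):
        self.computations += 1
        return {'min': min(self._raw_data), 'max': max(self._raw_data)}

    def append(self, value):
        self._raw_data.append(value)
        # Drop any cached stats. pop() with a default is safe even if stats was
        # never accessed, whereas `del self.stats` would raise AttributeError.
        self.__dict__.pop('stats', None)


d = MutableDataset([1, 2, 3])
print(d.stats['max'])   # 3
d.append(10)
print(d.stats['max'])   # 10 (recomputed after invalidation)
print(d.computations)   # 2
```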

Thread-Safety Considerations

In Python 3.12 and later, the standard cached_property is not thread-safe. In a multi-threaded environment, several threads can call the __get__ method on the same instance before any of them has stored the computed value. Only one result ultimately remains cached (the last write to the instance’s __dict__ wins), but the expensive computation may run redundantly, wasting resources. Python 3.8 through 3.11 included an undocumented lock that prevented this, but that lock belonged to the property rather than to each instance, so all instances of a class contended for it; it was removed in 3.12.

To mitigate this, you must implement your own locking mechanism around the computation within the method. The cached_property decorator itself does not handle this.

import threading
from functools import cached_property

class ThreadSafeDataset:
    def __init__(self, data):
        self._raw_data = data
        self._stats_lock = threading.Lock()  # One lock per instance

    @cached_property
    def stats(self):
        with self._stats_lock:  # Ensure only one thread computes at a time
            # Check whether another thread already computed it while we waited.
            # Look in __dict__ directly: hasattr(self, 'stats') would re-enter
            # this getter and deadlock on the non-reentrant lock.
            if 'stats' in self.__dict__:
                return self.__dict__['stats']
            print("Computing statistics in a thread-safe way...")
            computed_min = min(self._raw_data)
            computed_max = max(self._raw_data)
            result = {'min': computed_min, 'max': computed_max}
            # Store while still holding the lock so a waiting thread sees it
            self.__dict__['stats'] = result
            return result

# Note: This pattern is a bit complex. For serious applications,
# consider a dedicated caching library.
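One way to check the lock-protected pattern is to access a single instance from several threads at once and count how many times the computation actually runs. The sketch below repeats that getter in a hypothetical CountingDataset class with a counter added. Because both the check and the store happen under the per-instance lock, the counter should end at 1 regardless of thread scheduling.

```python
import threading
from functools import cached_property


class CountingDataset:
    """Hypothetical copy of the lock-protected pattern, with a computation counter."""

    def __init__(self, data):
        self._raw_data = data
        self._stats_lock = threading.Lock()
        self.computations = 0

    @cached_property
    def stats(self):
        with self._stats_lock:
            if 'stats' in self.__dict__:  # Another thread got here first
                return self.__dict__['stats']
            self.computations += 1
            result = {'min': min(self._raw_data), 'max': max(self._raw_data)}
            self.__dict__['stats'] = result  # Publish before releasing the lock
            return result


ds = CountingDataset(list(range(100_000)))
threads = [threading.Thread(target=lambda: ds.stats) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(ds.computations)  # 1
```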

Key Differences from property and lru_cache

It’s crucial to understand when to use cached_property instead of its cousins. A standard @property recalculates on every access, which keeps the result in sync with mutable data but is wasteful for expensive computations whose result does not change. @functools.lru_cache is a function decorator that caches results keyed on the call’s arguments; applied to a method, it creates a single cache shared by all instances of the class, with self included in every key. @cached_property caches a single, argument-free value per instance.

from functools import lru_cache

class UsingLruCache:
    def __init__(self, data):
        self.data = data

    @lru_cache(maxsize=None)
    def get_stats(self):  # This is a method, not a property
        print("lru_cache computation...")
        return {'len': len(self.data)}

obj1 = UsingLruCache([1,2,3])
obj2 = UsingLruCache([4,5,6])
obj1.get_stats()  # Prints message
obj1.get_stats()  # No message (cached for obj1's `self`)
obj2.get_stats()  # Prints message again! lru_cache distinguishes instances by `self`

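The lru_cache approach works, but it is less intuitive than cached_property, which is purpose-built for this scenario, and it has real costs. Every call must build and hash a key that includes self, whereas a cached_property is a plain attribute lookup after the first access, and the instance must be hashable. Most importantly, the shared cache holds a strong reference to every self it has seen, so those instances stay alive until their entries are evicted or the cache is cleared; with maxsize=None, nothing is ever evicted. A value cached by cached_property lives in the instance’s __dict__ and is released along with the instance.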
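The memory difference can be observed with weakref. The sketch below uses two hypothetical classes, one per caching approach, and relies on CPython’s reference counting to free an object as soon as its last reference disappears (gc.collect() is called only as a safeguard).

```python
import gc
import weakref
from functools import cached_property, lru_cache


class WithLruCache:
    def __init__(self, data):
        self.data = data

    @lru_cache(maxsize=None)
    def get_stats(self):
        return {'len': len(self.data)}


class WithCachedProperty:
    def __init__(self, data):
        self.data = data

    @cached_property
    def stats(self):
        return {'len': len(self.data)}


a = WithLruCache([1, 2, 3])
a.get_stats()
a_ref = weakref.ref(a)
del a
gc.collect()
alive_with_lru = a_ref() is not None
print(alive_with_lru)  # True: the lru_cache key still references self

b = WithCachedProperty([1, 2, 3])
b.stats
b_ref = weakref.ref(b)
del b
gc.collect()
alive_with_cached_property = b_ref() is not None
print(alive_with_cached_property)  # False: the cached value died with the instance

WithLruCache.get_stats.cache_clear()  # Drops the cached keys, and with them `a`
gc.collect()
released_after_clear = a_ref() is None
print(released_after_clear)  # True
```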