33.1 @dataclass: Automatic __init__, __repr__, and __eq__
The @dataclass decorator, introduced in Python 3.7 (PEP 557), automatically generates common special methods for classes that primarily store data, removing much of the boilerplate such classes otherwise require. A data class is still a regular Python class; by default the decorator synthesizes the __init__, __repr__, and __eq__ methods from the annotated class attributes you define.
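To make the savings concrete, the sketch below places a small data class next to a hand-written class that approximates the methods the decorator generates. The names PointDC and PointManual are illustrative only, and the manual version is a simplification, not the exact code the standard library emits.

```python
from dataclasses import dataclass


@dataclass
class PointDC:
    x: int
    y: int = 0


class PointManual:
    """Hand-written approximation of what @dataclass generates for PointDC."""

    def __init__(self, x: int, y: int = 0):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"{type(self).__qualname__}(x={self.x!r}, y={self.y!r})"

    def __eq__(self, other):
        # Compare field by field only when both objects share the same class.
        if other.__class__ is self.__class__:
            return (self.x, self.y) == (other.x, other.y)
        return NotImplemented


print(PointDC(1, 2))      # PointDC(x=1, y=2)
print(PointManual(1, 2))  # PointManual(x=1, y=2)
print(PointDC(1) == PointDC(1, 0))          # True
print(PointManual(1) == PointManual(1, 0))  # True
```

Every field added to PointManual must be repeated in three methods, whereas PointDC needs only one extra annotated line.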
Defining a Basic Data Class
To create a data class, you apply the @dataclass decorator and define your class attributes using type annotations. These annotations are crucial; the decorator uses them to identify which fields should be included in the automatically generated methods.
from dataclasses import dataclass

@dataclass
class InventoryItem:
    name: str
    unit_price: float
    quantity_on_hand: int = 0  # Default value

# __init__ is automatically created
item = InventoryItem("Widget", 2.99, 50)

# __repr__ is automatically created
print(item)  # Output: InventoryItem(name='Widget', unit_price=2.99, quantity_on_hand=50)

# __eq__ is automatically created
item2 = InventoryItem("Widget", 2.99, 50)
print(item == item2)  # Output: True
The parameters of the generated __init__ method follow the order of the field definitions in the class body. Fields with default values must come after fields without them, mirroring the rule for function parameters; if the rule is broken, the class definition itself fails with a TypeError.
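The ordering rule can be checked directly. The following sketch uses dataclasses.fields() to list the fields in definition order and then deliberately defines a class that breaks the rule. Shipment and BadShipment are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class Shipment:
    destination: str
    weight_kg: float
    priority: int = 0  # defaulted fields go last


# Field order matches definition order, and so does the __init__ signature.
print([f.name for f in fields(Shipment)])  # ['destination', 'weight_kg', 'priority']

error_message = None
try:
    @dataclass
    class BadShipment:
        priority: int = 0
        destination: str  # no default after a field that has one
except TypeError as exc:
    # The error is raised while the class is being defined, not at instantiation.
    error_message = str(exc)
    print("TypeError:", error_message)
```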
Field Customization with field()
For more control over individual fields, use the dataclasses.field() function. It is needed to supply mutable defaults, to exclude a field from the generated __repr__ or __eq__, or to keep a field out of the __init__ parameters.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Customer:
    name: str
    # Use default_factory for mutable defaults to avoid shared state
    orders: List[str] = field(default_factory=list)
    # Use repr=False to exclude a field from the generated __repr__
    internal_id: int = field(default=123, repr=False)
    # Use compare=False to exclude a field from the generated __eq__
    # (the field still appears in __repr__)
    metadata: dict = field(default_factory=dict, compare=False)

cust = Customer("Alice")
cust.orders.append("Order123")
print(cust)  # Output: Customer(name='Alice', orders=['Order123'], metadata={})
The default_factory parameter is essential for providing mutable default values (like lists or dictionaries). In an ordinary class, a mutable class-level default such as orders = [] is a well-known pitfall because a single list is shared by every instance that does not receive its own value. Data classes guard against this: writing orders: List[str] = [] raises a ValueError when the class is defined, with a message directing you to default_factory. The default_factory callable is invoked once per instance, so each new object gets its own separate list or dictionary.
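The following sketch contrasts the three situations: shared state in a plain class, a list default in a data class (which is rejected with a ValueError at class definition), and default_factory. The class names PlainCart, BrokenCart, and Cart are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List


class PlainCart:
    items = []  # ordinary class attribute: one list shared by all instances


a, b = PlainCart(), PlainCart()
a.items.append("apple")
print(b.items)  # ['apple']  (b sees a's change)

rejected = False
try:
    @dataclass
    class BrokenCart:
        items: List[str] = []  # a list default is refused by @dataclass
except ValueError as exc:
    rejected = True
    print("ValueError:", exc)


@dataclass
class Cart:
    items: List[str] = field(default_factory=list)  # list() is called per instance


c1, c2 = Cart(), Cart()
c1.items.append("apple")
print(c1.items, c2.items)  # ['apple'] []
```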
Immutability and Hashing with frozen=True
A common requirement for data objects is immutability. Setting frozen=True on the @dataclass decorator makes field assignment fail after creation. Combined with the default eq=True, it also causes a __hash__ method to be generated, allowing instances to be used as dictionary keys or set elements.
from dataclasses import dataclass

@dataclass(frozen=True)
class ImmutablePoint:
    x: int
    y: int

point = ImmutablePoint(10, 20)
print(point.x, point.y)  # Output: 10 20

# This will raise a FrozenInstanceError
# point.x = 15

# __hash__ is generated because frozen=True (and eq=True by default)
print(hash(point))  # Outputs an integer hash value

# Can be used as a dictionary key
points_dict = {point: "A"}
With frozen=True, assigning to or deleting any field raises dataclasses.FrozenInstanceError. The protection is shallow: it blocks rebinding a field, but a mutable object stored in a field (such as a list) can still be modified in place. For instances to be hashable in practice, every field that takes part in the hash must itself be hashable; otherwise calling hash() raises a TypeError. The generated __hash__ method hashes a tuple of the values of the fields used for comparison.
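A short sketch of these behaviors follows. GridCell and Route are hypothetical classes; the second one deliberately stores a list to show both the shallow nature of frozen=True and the hashing failure caused by an unhashable field.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import List


@dataclass(frozen=True)
class GridCell:
    row: int
    col: int


cell = GridCell(1, 2)
blocked = False
try:
    cell.row = 5
except FrozenInstanceError:
    blocked = True
    print("assignment blocked")

# Equal frozen instances hash equally, so set and dict lookups work by value.
visited = {GridCell(1, 2)}
print(GridCell(1, 2) in visited)  # True


@dataclass(frozen=True)
class Route:
    name: str
    stops: List[str]  # a list is mutable and unhashable


route = Route("r1", ["a", "b"])
route.stops.append("c")  # allowed: frozen only prevents rebinding route.stops
print(route.stops)       # ['a', 'b', 'c']

hash_failed = False
try:
    hash(route)
except TypeError:
    hash_failed = True
    print("hash() failed: a field is unhashable")
```

Storing a tuple instead of a list in stops would make Route instances both fully immutable and hashable.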
Post-Initialization Processing with __post_init__
The automatically generated __init__ method is excellent for basic initialization, but sometimes you need to perform validation or derive the value of a field based on other fields. This is achieved by defining a __post_init__ method, which is automatically called by the generated __init__ after all fields have been assigned.
from dataclasses import dataclass, field

@dataclass
class Rectangle:
    width: float
    height: float
    area: float = field(init=False)  # A field that is not a parameter in __init__

    def __post_init__(self):
        # Validate input
        if self.width <= 0 or self.height <= 0:
            raise ValueError("Width and height must be positive.")
        # Calculate a derived field
        self.area = self.width * self.height

rect = Rectangle(5.0, 4.0)
print(rect)  # Output: Rectangle(width=5.0, height=4.0, area=20.0)

# This will raise a ValueError
# invalid_rect = Rectangle(-1, 10)
Fields defined with init=False are not included as parameters in __init__ but are still included in __repr__ and __eq__. The __post_init__ method is the perfect place to initialize such fields.
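The consequences of init=False can be observed in a small sketch. The Circle class and its diameter field are assumptions made for illustration: the field cannot be passed to the constructor, yet it appears in the repr and takes part in equality.

```python
from dataclasses import dataclass, field


@dataclass
class Circle:
    radius: float
    diameter: float = field(init=False)  # derived, not an __init__ parameter

    def __post_init__(self):
        if self.radius <= 0:
            raise ValueError("Radius must be positive.")
        self.diameter = 2 * self.radius


c = Circle(1.5)
print(c)  # Circle(radius=1.5, diameter=3.0)

# Passing the derived field explicitly is rejected by the generated __init__.
extra_arg_rejected = False
try:
    Circle(1.5, 3.0)
except TypeError:
    extra_arg_rejected = True

# The init=False field still participates in __eq__.
other = Circle(1.5)
print(c == other)  # True
other.diameter = 10.0
print(c == other)  # False
```

Because the class is not frozen, nothing stops later code from overwriting diameter, as the last lines show; a derived value that must always stay in sync may be better expressed as a property.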
Inheritance and Order of Fields
Data classes support inheritance. The fields of a parent data class are added before the fields of the child data class. This order determines the order of parameters in the generated __init__ method.
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

@dataclass
class Employee(Person):
    employee_id: str
    department: str

# The __init__ signature is: (name: str, age: int, employee_id: str, department: str)
emp = Employee("Bob", 40, "E123", "Engineering")
print(emp)  # Output: Employee(name='Bob', age=40, employee_id='E123', department='Engineering')
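A key pitfall involves default values across the hierarchy. Because a child's new fields are appended after the parent's, a child that adds a field without a default to a parent that already has a field with a default violates the rule that fields without defaults cannot follow fields with defaults, and the class definition raises a TypeError. Redefining a parent field in the child does not move it: the field keeps its original position and only its type or default is replaced, so giving an early parent field a default in the child can trigger the same error when later fields have none. To resolve this, give every field that follows a defaulted field its own default, or (in Python 3.10 and later) make the child's new fields keyword-only with kw_only=True, which exempts them from the ordering rule.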