The field() function is the primary mechanism for customizing the behavior of individual attributes within a dataclass. The basic @dataclass decorator handles straightforward cases, and field() adds the fine-grained control needed for more complex data structures. It lets you specify options that a type annotation and a plain default value cannot express: factory functions for mutable defaults, whether the field takes part in the generated __init__, __repr__, comparison, and hashing methods, and arbitrary metadata for other tools.
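@dataclass reads each attribute's options from a Field object, so inspecting those objects shows what field() actually records. The sketch below uses the standard dataclasses.fields() function and the MISSING sentinel; the Settings class and its attributes are hypothetical.

```python
from dataclasses import MISSING, dataclass, field, fields

@dataclass
class Settings:
    host: str                                   # plain annotation, no default
    port: int = 8080                            # plain default value
    tags: list = field(default_factory=list)    # factory for a mutable default
    token: str = field(default="", repr=False)  # default plus another option

for f in fields(Settings):
    # Every attribute gets a Field object; MISSING marks "no default given"
    has_default = f.default is not MISSING or f.default_factory is not MISSING
    print(f.name, "has default:", has_default, "| in repr:", f.repr)
```

A plain assignment such as port: int = 8080 still produces a Field; field() simply lets you set the other attributes as well.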

Providing Default Values

The most common use of field() is to provide a default value for an attribute. While you can assign a default directly in the class definition (attr: int = 42), field() becomes necessary when the default is mutable (which requires default_factory) or when the field needs other options alongside its default.

from dataclasses import dataclass, field

@dataclass
class SimpleConfig:
    name: str = "default"
    timeout: int = field(default=60)  # Equivalent to `timeout: int = 60`
    retries: int = field(default=3)

Both forms behave identically for immutable defaults such as integers and strings. The explicit field() call is only required when you also need other field parameters, though some codebases use it throughout for consistency and readability.
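The equivalence described above can be checked directly. This sketch repeats the SimpleConfig example and shows that both spellings yield the same __init__ defaults and, once @dataclass has run, the same class attribute.

```python
from dataclasses import dataclass, field

@dataclass
class SimpleConfig:
    name: str = "default"
    timeout: int = field(default=60)  # same effect as `timeout: int = 60`
    retries: int = field(default=3)

# Omitted arguments fall back to their defaults, whichever spelling declared them
print(SimpleConfig() == SimpleConfig("default", 60, 3))  # True

# @dataclass replaces each field(default=...) in the class body with the
# default value itself, matching what a plain assignment leaves behind
print(SimpleConfig.timeout, SimpleConfig.name)  # 60 default
```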

Handling Mutable Defaults with default_factory

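A classic Python pitfall, not unique to dataclasses, is using a mutable object as a default value. In an ordinary class, a list assigned in the class body is shared by every instance. @dataclass recognizes this mistake for its fields and raises ValueError as soon as the class is defined:

from dataclasses import dataclass

class PlainCart:
    items = []  # WARNING: one list, created when the class body runs

cart1 = PlainCart()
cart1.items.append('apple')
cart2 = PlainCart()
print(cart2.items)  # Unexpectedly outputs ['apple']

# @dataclass detects the same anti-pattern and refuses to build the class
try:
    @dataclass
    class BadCart:
        items: list = []
except ValueError as exc:
    print(exc)  # mutable default <class 'list'> for field items is not allowed: use default_factory

The issue is that the empty list [] is evaluated only once, when the class body executes, so every instance would share that single list object. Because this is almost never the intended behavior, dataclasses reject list, dict, and set defaults (and, since Python 3.11, any unhashable default) instead of silently sharing them.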

The default_factory parameter of field() solves this. It accepts a zero-argument callable (such as list, dict, or set) that is called to produce a fresh default value every time an instance is created without a value for that field.

from dataclasses import dataclass, field

@dataclass
class ShoppingCart:
    items: list = field(default_factory=list)
    discounts: set = field(default_factory=set)
    settings: dict = field(default_factory=dict)

# Now each instance gets its own empty list, set, and dict
cart1 = ShoppingCart()
cart1.items.append('apple')
cart2 = ShoppingCart()
print(cart2.items)  # Correctly outputs []

This is a best practice and a crucial safeguard against subtle bugs caused by unintended shared mutable state.
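default_factory is not limited to the built-in container types: any zero-argument callable works, and it is called once for each instance that omits the field. In this sketch the Order class and its fields are hypothetical; it uses the standard itertools.count to hand out sequential IDs and a lambda to build a pre-populated list.

```python
import itertools
from dataclasses import dataclass, field

_next_id = itertools.count(1).__next__  # zero-argument callable: 1, 2, 3, ...

@dataclass
class Order:
    order_id: int = field(default_factory=_next_id)
    # The lambda builds a fresh, pre-populated list for every instance
    status_log: list = field(default_factory=lambda: ["created"])

first = Order()
second = Order()
first.status_log.append("shipped")

print(first)   # Order(order_id=1, status_log=['created', 'shipped'])
print(second)  # Order(order_id=2, status_log=['created'])

# An explicitly passed value bypasses the factory entirely
print(Order(order_id=99).order_id)  # 99
```

Because the factory is called from the generated __init__ with no arguments, it cannot see other fields; a default that depends on them is usually computed in __post_init__ instead.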

Controlling Representation, Comparison, and Hashing with repr, compare, and hash

The field() function allows you to override the default behavior of the dataclass-generated methods on a per-field basis.

  • repr=True/False: Includes or excludes the field from the generated __repr__ string. Useful for hiding sensitive data like passwords or internal state.
  • compare=True/False: Includes or excludes the field from the generated __eq__ method (and from __lt__, __le__, __gt__, and __ge__ when the class uses order=True). This is useful for fields that hold transient or derived state that shouldn’t affect object equality (e.g., a cache or a modification timestamp).
  • hash=None/True/False: Controls whether the field contributes to the generated __hash__, which is created for frozen dataclasses (eq=True with frozen=True) or when unsafe_hash=True. The default, None, reuses the field's compare setting, so a field with compare=False is already left out of the hash; the official documentation discourages setting hash explicitly.

from dataclasses import dataclass, field

@dataclass(frozen=True)
class User:
    username: str
    password_hash: str = field(repr=False, compare=False)  # Don't show or compare
    session_id: int = field(compare=False, hash=False)     # Don't use for equality or hashing
    _cache: dict = field(default_factory=dict, repr=False, compare=False, hash=False)

user = User('jdoe', 'abc123', 999)
print(user)  # Output: User(username='jdoe', session_id=999)
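The practical effect of compare=False shows up in equality, hashing, and ordering. This sketch uses a trimmed copy of the User class (the _cache field is dropped for brevity) with hash left at its default of None, and adds a hypothetical Task class to show that order=True comparisons skip the field too.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class User:
    username: str
    password_hash: str = field(repr=False, compare=False)
    session_id: int = field(compare=False)  # hash=None, so it follows compare

u1 = User("jdoe", "abc123", 999)
u2 = User("jdoe", "xyz789", 1000)

print(u1 == u2)              # True: only username is compared
print(hash(u1) == hash(u2))  # True: only compared fields feed the hash
print(len({u1, u2}))         # 1: the set treats them as the same user

@dataclass(order=True)
class Task:
    priority: int
    name: str = field(compare=False)  # ignored by __lt__, __gt__, etc.

tasks = sorted([Task(2, "write"), Task(1, "plan"), Task(3, "ship")])
print([t.name for t in tasks])  # ['plan', 'write', 'ship']
```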

Attaching Metadata with metadata

The metadata parameter allows you to attach an arbitrary dictionary of information to a field. This metadata is not used by the dataclass machinery itself but is invaluable for other parts of your application, such as serialization libraries, form generators, or validation frameworks. It provides a standard place to annotate fields with extra information.

from dataclasses import dataclass, field

@dataclass
class Product:
    name: str = field(metadata={'unit': 'chars', 'max_length': 100})
    price: float = field(metadata={'unit': 'USD', 'precision': 2})
    product_id: int = field(metadata={'description': 'Internal primary key'})

# Each Field exposes its metadata as a read-only mapping, reachable through
# dataclasses.fields() or the __dataclass_fields__ attribute.
name_field = Product.__dataclass_fields__['name']
print(name_field.metadata)  # Output: mappingproxy({'unit': 'chars', 'max_length': 100})
print(name_field.metadata['max_length'])  # Output: 100
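Since the dataclass machinery ignores metadata, some other code has to read it. The sketch below is a hypothetical validate() helper, not part of the dataclasses module, that enforces the max_length key through dataclasses.fields(); it uses a trimmed copy of the Product class, and the key names simply match that example.

```python
from dataclasses import dataclass, field, fields

@dataclass
class Product:
    name: str = field(metadata={'unit': 'chars', 'max_length': 100})
    price: float = field(metadata={'unit': 'USD', 'precision': 2})

def validate(obj):
    """Hypothetical helper: checks string lengths against 'max_length' metadata."""
    errors = []
    for f in fields(obj):
        max_length = f.metadata.get('max_length')  # None when the key is absent
        value = getattr(obj, f.name)
        if max_length is not None and len(value) > max_length:
            errors.append(f"{f.name}: length {len(value)} exceeds {max_length}")
    return errors

print(validate(Product("Widget", 9.99)))   # []
print(validate(Product("x" * 120, 9.99)))  # ['name: length 120 exceeds 100']
```

Serialization or form-generation code would follow the same pattern: iterate over fields() and act on whichever metadata keys it understands.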

Using init, kw_only, and Advanced Control

Further control over the generated __init__ method is available:

  • init=True/False: If False, the field is not a parameter of the generated __init__ method. If the field has a default or default_factory, __init__ still assigns that value; otherwise the field must be set in some other way, typically in __post_init__. This is useful for fields that are calculated internally after instance creation.
  • kw_only=True: (Python 3.10+) Marks the field as keyword-only, so it must be passed to __init__ by name rather than by position. Passing kw_only=True to the @dataclass decorator does the same for every field in the class. Keyword-only fields help create more robust APIs, especially when adding new fields to existing classes.

from dataclasses import dataclass, field

@dataclass
class Example:
    a: int
    b: int = field(init=False)  # Not in __init__
    c: int = field(default=8, kw_only=True)  # Keyword-only argument

    def __post_init__(self):
        self.b = self.a * 2  # Initialize a non-init field

# e = Example(10, 20)  # TypeError: c is keyword-only, so only one positional argument (a) is accepted
e = Example(10, c=20)  # Correct usage
print(e)  # Output: Example(a=10, b=20, c=20)