33.9 Pydantic: Validation-First Data Classes
While Python’s dataclasses module excels at reducing boilerplate, it has no built-in mechanism for data validation. This is where Pydantic shines. Pydantic is a validation-first data parsing and settings management library that enforces type hints at runtime. It is designed around the principle that data should be validated and transformed into the expected shape as it enters your system, so that your core business logic operates on known-good data. This “parse, don’t validate” approach makes code more robust and removes much of the defensive checking that would otherwise be scattered through it.
Core Concepts and Basic Usage
At its heart, a Pydantic model is a class that inherits from pydantic.BaseModel. You define the expected data structure using standard Python type hints. When you instantiate a model, Pydantic automatically validates the provided data, converts it to the specified types where possible, and stores it.
from datetime import datetime

from pydantic import BaseModel, ValidationError

class User(BaseModel):
    id: int
    name: str
    email: str
    signup_ts: datetime | None = None  # Optional field with default
    friends: list[int] = []  # Safe here: Pydantic copies mutable defaults per instance

# Valid data is parsed and validated
user_data = {
    "id": "123",  # Pydantic will try to coerce this string to an int
    "name": "Alice",
    "email": "alice@example.com",
    "signup_ts": "2023-10-27T12:00:00",
}
user = User(**user_data)
print(user.id)  # Output: 123 (now an integer)
print(type(user.id))  # Output: <class 'int'>

# Invalid data raises a detailed ValidationError.
# Only "id" fails here: "email" is a plain str, so "invalid" is accepted.
try:
    User(id="not_an_int", name="Bob", email="invalid")
except ValidationError as e:
    print(e.json(indent=2))
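The detail carried by a ValidationError can also be inspected programmatically. The following is a minimal sketch, assuming Pydantic v2; the Account model and the collect_errors helper are hypothetical names introduced only for illustration.

```python
from pydantic import BaseModel, ValidationError


class Account(BaseModel):  # hypothetical model, for illustration only
    id: int
    name: str


def collect_errors(payload: dict) -> list[tuple[tuple, str]]:
    """Return (location, error type) pairs for every field that failed."""
    try:
        Account(**payload)
    except ValidationError as exc:
        # exc.errors() yields one dict per failure, with keys such as
        # "loc", "type", "msg", and "input".
        return [(err["loc"], err["type"]) for err in exc.errors()]
    return []


# "id" cannot be parsed as an integer and "name" is missing entirely.
problems = collect_errors({"id": "not_an_int"})
for loc, kind in problems:
    print(loc, kind)
```

Because every failing field is reported in a single exception, a caller such as an API handler can return all problems at once instead of one per request.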
Field Customization and Advanced Validation
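Pydantic’s Field function allows for extensive customization of individual model fields: you can add descriptions, supply defaults, and enforce value constraints such as length limits, numeric bounds, and regex patterns. For checks that go beyond declarative constraints, the field_validator decorator attaches custom validation logic to one or more fields.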
from pydantic import BaseModel, Field, ValidationError, field_validator

class Product(BaseModel):
    name: str = Field(..., min_length=1, max_length=50)  # Ellipsis (...) means required
    price: float = Field(gt=0, description="Price must be positive")
    sku: str = Field(pattern=r'^[A-Z]{3}-\d{3}$')  # Regex pattern validation
    discount_code: str | None = Field(default=None, max_length=10)

    @field_validator('name')
    @classmethod
    def name_must_contain_space(cls, v: str) -> str:
        if ' ' not in v:
            raise ValueError('must contain a space')
        return v.title()  # Validators can also transform data

# This fails in the custom validator: "invalid" contains no space.
# The sku "ABC-123" matches the pattern and the price is positive, so those fields pass.
try:
    p = Product(name="invalid", price=10.99, sku="ABC-123")
except ValidationError as e:
    print(e)
Model Configuration and Behavior
The model_config attribute, usually assigned a ConfigDict, controls the overall behavior of a Pydantic model. It determines how the model handles extra fields, whether instances are immutable, and whether validation is strict.
from pydantic import BaseModel, ConfigDict, ValidationError

class StrictUser(BaseModel):
    model_config = ConfigDict(
        strict=True,  # Disallows coercion (e.g., str "123" to int 123)
        frozen=True,  # Makes the model immutable after creation
        extra='forbid',  # Raises an error if extra fields are provided
    )
    id: int
    name: str

# This will fail because of 'strict' mode (no string-to-int coercion)
try:
    u = StrictUser(id="123", name="Alice")
except ValidationError as e:
    print("Strict validation failed:", e)

# This will fail because of 'extra=forbid'
try:
    u = StrictUser(id=123, name="Alice", age=30)
except ValidationError as e:
    print("Extra fields forbidden:", e)
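The frozen setting is not exercised by the example above, so here is a minimal sketch of what immutability looks like in practice, assuming Pydantic v2. FrozenUser is a hypothetical model used only for this illustration.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class FrozenUser(BaseModel):  # hypothetical model, for illustration only
    model_config = ConfigDict(frozen=True)
    id: int
    name: str


user = FrozenUser(id=1, name="Alice")

# Assigning to a field of a frozen model raises a ValidationError.
try:
    user.name = "Bob"
    mutated = True
except ValidationError as exc:
    mutated = False
    error_type = exc.errors()[0]["type"]

# To "change" a frozen model, build a modified copy instead.
# Note that model_copy(update=...) does not re-validate the new values.
renamed = user.model_copy(update={"name": "Bob"})
print(user.name, renamed.name)
```

Frozen models are a reasonable fit for values that are shared across threads or used as dictionary keys, since their fields cannot be reassigned after validation.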
Integration with Data Classes and JSON
Pydantic integrates seamlessly with other Python paradigms. You can use a standard-library dataclass as a field type in a Pydantic model, which adds validation to an existing structure wherever data is parsed into it. Furthermore, Pydantic models have powerful built-in methods for serialization to and from JSON.
from dataclasses import dataclass

from pydantic import BaseModel

@dataclass
class DataClassPoint:
    x: int
    y: int

# Use the dataclass as a field type; no extra configuration is required
class ValidatedPoint(BaseModel):
    point: DataClassPoint

# Raw input for the field is validated and coerced into the dataclass.
# By default an already-built DataClassPoint instance is accepted without
# revalidation; set revalidate_instances='always' to change that.
p = ValidatedPoint(point={"x": "1", "y": 2})
print(p.point.x, type(p.point.x))  # Output: 1 <class 'int'>

# Easy JSON serialization/deserialization
json_data = p.model_dump_json()
print(json_data)  # Output: {"point":{"x":1,"y":2}}
reconstructed_point = ValidatedPoint.model_validate_json(json_data)
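If the goal is to validate the dataclass itself rather than a model that contains it, Pydantic also provides its own dataclass decorator in pydantic.dataclasses. A minimal sketch, assuming Pydantic v2, with Point as a hypothetical example class:

```python
from pydantic import ValidationError
from pydantic.dataclasses import dataclass


@dataclass
class Point:  # hypothetical class, for illustration only
    x: int
    y: int


# Arguments are validated and coerced when the instance is constructed.
point = Point(x="1", y=2)
print(point.x, type(point.x))

# Values that cannot be coerced are rejected at construction time.
try:
    Point(x="east", y=2)
    rejected = False
except ValidationError:
    rejected = True
```

In many cases this is close to a drop-in replacement for the standard-library decorator, though it offers fewer features than a full BaseModel.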
Common Pitfalls and Best Practices
A common pitfall is treating Pydantic’s validation as a full replacement for business logic validation. It is best used for input shaping and “syntax-level” validation (correct types, formats, and simple constraints). More complex business rules (e.g., “a user’s discount cannot be applied to this product category”) should still reside in your application’s service layer.
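Another best practice is to leverage Pydantic’s settings management for application configuration. In Pydantic v2 this lives in the separate pydantic-settings package, whose BaseSettings class can read from environment variables, .env files, and secrets sources, providing a validated and type-safe configuration object for your entire application. Setting extra='forbid' in your settings models makes unexpected keys an error, which helps catch typos in sources such as .env files. Unrelated process environment variables are ignored regardless, so leave required settings without a default to make a misspelled variable name fail loudly at startup.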
When performance is critical, be aware that Pydantic’s validation adds overhead compared to a raw dictionary or a simple dataclass. For internal, already-validated data paths, consider converting a validated Pydantic model to a lighter-weight dataclass or a tuple for processing.