85.3 NumPy/SciPy Style Docstrings
Right, let’s talk about making your code talk. You’ve written a function. It’s brilliant. It’s clever. It works. But six months from now, you, or worse, someone else, will look at it and have absolutely no idea what it does or why. This is where docstrings come in. They’re not just comments for the sake of it; they’re the user manual for your function, and when done right, they become the very documentation for your entire project. We’re focusing on the NumPy/SciPy style here because, frankly, it’s the gold standard for scientific Python. It’s comprehensive, readable by humans and machines, and it doesn’t make you want to gouge your eyes out.
The Basic Anatomy of a NumPy Docstring
A good docstring isn’t a novel; it’s structured data. Here’s the skeleton you’ll be fleshing out. Notice the sections. They’re not arbitrary; they answer specific questions a user will have.
def calculate_velocity(initial_velocity, acceleration, time, method='euler'):
"""
Calculate the final velocity of an object under constant acceleration.
This function implements the basic equations of motion to compute velocity.
It supports different numerical methods for educational purposes, though
the analytic solution is straightforward.
Parameters
----------
initial_velocity : float
The starting velocity of the object in meters per second (m/s).
Can be positive or negative to indicate direction.
acceleration : float
The constant acceleration applied to the object in m/s².
time : float
The duration over which the acceleration is applied in seconds (s).
Must be a non-negative value.
method : {'euler', 'analytic'}, optional
The method used for calculation. Default is 'euler'.
- 'euler': Uses a simple forward Euler integration. (Spoiler: it's bad for this.)
- 'analytic': Uses the exact equation `v = u + a*t`.
Returns
-------
float
The final velocity in m/s.
Raises
------
ValueError
If `time` is negative or if `method` is not one of the allowed values.
Notes
-----
The 'euler' method is included purely to demonstrate the perils of using
the wrong numerical tool for a job that has a perfect analytic solution.
Please use 'analytic' in production code. I'm serious.
Examples
--------
>>> calculate_velocity(10, 2, 5)
20.0
>>> calculate_velocity(10, 2, 5, method='analytic')
20.0
"""
if time < 0:
raise ValueError("Time cannot be negative. We're not building a time machine.")
if method == 'analytic':
return initial_velocity + acceleration * time
elif method == 'euler':
# This is pedagogically terrible and I hate it.
velocity = initial_velocity
for _ in range(int(time)):
velocity += acceleration * 1 # Integrating over 1s steps. Yikes.
return velocity
else:
raise ValueError(f"Method '{method}' is not supported. Choose 'euler' or 'analytic'.")
Parameters: The Heart of the Matter
This is the most important section. Get this wrong and your function is a mystery box. The format is strict for a reason: tools like Sphinx parse this to auto-generate beautiful docs.
- Type Hints are Your Friend: Notice I use
floatand the{'euler', 'analytic'}set. Be specific. If it can beintorfloat, sayint | floatornumeric. Modern Python lets you use actual type hints in the function signature (def func(param: type) -> return_type), which is fantastic. Your docstring should reinforce these types, not contradict them. - Describe the What and the Why: Don’t just say
initial_velocity : float. Tell me what it represents (“starting velocity… in m/s”) and any constraints (“Can be positive or negative”). - Flag Optional Parameters: The word
optionalnext to the type is a clear signal. The default value should be obvious from the signature and repeated here.
Returns and Raises: No Surprises
Users need to know what they’re getting back and what can go wrong. The Raises section is a contract. If you document that your function raises a ValueError for bad input, you must make sure it does. This is how you build robust, trustworthy code. Don’t hide the edge cases; advertise them.
Examples: The Proof is in the Pudding
This isn’t just helpful; it’s testable. Tools like doctest can literally run these examples to make sure your documentation isn’t lying. It’s the ultimate “working as intended” sign. Make them concise, but make them demonstrate both common and edge-case usage.
Common Pitfalls and The “Why”
- Inconsistency: The fastest way to make your project look amateurish is to have six different docstring styles. Pick one (this one) and stick to it across the entire codebase. Use a linter like
pydocstylewith thenumpyconvention to enforce it automatically. - Lazy Parameter Descriptions:
data : arraytells me nothing. Is it a NumPy array? A list? What’s in it? Floats? Integers? What does each row represent?data : np.ndarrayis better.data : np.ndarray of shape (n_samples, n_features)is best. - Forgetting the “Why”: The
Notessection in our example is crucial. It explains the seemingly insane inclusion of the ’euler’ method. Without that note, I’d think you were a terrible engineer. With it, I understand the didactic purpose. Always explain the non-obvious reasoning behind a design choice, especially a questionable one. It shows you’ve actually thought about it. - Over-Documenting Trivia: You don’t need a docstring for a property that returns
self.name. If the function’s purpose is blatantly obvious from its name and parameters, a one-line description is perfectly sufficient. The NumPy style shines for complex, scientific functions, not necessarily for every single getter method. Use your judgment.