64.8 Hypothesis: Property-Based Testing and Shrinking Failures

Alright, let’s talk about Hypothesis. You’ve probably been writing unit tests where you, the brilliant and overworked developer, have to dream up every single weird edge case yourself. You’re the one thinking, “What if the list is empty? What if the integer is negative? What if the string has emojis in it?” It’s exhausting, and frankly, it’s a job for a machine. That’s where Hypothesis comes in.

Think of Hypothesis as your incredibly diligent, slightly obsessive-compulsive intern. You give it the shape of the data you want to test (integers, lists of strings, custom objects) and it goes off and generates a large batch of examples, trying to break your code. But it's not just random; it's strategically random. It leans toward the values that tend to break things (zero, empty collections, extreme numbers), and once it finds a failure it works to reduce it to the smallest, most embarrassing example that still makes your function vomit. This is called property-based testing. Instead of testing specific examples (test_add(2, 2)), you test general properties (for all pairs of integers a, b, add(a, b) should equal add(b, a)).
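
To make the contrast concrete, here is a minimal sketch of both styles side by side. The add function is a hypothetical stand-in for whatever you are testing; st.integers() and @given are the Hypothesis APIs covered in the next section.

```python
import hypothesis.strategies as st
from hypothesis import given


def add(a, b):
    """Hypothetical function under test."""
    return a + b


def test_add_example():
    # Example-based: you pick one case and hard-code the answer.
    assert add(2, 2) == 4


@given(st.integers(), st.integers())
def test_add_is_commutative(a, b):
    # Property-based: Hypothesis picks the integers; you state a rule
    # that must hold for every pair it tries.
    assert add(a, b) == add(b, a)
```

A function decorated with @given can be called with no arguments (as a test runner would) and Hypothesis supplies the values.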

The Core Mechanics: `@given` and Strategies

You start by defining a strategy for your data, which tells Hypothesis how to generate it. Then you decorate your test function with @given, and Hypothesis calls it many times (up to 100 examples by default), each time with different generated values.

import hypothesis.strategies as st
from hypothesis import given

# A simple property: reversing a list twice gives you the original list.
@given(st.lists(st.integers()))  # The strategy: generate lists of integers
def test_list_reversal_twice_is_original(lst):
    original_list = lst.copy()  # Let's be safe, not sorry
    reversed_once = list(reversed(original_list))
    reversed_twice = list(reversed(reversed_once))
    assert reversed_twice == original_list

Run this and it passes: reversing a list twice really does give it back unchanged. But watch what happens if we write a buggy function.

def buggy_reverse(lst):
    """A 'reversal' function that accidentally breaks on empty lists."""
    if not lst:
        return [None]  # Whoops! Clearly wrong.
    return list(reversed(lst))

@given(st.lists(st.integers()))
def test_buggy_reversal(lst):
    assert buggy_reverse(buggy_reverse(lst)) == lst

Hypothesis will find the failure, and it will report it in the simplest form it can. That's its second magic trick: shrinking.

Shrinking: Finding the Minimal Reproducible Example

Shrinking is Hypothesis’s killer feature. Once it finds a random example that breaks your code (e.g., [0, 0, 0, -1, 0]), it doesn’t just report that. It tries to simplify that example. Can it break it with a list of two elements? One element? An empty list?

It systematically tries to remove or reduce parts of the failing example until it finds the smallest, most fundamental case that still causes the failure. The error report for our test above will be gloriously blunt:

Falsifying example: test_buggy_reversal(
    lst=[],
)

It didn’t just find a counterexample; it found the simplest possible counterexample: the empty list. This is invaluable. You’re not debugging a bizarre, complex data structure; you’re debugging one obvious, clear edge case you stupidly forgot. It’s like a friend who not only tells you you have spinach in your teeth but also hands you a mirror.
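
You can watch the shrinker work without a test runner. The sketch below uses hypothesis.find, which searches a strategy for a value satisfying a condition and returns the shrunk result. The positions_by_value helper is hypothetical: it quietly assumes its input has no duplicates, so any list with a repeated value breaks it.

```python
import hypothesis.strategies as st
from hypothesis import find


def positions_by_value(xs):
    """Hypothetical buggy helper: maps each value to its index.

    It silently assumes the values are unique; a repeated value
    overwrites the earlier entry, so entries get lost.
    """
    return {x: i for i, x in enumerate(xs)}


def loses_entries(xs):
    # The bug shows up whenever the mapping is shorter than the input.
    return len(positions_by_value(xs)) != len(xs)


# find() generates lists until loses_entries is True, then shrinks
# that first hit as far as it can before returning it.
minimal = find(st.lists(st.integers()), loses_entries)
print(minimal)
```

Whatever long, noisy list first triggers the bug, the value that comes back should be a two-element list holding the same number twice, because that is the smallest input with a duplicate.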

Common Pitfalls and How to Avoid Them

Pitfall 1: Assuming Friendlier Inputs. Your strategies define the entire domain of inputs. st.integers() includes negative numbers, zero, and very large integers. If your function only works on positive integers, you must say so! Don't test for “all integers” and then be surprised when a negative number breaks it. Use st.integers(min_value=1).
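
Here is a sketch of declaring the domain explicitly. split_evenly is a hypothetical helper that only makes sense for at least one part (zero parts would divide by zero), so the strategy for parts says so with min_value=1, while total stays an unrestricted integer because the function handles every integer.

```python
import hypothesis.strategies as st
from hypothesis import given


def split_evenly(total, parts):
    """Hypothetical helper: split total into `parts` near-equal integers.

    Only defined for parts >= 1; parts == 0 raises ZeroDivisionError.
    """
    base, extra = divmod(total, parts)
    # `extra` chunks get one more than the rest, so the sum is exact.
    return [base + 1] * extra + [base] * (parts - extra)


# parts is restricted to the domain the function actually supports;
# max_value only keeps the generated lists small.
@given(st.integers(), st.integers(min_value=1, max_value=100))
def test_split_evenly_preserves_total(total, parts):
    chunks = split_evenly(total, parts)
    assert len(chunks) == parts
    assert sum(chunks) == total
```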

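Pitfall 2: Ignoring Slow Tests. By default a Hypothesis test runs your function up to 100 times, and it reports an error when a single example runs longer than its deadline (200 ms). If your function is slow, your test suite will grind to a halt. Use the settings decorator to reduce the number of examples (and, if necessary, relax the deadline) for slow tests, or better yet, figure out why your function is so slow.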

import hypothesis.strategies as st
from hypothesis import given, settings

# Run only 20 examples, and drop the default 200 ms per-example deadline.
@settings(max_examples=20, deadline=None)
@given(st.lists(st.integers()))
def test_slow_function(lst):
    # ... some slow operation on lst
    pass

Pitfall 3: Writing Imprecise Properties. The hardest part is defining the right property to test. “This function shouldn't crash” is a good start, but “the output should have this specific mathematical relationship to the input” is far more powerful. For example, instead of testing that your serializer “works”, test that serializing and then deserializing an object returns an equivalent object (a round-trip property).
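
A round-trip property is often the cheapest strong property you can write. This is a minimal sketch using the standard-library json module; the User record and the to_json/from_json pair are hypothetical stand-ins for your own serializer. st.builds constructs an object by calling its constructor with values drawn from the strategies you pass.

```python
import json
from dataclasses import asdict, dataclass

import hypothesis.strategies as st
from hypothesis import given


@dataclass
class User:
    """Hypothetical record type, used only to illustrate the property."""
    name: str
    age: int
    tags: list


def to_json(user):
    return json.dumps(asdict(user))


def from_json(text):
    return User(**json.loads(text))


# Build User objects from one strategy per field.
users = st.builds(
    User,
    name=st.text(),
    age=st.integers(min_value=0, max_value=150),
    tags=st.lists(st.text(), max_size=5),
)


@given(users)
def test_user_json_round_trip(user):
    # Serializing then deserializing must give back an equal object.
    assert from_json(to_json(user)) == user
```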

When Shrinking Itself Fails (The Rare Stuff)

Very occasionally, instead of a clean falsifying example, Hypothesis will report that your test is flaky (it raises a Flaky error). This doesn't necessarily mean your code is wrong; it means the test gave inconsistent results, so Hypothesis couldn't reliably reproduce the failure while shrinking and replaying it. This usually happens when your test has a side effect that changes between the initial failing run and the later runs Hypothesis makes.

For instance, if your test mutates a global variable, the same input can fail on one run and pass on the next, because the global state has changed in between. Hypothesis throws its hands up and says, “I found a failure with [100, 200], but when I ran [100, 200] again, it passed! I don't know what to tell you!”
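
Here is a deliberately broken sketch that reproduces the situation. The test keeps a module-level call counter (a hypothetical stand-in for any shared state) and fails only on its fifth call, so the failing example cannot be replayed. Hypothesis reports this as a Flaky error; newer releases raise a more specific subclass of it, so catching the base class hypothesis.errors.Flaky is meant to cover both.

```python
import hypothesis.strategies as st
from hypothesis import given, settings
from hypothesis.errors import Flaky

calls = 0  # shared global state: exactly what a test should NOT have


@settings(database=None)  # don't save this deliberately broken example
@given(st.lists(st.integers()))
def test_depends_on_global_state(lst):
    global calls
    calls += 1
    # Fails on the fifth call only, whatever lst is, so replaying the
    # "failing" input later passes.
    assert calls != 5


try:
    test_depends_on_global_state()
    outcome = "passed"
except Flaky:
    outcome = "flaky"
print(outcome)
```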

The fix is always the same: make your tests pure. No global state, no mutable state shared between examples (pytest function-scoped fixtures count, since they are set up once per test function, not once per generated example), no I/O. A Hypothesis test should be a perfect little sealed universe where the only thing that changes is the arguments passed in by the @given decorator. This is good practice anyway, but Hypothesis will ruthlessly punish you for ignoring it.
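
In practice, a sealed universe mostly means building any state the test needs inside the test body, so every generated example starts from scratch. A minimal sketch, with Tally as a hypothetical class standing in for whatever stateful object you are testing:

```python
import hypothesis.strategies as st
from hypothesis import given


class Tally:
    """Hypothetical stateful object under test."""

    def __init__(self):
        self.counts = {}

    def add(self, item):
        self.counts[item] = self.counts.get(item, 0) + 1

    def total(self):
        return sum(self.counts.values())


@given(st.lists(st.text(max_size=3)))
def test_total_matches_number_of_adds(items):
    tally = Tally()  # fresh object per example; nothing leaks between runs
    for item in items:
        tally.add(item)
    assert tally.total() == len(items)
```

Had tally been created once at module level, its counts would keep growing across examples and the assertion would fail for reasons that depend on run order rather than on items, which is exactly the non-reproducible failure described above.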

So, use it. Let the machine do the boring work of generating edge cases. You focus on defining the invariants—the fundamental truths of your code that must always hold. It will make you a better, lazier, and more correct programmer.