64.9 Mutation Testing with mutmut

Right, so you’ve written your tests. They pass. Your coverage report is a beautiful sea of green. You feel pretty good about yourself. And you should. But let me ask you a slightly uncomfortable question: are you sure your tests are actually testing anything meaningful? Or are they just well-trained pets that perform a cute trick when you run pytest, blissfully ignoring any actual logic errors in your code? This is where mutation testing comes in, and mutmut is the Python tool that’ll happily crush your ego so you can build it back up stronger. The concept is deviously simple. A mutation testing tool makes a small, deliberate change to your source code, like changing a + to a -, or turning a True into a False. Each altered copy is called a mutant. The tool then runs your test suite against the mutant. If your tests fail, great! You’ve killed that mutant. It means your tests noticed the backstab. If your tests still pass, uh-oh. That mutant survived. Your tests have a blind spot.
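To make the kill-or-survive idea concrete, here is a minimal hand-rolled sketch. It does not use mutmut at all: the mutant is written by hand, and the names is_adult, weak_suite, and strong_suite are made up for illustration. A real tool generates and runs mutants like this for you.

```python
def is_adult(age):
    """Original code under test."""
    return age >= 18


def is_adult_mutant(age):
    """Hand-written mutant: '>=' has been changed to '>'."""
    return age > 18


def weak_suite(func):
    """Returns True when every check passes. Never probes the boundary."""
    return func(30) is True and func(5) is False


def strong_suite(func):
    """Same checks plus the boundary value, which the mutant gets wrong."""
    return func(30) is True and func(5) is False and func(18) is True


# Both suites pass against the original code.
assert weak_suite(is_adult) and strong_suite(is_adult)

# A suite "kills" the mutant when it fails against the mutated code.
weak_kills = not weak_suite(is_adult_mutant)
strong_kills = not strong_suite(is_adult_mutant)
print("weak suite kills mutant:", weak_kills)      # False: the mutant survived
print("strong suite kills mutant:", strong_kills)  # True: the mutant was killed
```

The weak suite is exactly the well-trained pet from above: green on the original, green on the mutant. The one extra boundary check is what turns a survivor into a kill.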

64.8 Hypothesis: Property-Based Testing and Shrinking Failures

Alright, let’s talk about Hypothesis. You’ve probably been writing unit tests where you, the brilliant and overworked developer, have to dream up every single weird edge case yourself. You’re the one thinking, “What if the list is empty? What if the integer is negative? What if the string has emojis in it?” It’s exhausting, and frankly, it’s a job for a machine. That’s where Hypothesis comes in. Think of Hypothesis as your incredibly diligent, slightly obsessive-compulsive intern. You give it the shape of the data you want to test (integers, lists of strings, custom objects) and it goes off and generates example after example (100 per test by default), trying to break your code. But it’s not just random; it’s strategically random, leaning on the values that tend to break things. And when it finds a failure, it doesn’t just dump the ugly input on your desk: it shrinks it down to the smallest, most embarrassing example that still makes your function vomit. This is called property-based testing. Instead of testing specific examples (test_add(2, 2)), you test general properties (for all pairs of integers a, b, add(a, b) should equal add(b, a)).
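Here is a toy, standard-library-only sketch of the two ideas in that paragraph: checking a property over generated inputs, then shrinking a failing input. It is not Hypothesis's API and not its actual shrinking algorithm; every function name below is hypothetical. With Hypothesis itself you would decorate a test with given and describe the inputs with strategies instead of writing any of this by hand.

```python
import random


def add(a, b):
    return a + b


def subtract(a, b):
    return a - b


def is_commutative_for(func, a, b):
    """The property under test: swapping the arguments changes nothing."""
    return func(a, b) == func(b, a)


def find_counterexample(func, trials=200, seed=0):
    """Try random integer pairs; return the first pair that breaks the property."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = rng.randint(-1_000_000, 1_000_000)
        b = rng.randint(-1_000_000, 1_000_000)
        if not is_commutative_for(func, a, b):
            return a, b
    return None


def toward_zero(n):
    """Smaller candidates for n: halved, and one step closer to zero."""
    step = (n > 0) - (n < 0)
    return [int(n / 2), n - step]


def shrink(func, a, b):
    """Greedily move a and b toward zero while the property still fails."""
    improved = True
    while improved:
        improved = False
        candidates = [(x, b) for x in toward_zero(a)]
        candidates += [(a, y) for y in toward_zero(b)]
        for cand in candidates:
            if cand != (a, b) and not is_commutative_for(func, *cand):
                a, b = cand
                improved = True
                break
    return a, b


assert find_counterexample(add) is None      # addition holds up
failing = find_counterexample(subtract)      # subtraction does not
smallest = shrink(subtract, *failing)
print("raw counterexample:", failing)
print("shrunk counterexample:", smallest)
```

The raw counterexample is a pair of large, meaningless numbers. The shrunk one is a pair where one value is zero and the other is one step away from it, which is the kind of failure report you can actually reason about.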

64.7 Branch Coverage vs Line Coverage

Right, so you’ve got your tests running. Green checkmarks. Feels good, doesn’t it? But let me ask you a question: are you sure you’ve tested all the little decision points in that code, or have you just been stroking your ego by running down the happy path? This is where coverage tools come in, and where most developers immediately get the wrong idea. The two metrics you’ll see most often are line coverage and branch coverage. Line coverage asks whether each line was executed at least once. Branch coverage asks whether each decision point went every way it could: the if taken and the if skipped. They sound similar, but the difference is crucial and, frankly, where most testing efforts fall flat on their face.
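A small sketch of how the two metrics can disagree. The function and its numbers are invented for illustration; the point is that an if without an else lets a single test touch every line while exercising only one of the two outcomes.

```python
def apply_discount(price, is_member):
    """Members get 10 off; everyone else pays full price."""
    discount = 0
    if is_member:
        discount = 10
    return price - discount


# This one call executes every line of apply_discount,
# so a line coverage report would show 100%.
member_price = apply_discount(100, True)
assert member_price == 90

# The "condition is false" outcome of the if was never exercised above.
# Branch coverage would flag it; this second call closes the gap.
guest_price = apply_discount(100, False)
assert guest_price == 100
```

With coverage.py, branch measurement is opt-in: pass --branch to coverage run and the skipped outcome shows up in the report as a partial branch.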

64.6 coverage.py: Measuring What Is Tested

Right, let’s talk about coverage. You’ve written some tests. You’ve run them. They pass. You feel good. But a nagging question remains: did I actually test all the code I just wrote, or did I just run the happy path and call it a day? This is where coverage.py comes in—it’s the brutally honest friend who tells you there’s spinach in your teeth. It doesn’t care about your intentions; it just reports which lines of your code were executed while your tests were running.
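To see what "which lines were executed" means mechanically, here is a toy line tracer built on the standard library's sys.settrace. It is a sketch of the idea only, not coverage.py's implementation or API, and classify and executed_offsets are hypothetical names. In day-to-day use you would run coverage run -m pytest followed by coverage report -m and let the tool do this for you.

```python
import sys


def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"


def executed_offsets(func, *args):
    """Run func and return the line offsets (relative to its def line) that ran."""
    code = func.__code__
    seen = set()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace lines in other functions
        if event == "line":
            seen.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)  # always restore whatever tracer was active
    return seen


happy_path = executed_offsets(classify, 5)
both_paths = happy_path | executed_offsets(classify, -1)

# Offset 2 is the 'return "negative"' line inside classify.
print("negative branch hit by happy path only:", 2 in happy_path)
print("negative branch hit after adding a test:", 2 in both_paths)
```

The tracer has no opinion about whether your assertions are any good. It only knows that a line ran, which is exactly the limitation to keep in mind when reading a coverage report.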

64.5 Test-Driven Development: Red-Green-Refactor

Right, so you’ve heard the gospel of Test-Driven Development. You’ve seen the zealots preach about the “design tool” and the “safety net.” And you’re probably thinking, “That sounds nice, but my deadline is Friday.” I get it. Let’s cut through the dogma and talk about what TDD actually is: a fantastically productive way to write code if you use it as a disciplined feedback loop, not a religious artifact. The core rhythm is stupidly simple: Red, Green, Refactor. Write a test that fails, write the least code that makes it pass, then clean up while the test keeps you honest. It’s the discipline that’s hard.
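One lap around the loop might look like the sketch below. The slugify function and its test are invented for illustration, and the three stages are squeezed into a single script so the order is visible; in a real project the test would live in its own file and you would rerun pytest between steps.

```python
# RED: write the test first. slugify does not exist yet, so it must fail.
def test_slugify():
    assert slugify("Hello World") == "hello-world"
    assert slugify("  Padded Title ") == "padded-title"


try:
    test_slugify()
    red_failed = False
except NameError:
    red_failed = True  # the failure we expected: nothing to call yet


# GREEN: the least code that makes the test pass. No elegance required.
def slugify(title):
    return title.strip().lower().replace(" ", "-")


test_slugify()

# REFACTOR: tidy up without changing behavior, then rerun the same test.
SEPARATOR = "-"


def slugify(title):
    normalized = title.strip().lower()
    return normalized.replace(" ", SEPARATOR)


test_slugify()
print("red failed first:", red_failed)
```

The test never changes between Green and Refactor. That is the whole point: the refactor is only allowed to change how the code looks, and the test is the referee.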

64.4 Asserting Call Counts and Arguments

Right, so you’ve mocked out a function. You’ve set it up to return a specific value. Your test passes. High fives all around. But wait—did your code under test actually call that mock? And if it did, how many times? With what arguments? This is where we move from just checking state to verifying behavior, and it’s a crucial step up in your testing game. Let’s be honest: if you’re not verifying these interactions, you’re only testing half the story, and the other half is probably hiding a bug.
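A short sketch of those interaction checks using unittest.mock. The notify_all function and the mailer object are hypothetical stand-ins; the assertion helpers are the ones the standard library provides.

```python
from unittest.mock import Mock, call


def notify_all(mailer, addresses, subject):
    """Code under test: sends one message per address."""
    for address in addresses:
        mailer.send(address, subject=subject)


mailer = Mock()
notify_all(mailer, ["a@example.com", "b@example.com"], "Hi")

# How many times was it called?
assert mailer.send.call_count == 2

# Was it ever called with these arguments?
mailer.send.assert_any_call("a@example.com", subject="Hi")

# What was the most recent call?
mailer.send.assert_called_with("b@example.com", subject="Hi")

# The full history, in order.
assert mailer.send.call_args_list == [
    call("a@example.com", subject="Hi"),
    call("b@example.com", subject="Hi"),
]

# And the opposite question: make sure nothing was sent when it should not be.
quiet_mailer = Mock()
notify_all(quiet_mailer, [], "Hi")
quiet_mailer.send.assert_not_called()
```

Note that assert_called_with only looks at the last call. If you care about every call, compare call_args_list or use assert_has_calls.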

64.3 spec= and create_autospec(): Safer Mocks

Right, so you’ve decided to use mocks. Good for you. It means you’re testing behavior, not just state, and that’s a sign of a mature test suite. But let’s be honest: the standard unittest.mock library gives you enough rope to hang yourself with, and then some. You can mock anything, anywhere, anytime. That’s not power; that’s a liability. Ever written a test that passes beautifully, only to have the production code explode because your mock was completely divorced from the reality of the function it was pretending to be? I have. It’s a special kind of humiliation.
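Here is a sketch of the failure mode and the two guards against it. PaymentGateway is a made-up class, and chrage is a deliberate typo to show what a plain Mock lets you get away with.

```python
from unittest.mock import Mock, create_autospec


class PaymentGateway:
    """Hypothetical real dependency that we do not want to call in tests."""

    def charge(self, amount, currency="USD"):
        raise RuntimeError("would hit the network")


# A plain Mock accepts anything, including a misspelled method name.
loose = Mock()
loose.chrage(10)  # no error: the typo sails straight through

# spec= restricts the mock to attributes that exist on the real class.
strict = Mock(spec=PaymentGateway)
try:
    strict.chrage(10)
    typo_caught = False
except AttributeError:
    typo_caught = True

# create_autospec goes further and also enforces the method signatures.
auto = create_autospec(PaymentGateway, instance=True)
auto.charge(10, currency="EUR")  # matches the real signature: fine
try:
    auto.charge()  # missing the required 'amount' argument
    signature_checked = False
except TypeError:
    signature_checked = True

print("typo caught by spec:", typo_caught)
print("bad call caught by autospec:", signature_checked)
```

The trade-off is small: spec= catches wrong names, create_autospec catches wrong names and wrong arguments. Either one keeps the mock tied to the thing it is pretending to be.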

64.2 patch as Decorator and Context Manager

Now, let’s talk about patch. It’s arguably the most important tool in the mocking toolbox, and the Python developers, in a rare moment of clarity, gave it two incredibly useful interfaces: a decorator and a context manager. This isn’t just syntactic sugar; it’s a fundamental shift in how you control the scope of your lies, and you should understand both. The core concept is simple: patch finds the name of an object in a given module and replaces it with a MagicMock (or whatever you tell it to) for the duration of the patch. The magic, and the gotchas, all come from how it performs this sleight of hand and how you control its reach.
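The same patch in both forms, as a minimal sketch. The function describe_cwd and the fake paths are invented for illustration; os.getcwd is simply a convenient real function to replace.

```python
import os
from unittest.mock import patch


def describe_cwd():
    """Code under test: looks up os.getcwd at call time."""
    return f"cwd={os.getcwd()}"


# Decorator form: the patch is active for the whole function call,
# and the mock is passed in as an extra argument.
@patch("os.getcwd", return_value="/fake")
def check_with_decorator(mock_getcwd):
    result = describe_cwd()
    mock_getcwd.assert_called_once_with()
    return result


# Context manager form: the patch is active only inside the with block.
def check_with_context_manager():
    with patch("os.getcwd", return_value="/tmp/fake") as mock_getcwd:
        inside = describe_cwd()
        assert mock_getcwd.call_count == 1
    outside = describe_cwd()  # the real os.getcwd is back
    return inside, outside


decorated_result = check_with_decorator()
inside_result, outside_result = check_with_context_manager()
print(decorated_result)
print(inside_result)
```

Reach for the decorator when the whole test needs the fake, and for the context manager when only a few lines do. In both cases the original object is restored when the patch ends, even if the body raises.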

64.1 unittest.mock: Mock, MagicMock, and patch

Right, let’s talk about unittest.mock. This is the module you’ll use to surgically remove the messy, unpredictable, and slow parts of your system so you can test your code in glorious, sterile isolation. It’s like putting your code in a cleanroom, except instead of a bunny suit, you wear a smug grin. The core idea is simple: you replace real objects (which might talk to databases, APIs, or the file system) with fake ones—mock objects—that you completely control. You can then ask these mock objects, “Hey, was this method called? With what arguments? How many times?” This lets you test the behavior of your code (did it make the right call?) rather than just its state (is the output correct?).
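A first taste, as a minimal sketch. The fetch_username function, the client object, and the URL are hypothetical; Mock and MagicMock are the real classes from unittest.mock.

```python
from unittest.mock import MagicMock, Mock


def fetch_username(client, user_id):
    """Code under test: depends on a client that would normally hit an API."""
    response = client.get(f"/users/{user_id}")
    return response["name"].title()


# Replace the real client with a mock and script its answer.
client = Mock()
client.get.return_value = {"name": "ada lovelace"}

username = fetch_username(client, 42)
assert username == "Ada Lovelace"                  # state: the output is right
client.get.assert_called_once_with("/users/42")    # behavior: the right call was made

# MagicMock is Mock plus preconfigured magic methods such as __len__.
magic = MagicMock()
magic.__len__.return_value = 3
assert len(magic) == 3

# A plain Mock does not support them unless you configure them yourself.
try:
    len(Mock())
    plain_mock_has_len = True
except TypeError:
    plain_mock_has_len = False

print("plain Mock supports len():", plain_mock_has_len)
```

Passing the mock in as an argument, as done here, is the simplest case. When the dependency is imported inside the module rather than handed to your function, patch is the tool that swaps it out.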

...