escapewindow: escape window (Default)
aki ([personal profile] escapewindow) wrote2016-08-30 12:29 pm
Entry tags:

100% test coverage [in python]

Tl;dr: I reached 100% test coverage (as measured by coverage) on several of my projects, and I'm planning on continuing that trend for the important projects I maintain. We were chatting about this, and I decided to write a blog post about how to get to 100% test coverage, and why.

Why 100% test coverage?

Find unreachable code

Back in 2014, Apple issued a critical security update for iOS, to patch a bug known as #gotofail.

static OSStatus
SSLVerifySignedServerKeyExchange(SSLContext *ctx, bool isRsa, SSLBuffer signedParams,
                                 uint8_t *signature, UInt16 signatureLen)
{
    OSStatus err;
    ...

    if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
goto fail; goto fail;
if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0) goto fail; ... fail: SSLFreeBuffer(&signedHashes); SSLFreeBuffer(&hashCtx); return err; }

The second highlighted goto fail caused SSLVerifySignedServerKeyExchange calls to jump to the fail block with a successful err. This bypassed the actual signature verification and returned a blanket success. With this bug in place, malicious servers would have been able to serve SSL certs with mismatched or missing private keys, and still look legit.

One might argue for strict indentation, using curly braces for all if blocks, better coding practices, or just Better Reviews. But 100% test coverage would have found it.

"For the bug at hand, even the simplest form of coverage analysis, namely line coverage, would have helped to spot the problem: Since the problematic code results from unreachable code, there is no way to achieve 100% line coverage.

"Therefore, any serious attempt to achieve full statement coverage should have revealed this bug."

Even though python has meaningful indentation, that doesn't mean it's foolproof against unreachable code. Simply marking each line as visited is enough to protect against this class of bug.

And even when there isn't a huge security issue at stake, unreachable code can clutter the code base. At best, it's dead weight, extra lines of code to work around, to refactor, to have to read when debugging. At worst, it's confusing, misleading, and hides serious bugs. Don't do unreachable code.

Uncovering bugs while writing tests

My first attempt at 100% test coverage was on a personal project, on a whim. I knew all my code was good, but full coverage was something I had never seen. It was a lark. A throwaway bit of effort to see if I could do it, and then something I could mention offhand. "Oh yeah, and by the way, it has 100% test coverage."

Imagine my surprise when I uncovered some non-trivial bugs.

This has been the norm. Every time I put forward a sustained effort to add test coverage to a project, I uncover some things that need fixing. And I fix them. Even if its as small as a confusing variable name or a complex bit of logic that could use a comment or two. And often it's more than that.

Certainly, if I'm able to find and fix bugs while writing tests, I should be able to find and fix those bugs when they're uncovered in the wild, no? The answer is yes. But. It's a difference of headspace and focus.

When I'm focused on isolating a discrete chunk of code for testing, it's easier to find wrong or confusing behavior than when when I'm deploying some large, complex production process that may involve many other pieces of software. There are too many variables, too little headspace for the problem, and often too little time. And Murphy says those are the times these problems are likely to crop up. I'd rather have higher confidence at those times. Full test coverage helps reduce those variables when the chips are down.

A helpful guide for future changes

Software is the proverbial shark: it's constantly moving, or it dies. Small patches are pretty easy to eyeball and tell if they're good. Well written and maintained code helps here too, as does documentation, mentoring, and code reviews. But only tests can verify that the code is still good, especially when dealing with larger patches and refactors.

Tests can help new contributors catch problems before they reach code review or production. And sometimes after a long hiatus and mental context switch, I feel like a new contributor to my own projects. The new contributor you are helping may be you in six months.

Sometimes your dependencies release new versions, and it's nice to have a full set of tests to see if something will break before rolling anything out to production. And when making large changes or refactoring, tests can be a way of keeping your sanity.

Momentum is contagious

There's the social aspect of codebase maintenance. If you want to ask a new contributor to write tests for their code, that's an easier ask when you're at 100% coverage than when you're at, say, 47%.

This can also carry over to your peers' and coworkers' projects. Enough momentum, and you can build an ecosystem of well-tested projects, where the majority of all bugs are caught before any code ever lands on the main repo.

Technical wealth

Have you read Forget Technical Debt - Here's How to Build Technical Wealth? It's well worth the read.

Automated testing and self-validation loom large in that post. She does caution against 100% test coverage as an end in itself in the early days of a startup, for example, but that's when time to market and profitability are opportunity costs. When developing technical wealth at an existing enterprise, the long term maintenance benefits speak for themselves.

How did you reach 100% test coverage [in python]?

Clean architecture

If you haven't watched Clean Architecture in Python, add it to your list. It's worth the time. It's the Python take on Uncle Bob Martin's original talk. Besides making your code more maintainable, it makes your code more testable. When I/O is cleanly separated from the logic, the logic is easily testable, and the I/O portions can be tested in mocked code and integration tests.

parse_args()

Some people swear by click. I'm not quite sold on it, yet, but it's easy to want something other than optparse or argparse when faced with a block of option logic like in this main() function. How can you possibly sanely test all of that, especially when right after the option parsing we go into the main logic of the script?

I pulled all of the optparse logic into its own function, and I called it with sys.argv[:1]. That let me start testing optparse tests, separate from my signing tests. Signtool is one of those projects where I haven't yet reached 100% coverage, but it's high on my list.

Toss if __name__ == '__main__':

if __name__ == '__main__': is one of the first things we learn in python. It allows us to mix script and library functionality into the same file: when you import the file, __name__ is the name of the module, and the block doesn't execute, and when you run it as a script, __name__ is set to __main__, and the block executes. (More detailed descriptions here.)

There are ways to skip these blocks in code coverage. That might be the easiest solution if all you have under if __name__ == '__main__': is a call to main(). But I've seen scripts where there are pages of code inside this block, all code-coverage-unfriendly.

I forget where I found my solution. This answer suggests __name__ == '__main__' and main(). I rewrote this as

def main(name=None):
    if name in (None, '__main__'):
        ...


main(name=__name__)

Either way, you don't have to worry about covering the special if block. You can mock the heck out of main() and get to 100% coverage that way. (See the mock and integration section.)

Rewrite the logic

mimetypes.guess_type('foo.log') returns 'text/plain' on OSX, and None on linux. I fixed it like this:

content_type, _ = mimetypes.guess_type(path)
if content_type is None and path.endswith('.log'):
    content_type = 'text/plain'

And that works. You can get full coverage on linux: a "foo.log" will enter the if block, and a "foo.unknown_extension" will skip it. But on OSX you'll never satisfy the conditional; the content_type will never be None for a foo.log.

More ugly mocking, right? You could. But how about:

content_type, _ = mimetypes.guess_type(path)
if path.endswith('.log'):
    content_type = content_type or 'text/plain'

Coverage will mark that as covered on both linux and OSX.

@pytest.mark.parametrize

I used to use nosetests for everything, just because that's what I first encountered in the wild. When I first hacked on PGPy I had to get accustomed to pytest, but that was pretty easy. One thing that drew me to it was @pytest.mark.parametrize (which is an alternate spelling.)

With @pytest.mark.parametrize, you can loop through multiple inputs with the same logic, without having to write the loops yourself. Like this. Or, for the PGPy unicode example, this.

This is doable in nosetests, but pytest encourages it by making it easy.

fixtures

In nosetests, I would often write small functions to give me objects or data structures to test against. That, or I'd define constants, and I'd be careful to avoid changing any mutable constants, e.g. dicts, during testing.

pytest fixtures allow you to do that, but more simply. With a @pytest.fixture(scope="function"), the fixture will get reset every function, so even if a test changes the fixture, the next test will get a clean copy.

There are also built-in fixtures, like tmpdir, mocker, and event_loop, which allow you to more easily and succintly perform setup and cleanup around your tests. And there are lots of additional modules which add fixtures or other functionality to pytest.

Here are slides from a talk on advanced fixture use.

mock and integration

It's certainly possible to test the I/O and loop-laden main sections of your project. For unit tests, it's a matter of mocking the heck out of it. Pytest's mocker fixture helps here, although it's certainly possible to nest a bunch of with mock.patch blocks to avoid mocking that code past the end of your test.

Even if you have to include some I/O in a function, that doesn't mean you have to use mock to test it. Instead of

def foo():
    requests.get(...)

you could do

def foo(request_function=requests.get):
    request_function(...)

Then when you unit test, you can override request_function with something that raises the exception or returns the value that you want to test.

Finally, even after you get to 100% coverage on unit tests, it's good to add some integration testing to make sure the project works in the real world. For scriptworker, these tests are marked with @pytest.mark.skipif(os.environ.get("NO_TESTS_OVER_WIRE"), reason=SKIP_REASON), so we can turn them off in Travis.

Use pytest!

Besides parameterization and the mocker fixture, pytest is easily superior to nosetests.

There's parallelization with pytest-xdist. The fixtures make per-test setup and cleanup a breeze. Pytest is more succinct: assert x == y instead of self.assertEquals(x, y) ; because we don't have the self to deal with, the tests can be functions instead of methods. The following

with pytest.raises(OSError):
    function(*args, **kwargs)

is a lot more legible than self.assertRaises(OSError, function, *args, **kwargs) . You've got @pytest.mark.asyncio to await a coroutine; the event_loop fixture is there for more asyncio testing goodness, with auto-open/close for each test.

Miscellaneous hints

Use tox, coveralls, and travis or taskcluster-github for per-checkin test and coverage results.

Use coverage>=4.2 for async code. Pre-4.2 can't handle the async.

Use coverage's branch coverage by setting branch = True in your .coveragerc.

I haven't tried hypothesis or other behavior- or property- based testing yet. But hypothesis does use pytest. There's an issue open to integrate them more deeply.

It looks like I need to learn more about mutation testing.

What other tricks am I missing?