Effective Property-Based Testing

Russell Mull

| Principal Software Engineer, Auxon

We have a diverse testing and verification toolkit at Auxon, but one of our most important tools is property-based testing, or proptest for short.

# What's Proptests, Precious?

Put briefly, property-based testing is a way of testing your code that separates a description of some property of your system from a description of the domain over which that property must hold. For example, if you're testing the 'addition' operation, you might what to check the property that it's commutative (the output is the same, regardless of the order). The relevant domain here is pairs of numbers.

For the mathematically inclined, you can think of the property as being a proposition with some number of free variables, and the domains as binding each free variable with a universal quantifier. The normal definition of addition commutativity fits this framework:

If you're new to this, I recommend trying to write at least one proptest, then reading the advice below.

This is not going to be a basic 'How to proptest' tutorial. There are many, many treatments of that subject; you should probably start with the documentation for your language's proptest library. A few of my favorites are:

proptest (Rust) (opens new window) (We use this extensively at Auxon)
test.check (Clojure) (opens new window)
hypothesis (Python) (opens new window)

Property-based testing libraries are extensively available. There's probably one you can use in your language of choice.

# Why Proptest?

Testing and specification can be thought of as a spectrum of rigor. On the least-rigorous end, we have manual testing and regular written specifications. Moving up, we find automated unit / integration test and executable specification languages (like Cucumber, another popular tool at Auxon). Then there's a massive gulf, and everything gets starts getting academic and difficult; symbolic execution, model checking, and formal proofs show up at the far end.

property-based

Proptests are somewhere in the middle, just a little past the tests you're already doing. They're a very practical way of verifying your implementation against a rigorous, executable specification. But they're not much harder to write than the unit tests you're already doing.

# Terminology Check

Different proptest libraries use different words to talk about the same thing. In this article, I'll use 'Generator' to talk about the thing that defines the input domain for your proptest. (Some people refer to proptests as generative testing). This is also known as a 'Strategy' in Rust proptest / Python hypothesis parlance.

Code examples will be written in Rust, but the concepts and recommendations apply to any proptest framework.

# Write 'em, Compose 'em, Put 'em in a Suite

# Item 1: Develop a Generator Library

Your program probably has a bunch of classes or data structures that it cares about deeply, that are central to its operation and passed all around the system. You should identify theses data structures and write generators for them. You can do this incrementally, as you begin to work with proptests.

# Item 2: Be a little suspicious of type-based generators

Many proptest frameworks provide a way to make a generator directly from a structure, type, or schema definition. These are of course very convenient, and I've used them often. But they're not as clearly good as you might think.

First, your types may not be as precise as what you need for your generators. Maybe you have a single type which can have two distinct data variants (maybe both valid and invalid requests?) which you want to generate separately. Second, event if your type definition is fully precise, you may find that practical execution considerations require you to make size or other input domain restrictions in the generators. This happens fairly frequently.

For these reasons, I recommend staying away for type-based generators for all but the most simple applications. We've found that's it's best to just start with standalone generators; it costs a little more up front, but I always end up converting to them later on anyways. Clojure's core.spec-based generators are an exception here, as they provide an ad-hoc way to override portions of the generated generator. But even then, it's a fairly laborious process.

# Item 3: Touch-test your generators

When you're writing your generators, use your language's interactive execution facilities to try them out. It's common to find unexpected variations in proptest-generated data, and this can help you shortcut the "WHY did my tests fail?" process. I've had a lot of success using this technique in Clojure.

# Item 4: Learn your Combinators

Proptest libraries are built from small generators that can be combined to form more complex structures. Some of these are language/environment dependent, but the important ones are common:

Primitive types - You'll start with generators for all the primitives in your language: integers, floats, strings, etc.
Collection types - Look for direct support for native collections in your framework; you can use these to generate a collection of anything else you can generate. Note that collection size can be bounded; you'll find this necessary when developing larger compound generators.
just / return - Proptest frameworks always have a way to 'lift' a single value, creating a generator that just spits out that value every time.
map - Transform a generated value in some way. You might use this to generate numbers that have been converted to strings.
filter - Throw out generated values if they don't pass some test. This is convenient but a little dangerous; see below.
tuple - Get data from multiple generators together. (This is kind of like 'and')
one of / choose - Get data from one of the given generators. (This is kind of like 'or')
data structures - Different frameworks approach this differently, but most provide some amount of sugar to help you generate native data structures. Lacking that, one effective strategy is to start with a tuple generator with one item for each field in your data structure, then use map generator to put them all together.
recursive structures - If you want to generate a tree or some other kind of recursive data structure, look for direct support for that in your framework.

# Item 5: Prefer Constructive Generation to Filtering

The filter combinator is really easy, and one of the first things people reach for when writing proptests for the first time. The trouble with this is that it's inefficient, and it can break down entirely when used broadly.

Suppose you wanted to make a generator for even numbers. It's easy to do with filter:

let gen_evens = any::<u32>()
    .prop_filter(|x| x % 2 == 0);

This works, but it's doing a lost of wasted work: half of all the numbers generated by the first step will be thrown out. Suppose you did the same thing to generate multiples of 10: you'll be throwing away 90% of your work. Powers of two? That's way out of hand. Most proptest frameworks will abort execution if they aren't able to generate a valid sample after some number of tries.

This is not only a problem with single value generators: you'll find that as you start to combine generators to produce more complex structures, the effects of multiple filter-based generators will accumulate, and you'll be aborting execution all the time.

The solution to this is to instead build your generators constructively. Start with a fully specified domain, then transform that to get what you want. This can be tricky, and sometimes requires oblique thinking to get it right. In the case of our even numbers, it's not too bad:

let gen_evens = (0..2^31)
    .prop_map(|x| x * 2);

This is not a hard-and-fast rule; there are plenty of times where filter is the only choice. But you should beware of your filter's selectivity, and be careful of using it too much.

# Item 6: Use flatmap/bind for Data Dependencies

Many data structures have some kind of internal consistency requirement. For example, you might have a record that contains a vector of strings, and an index into that list representing a cursor. We'd like to only generate valid cursors. (Sometimes you want to generate invalid cursors as well, of course)

struct StringsAndCursor {
    strings: Vec<String>,
    cursor: usize
}

The naive way to do this might be to start with a tuple of the vector generator and a cursor generator, and then use filter to remove all tuples where the cursor is out of bounds:

// BAD example, don't do this
let gen_strings_and_cursor = (any::<Vec<String>>(), any::<usize>())
    .prop_filter(|(strings, cursor)| cursor < strings.len())
    .prop_map(|(strings, cursor)| StringsAndCursor { strings, cursor });

This is a correct and succinct way to specify the constraint, but you'll find that your proptest framework probably gives up after generating too many candidates that don't match. It's radically inefficient at best.

The constructive way to do this is to start with a generator for the list of strings, and then use flatmap (or bind, the names vary) to generate an in-bounds index:

let gen_strings_and_cursor = any::<Vec<String>>()
    .prop_flat_map(|strings| (Just(strings), (0..strings.len())))
    .prop_map(|(strings, cursor)| StringsAndCursor { strings, cursor });

# Item 7: Bound your collections

When you're making generators for collection-typed values (Vectors, Lists, Maps, etc), it's tempting to say "do this for ALL the possible collections!" In terms of specification, that's correct. But in terms of execution, it can get out of hand. Does your system really need to be able to handle 200 million distinct zip codes?

This can get especially bad if you have compound data structures with collections nested under other collections. Even if your implementation can gracefully handle the load, you may find that your proptest framework cannot.

You should carefully consider what you actually care about verifying, and target that. I often choose a 'reasonable' (which depends on the context) limit to the size of straight collections of simple values. When dealing with nested collections, the "zero, one, many" rule of thumb applies; many == 2 or 3 typically works well.

# Look! It's a Propiphaunt!

# Item 8: Start small

When you're bringing proptest into a code base for the first time, it's tempting to Specify All The Things. This is unfortunately a good way to fail. If you jump into the deep end, you'll find that your proptests are mysteriously failing or hanging in ways you can't explain. It's important to go at this incrementally as you develop a feel for what kinds of generators your system needs, and what kinds of properties you can feasibly verify.

Start by proptesting your pure functions, subroutines that have no side effects and execute quickly. If you're coding in an OO style, start by synthesizing some small model objects and exercising their methods.

# Item 9: Test data round-trips

One of the easiest places to start with proptests is to identify places in your system where one piece of information can move across some boundary and back. For example, you have a have a data structure for an network protocol that can be used in both requests and responses. Start by generating the data, then transform it to the external form, transform it back, and check that the original data is equal to the result of the round-trip.

This sounds so simple as to be useless; it's not. Lossless encoding/decoding is a very important property of many system components, and if you haven't tested it, you probably don't have it.

Other places to look for round-trip testing opportunities include database storage and configuration files.

# Item 10: Test the "It doesn't crash" property

If you can't think of what properties you should check, "it doesn't crash" is totally legit. In a language with exceptions, you'll be shocked at what starts bubbling up when your proptests start synthesizing interesting edge cases.

# Item 11: Test system boundaries

When you think about the structure of your system, you can probably identify some important boundaries in it. Maybe they are module boundaries, microservice APIs, or an important plugin interface. Regardless of the technology, you have some API in your system with an out-sized impact on everything else, whose correctness has a large impact on everything.

Find these boundaries and proptest them; pay particular attention to the properties of the implementation which cannot be specified in the type signature or whatever IDL you're using.

# Item 12: Test the whole system

If you have acceptance tests, see if you can convert some of them to proptests. Using Data Tables in Cucumber or similar example-based testing is a good sign that you might have some proptests hidden in there that want to get out.

# Item 13: Consider disabling shrinking

Automatic shrinking is one of the great victories of property-based testing. When it works, it's magical. It's at its best when you're testing something that can execute really fast, like a relatively small purely-functional part of your system.

As you start to bring your proptests up a level or two and test larger pieces of your system, you'll find that shrinking takes a REALLY long time. This is a function of cycle time; how long it takes to run a single cycle of your test. If it takes 1 millisecond, everything's fine, but if it takes 1 second, you'll have a bad time. In cases like this, I've found it best to simply disable shrinking in the proptest framework; manual inspection of generated data which triggered the failures can just be faster sometimes.

# Conclusion

Software correctness is a tricky thing. In many circles, the prevailing value system emphasizes things like agility, convenience, and expressiveness. Others care deeply about process and specification. To me, proptests fall right in the middle of that spectrum. They're very practical and pretty easy to create once you've got the hang of it. But they also help you think of your system in a more 'formal' way, broadly considering its invariants over the domains you specify. This is the same way you'll need to think of your system as you move to more formal modes of analysis.

I hope you find some places to proptest your code. The world has too much broken software, and this is a way you can make yours better.