Property Testing in C
A poor man's fuzzer
Last time I wrote about snapshot testing, which is great for capturing the behavior of a current system - especially in a way that can be quickly checked. I use it as a complement to traditional example-based unit testing, where we check that a system gives expected (known) output for known input.
However, snapshot testing and example-based testing don’t find bugs. They’re not meant to. They instead act as anchor points for a system, saying that the system behaves this way at this point - but they say nothing about any other point.
For instance, let’s say we wrote a custom memory allocator, and we’re writing tests for it. We might write an example-based test like the following:
TEST(my_allocator) {
    int initUsage = my_alloc_usage();
    void *ptr = my_alloc(24);
    int curUsage = my_alloc_usage() - initUsage;
    int expectedUsage = 24;
    ASSERT_EQ(expectedUsage, curUsage);
    ASSERT_NE(NULL, ptr);
    my_free(ptr);
    int finalUsage = my_alloc_usage();
    ASSERT_EQ(initUsage, finalUsage);
    PASS();
}

It’s not hard to anchor a single point. The above test allocates and frees some memory, with assertions along the way making sure everything works. We can add more tests for different sizes, or for multiple allocations. Each test acts as an anchor checking how our allocator behaves.
But most bugs in production don’t happen at the anchored inputs - and many don’t happen close to them, either. Instead, they tend to happen in the gaps between the anchors - gaps which tend to be very large and obscure. How the system really behaves at a given input is unknown until the code is run with that input.
To illustrate how hard it is to get good coverage, let’s look at our use case. Each test makes sure that the allocator behaves a certain way at one input point - but only at that input point. In our case, the input is the combination of:
The order of allocations
The order of frees for each allocation
The size of each allocation
The order allocations and frees are interleaved
It turns out, changing any one of those parameters can cause our allocator to run through any number of different code paths - and trigger any number of hidden bugs.
We could add a lot of test cases and try to cover as many possibilities by hand as we can - but we’d be typing test code until the sun burned out before we caught every possible combination. And most of the tests would look suspiciously similar.
Well, what if we wrote a program that tried to break our app? We could do that - it’s called a fuzzer, and there are several already out there for different languages. However, writing a fuzzer is non-trivial, and adopting an existing one requires not only installing it on every dev machine (and build server), but also time to set it up, training on how to use it, and dealing with updates. For some projects that effort is worth it; for many others it isn’t.
Is there an in-between? Something simple enough to write as part of tests but powerful enough to test many different cases? Well, yes!
They’re property tests!
Property tests are just a fancy name saying “with random data shaped like this, throw it into my code and expect it to sort of behave like that.” In the case of our allocator, we just take a bunch of random allocation and free requests and throw them at our code. Similar to the following:
#ifndef MAXROUNDS
#define MAXROUNDS 1000
#endif
#ifndef MAXALLOCS
#define MAXALLOCS 100
#endif

TEST(my_allocator_rand) {
    size_t initUsage = my_alloc_usage();
    void *allocations[MAXALLOCS] = {NULL};
    srand(time(0));
    for (size_t round = 0; round < MAXROUNDS; ++round) {
        size_t slot = (rand() / 16) % MAXALLOCS;
        if (allocations[slot]) {
            my_free(allocations[slot]);
            allocations[slot] = NULL;
        } else {
            size_t allocAmount = (rand() / 16) % 4000 + 64;
            allocations[slot] = my_alloc(allocAmount);
            ASSERT_NE(NULL, allocations[slot]);
        }
    }
    for (size_t slot = 0; slot < MAXALLOCS; slot++) {
        if (allocations[slot]) {
            my_free(allocations[slot]);
        }
    }
    size_t finalUsage = my_alloc_usage();
    ASSERT_EQ(initUsage, finalUsage);
    PASS();
}

The above code randomizes[1] the interleaving of allocations and frees, and randomizes the allocation sizes. It’s a very simple check, and yet it covers a wide range of possibilities.
However, property tests aren’t a silver bullet - and there are some notable drawbacks. But before we get to those drawbacks, let’s show another example that will highlight the drawbacks even more. Let’s move from memory allocators to math functions - like sin. Here’s a test for sin.
#ifndef MAXROUNDS
#define MAXROUNDS 1000
#endif

TEST(my_sin) {
    srand(time(0));
    for (size_t round = 0; round < MAXROUNDS; ++round) {
        double in = (double)(rand() / 16) / 50000.0;
        double out = my_sin(in);
        ASSERT_LE(out, 1);
        ASSERT_GE(out, -1);
    }
    PASS();
}

If you’re wondering how this test ensures each output is the correct approximation for the corresponding input, the answer is very simple: it doesn’t. Which leads me to a big downside of property tests: they don’t really check for correctness.[2]
We’re throwing a lot of inputs at the code - but we can’t verify the outputs are right. So, what’s the point?
The point is we’re looking for unknown failures, not known successes. These most commonly manifest as exceptions (e.g. error codes), failed assertions (via assert[3]), segmentation faults, and behavior detected by sanitizers (like clang’s thread sanitizer, address sanitizer, memory sanitizer, and UB sanitizer).
For instance, our free code may look something like the following:
void my_free(void *ptr) {
    if (ptr == NULL) return;
    pthread_mutex_lock(&global_alloc_lock);
    struct MyBlockHeader *block = my_align_pointer_down(
        (struct MyBlockHeader *) ptr - 1, MY_MEM_ALIGNMENT);
    assert(block != NULL);
    // Ensure that the header is valid (e.g. look for "magic bytes")
    assert(my_block_is_valid(block));
    // Detect "double frees"
    assert(block->used == true);
    block->used = false; // mark it as freed
    merge_with_neighbors(block);
    // Make sure we didn't corrupt the current (or neighboring) blocks
    assert(my_block_is_valid(block));
    assert(block->next == NULL || my_block_is_valid(block->next));
    assert(block->prev == NULL || my_block_is_valid(block->prev));
    pthread_mutex_unlock(&global_alloc_lock);
}

These asserts add a lot more power to property tests. If an assert fails, we’ll know. And since we’re running far more tests with far more input variation, we make it much more likely that we’ll trip a bug that fails an assert.
But wait - we’re using random numbers in our property tests! Which means that even if we find a bug we may not be able to reproduce it.
That is, unless we do two things: print the seed at the start of every run, and provide a way to seed a run on demand. Once we do that, we can reproduce a failure[4] and start debugging it.
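A minimal sketch of that seed plumbing might look like the following. The function name and the SEED environment variable are my own choices for illustration, not from any particular library:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Print the seed at the start of every run, and allow an override
 * via a SEED environment variable so a failing run can be replayed. */
static unsigned int test_seed(void) {
    const char *override = getenv("SEED");
    unsigned int seed = override
        ? (unsigned int)strtoul(override, NULL, 10)
        : (unsigned int)time(NULL);
    printf("property-test seed: %u\n", seed);
    return seed;
}
```

Each property test would then start with `srand(test_seed());`, and a failure gets replayed with something like `SEED=1700000000 ./tests`.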
There are other aspects to property testing that people tend to focus a whole lot on (rules for input generation, how to define “properties”, shrinking, etc.). However, I’ve found that they’re either very easy to do, not very important, or mostly used when trying to create a regression test (example or snapshot test) around a failed property test. In other words, they’re “nice to have” but aren’t the real meat of property testing, and can often distract or take away from the use of property testing.
If this post inspired you to start using property testing, I do have some library recommendations for you.
theft is a C99 property testing library that’s been around for a while. It is pretty heavy-handed, though.
RapidCheck is a C++ property testing library which I’ve used to test C code.
Test_Jam is a C++/C library I made for property testing a few years back. The C interface is very primitive and poorly tested as I focused primarily on the C++ interface.
PeaksC is a library I’m currently working on which has some property testing utilities built in. It is highly opinionated, though, so if you need a drop-in solution for a large code base, one of the other options may be better.
[1] Yes, I know rand() is a linear congruential generator and isn’t very “random” - but it works well enough for basic use cases and can be quickly switched out.
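As a sketch of how quickly it can be switched out, here is xorshift32, a tiny PRNG with better statistical behavior than a bare LCG. The function names are my own invention:

```c
#include <stdint.h>

/* A drop-in alternative to rand(): Marsaglia's xorshift32. */
static uint32_t xorshift32_state = 2463534242u; /* any nonzero value */

static void xorshift32_seed(uint32_t seed) {
    xorshift32_state = seed ? seed : 1; /* state must stay nonzero */
}

static uint32_t xorshift32_next(void) {
    uint32_t x = xorshift32_state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    xorshift32_state = x;
    return x;
}
```

Replacing `rand()` with `xorshift32_next()` in the tests above (and `srand` with `xorshift32_seed`) is a mechanical change.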
[2] One property test I can think of that could verify sin is to generate a right triangle, solve for the angles without using sin, and then verify that sin lines up with the solved solution. Cases like this often exist, but they take a lot more working out to get right - and by the time you do, a failing test case leaves it unclear whether the bug is in the code or in the test, which makes debugging much harder.
[3] The two downsides to assert are that it only runs in debug builds - meaning it won’t catch bugs in a release build - and that it gives very limited debug information when the program crashes. In my code, I have a macro called “ensure” which runs in both debug and release builds, and which allows printing a formatted string that includes the file and line number.
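A minimal sketch of such a macro, assuming the semantics described above (the exact name and output format are my guesses, not the author’s real code):

```c
#include <stdio.h>
#include <stdlib.h>

/* Like assert, but active in release builds too, and prints the failed
 * condition, file, and line plus a formatted message before aborting. */
#define ensure(cond, ...)                                          \
    do {                                                           \
        if (!(cond)) {                                             \
            fprintf(stderr, "ensure failed: %s (%s:%d)\n",         \
                    #cond, __FILE__, __LINE__);                    \
            fprintf(stderr, __VA_ARGS__);                          \
            fprintf(stderr, "\n");                                 \
            abort();                                               \
        }                                                          \
    } while (0)
```

Usage looks like `ensure(block != NULL, "freed pointer %p has no header", ptr);` - the condition text and location come along for free via the preprocessor.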
[4] There are some other caveats: if a test is non-deterministic by nature (e.g. it relies on the system clock or network calls), then a seed alone won’t be sufficient to reproduce a failure without additional work. But at that point we’re leaving the realm of property tests and entering mocking, deterministic simulation testing, or record-and-replay debugging.
Of which mocking is the worst option, because we exclude any possibility of finding unknown failures in a whole segment of our code - a segment which also tends to be the flakiest and most complex dependency, due to its non-deterministic nature. Couple this with the fact that mocked code is almost always someone else’s code (e.g. the OS or a network service), which means we’re not testing our integration layer with somebody else’s code either. Not only that, but we aren’t actually running the mocked code - so we aren’t even confirming our known successes, on top of not finding unknown failures.


