afl-fuzz on different file systems

One day I was fuzzing around with american fuzzy lop and accidentally pointed the output directory for fuzzer findings to point onto a file system on a physical disk instead of the usual shared memory file system. I noticed my mistake and changed the output directory to the usual place. Consequently there was a clear difference on how many fuzzing iterations/second afl-fuzz program could do. This then lead into this comparison of different file systems and fuzzing modes on them with the workload that afl-fuzz has.

Hardware and software background

Fuzzing is a computationally intensive way to find bugs in programs, usually fully exercising the machine CPU. This and writing data to the disk can have adverse effects on the hardware even if we are working with software. Some risks are listed in the common-sense risks section at american fuzzy lop’s README.txt file that is a good read about the subject. In this article I’m mainly interested in what happens when you happen to point the output of afl-fuzz to a device with a specific file system.

afl-fuzz as part of american fuzzy lop

American fuzzy lop is a successful generic purpose fuzzer that finds bugs for you while you sleep. afl-fuzz is the executable program that does the hard work of generating new data, repeatedly running the target program (fuzz target), and analyzing the results that come up in these fuzz target executions. It has handcrafted heuristics and fuzzing strategies that in practice provide quite successful results without the need for tuning.

Figure 1: Interworkings of afl-fuzz, the fork server, and the fuzz target.

Figure 1 roughly visualizes the different parts that are part of a fuzzing session when using afl-fuzz. It spawns the program that we try to fuzz. This program then creates a fork server that is responsible for communicating with the main afl-fuzz executable over pipes and spawning new fuzz target instances with fork() call. The fork server actually a small shim that is part of the fuzz target. Also the new program instance that fork() call spawns still technically holds a reference to the fork server, but those parts are just active at different times, visualized as gray in the figure. If the fuzz target is in persistent mode, it doesn’t get restarted for every new input.

Data is passed to the fuzz target either over standard input or over a newly created file that the fuzz target reads. Then the fuzz target executes itself by writing an execution trace to a shared memory and finishes running. afl-fuzz then reads the trace left by the fuzz target from the shared memory and creates a new input by mutating old ones. What to mutate is controlled by the new and historical information that the instrumentation data from the fuzz target provides.

Solid state drives

Solid state drives are nowadays the most common physical storage medium in generic and special purpose computers. They are fast and relatively decently priced for the storage needs of a regular software developer. The decently priced solid state drives come with a price of somewhat limited write endurance due to the properties of the physical world and manufacturing compromises.

Currently (early 2018) the write endurance of a solid state drive promises to be around half in terabytes than what the drive capacity is in gigabytes on consumer level drives using (TLC NAND). This means that 200-250 gigabyte SSD has a write endurance around 100 terabytes. In practice, the write endurance will likely be more at least on a test from 2013-2015.

Such write endurance does not really pose any limitations in the usual consumer level and professional workloads. But when it comes to fuzzing, it’s not the most usual workload. The whole idea of a fuzzer is to generate new data all the time, make the program execute it, and then either discard or save it depending of the program behavior is interesting or not. So we can expect that a fuzzer will generate quite a lot of data while it’s running.

Quick back-of-the-envelope calculation when assuming 10 parallel fuzzers running 1000 executions/second with 1000 byte input/execution on average would result in over 20 terabytes of data generated every month. If all this is directly written to a drive, then we can expect to have around 5 months until we reach the promised write endurance limits of a typical solid state drive at the time of this writing. There is caching that works to prolong this, but then there are all the invisible file system data structures, minimum file system block sizes, and write amplification then fight against these savings.

Two data passing modes of afl-fuzz

afl-fuzz program has two ways to pass the new input to the fuzz target: passing the data over the standard input and creating a new file with the new data and having the fuzz target to read the data from the just created file file.

The most compact afl-fuzz command only includes the information for initial inputs, output directory and the command to execute the fuzz target without any special arguments:

$ afl-fuzz -i in/ -o out/ -- fuzz-target

The fuzz target gets the input data from its standard input file descriptor so it does not need to explicitly open any files. Technically the standard input of the child process is backed by a file that is always modified by the afl-fuzz process when new input it generated. This file is also .cur_input file at the fuzzer instance directory. The exact details of how this works are explained later.

The more explicit input file path information including commands look like this:

# "@@" indicates the name of the file that the program gets with new
# fuzz data. "-f" can be used to hard code the input file location
# that then can be passed to the fuzz target or not.
$ afl-fuzz -i in/ -o out/ -- fuzz-target @@
$ afl-fuzz -i in/ -o out/ -f target.file -- fuzz-target @@
$ afl-fuzz -i in/ -o out/ -f target.file -- fuzz-target

In the first command there is no explicit path to the input file specified, so the target program will get the path as the first argument, indicated by @@. The actual file is located at the fuzzer instance output directory (here it’s out/) as .cur_input file. Second two forms with -f target.file switch make it possible to define the location of the input file so that it can reside outside the output directory.

Details of input data writing

Now that we know that afl-fuzz provides two different ways for the data to end in the program, then let’s look at the extracted details of write_to_testcase() function to see how this data is passed to the child process.

When afl-fuzz writes data into the standard input stream of the fuzz target, the main parts of the code look like this in both afl-fuzz and fuzz target side:

// afl-fuzz part:
{
    // Make sure that reads done by fuzz target do not affect
    // new testcase writes:
    lseek(testcase_fd, 0, SEEK_SET);
    // Write new testcase data to the file:
    write(testcase_fd, testcase_data, size);
    // Make sure that old data in testcase file does not leak
    // into the new fuzzing iteration:
    ftruncate(testcase_fd, size);
    // Make read() act as it would read a fresh standard input
    // instance:
    lseek(testcase_fd, 0, SEEK_SET);
}
// Fuzz target part:
{
    // stdin file descriptor inside fuzz target is actually
    // identical to testcase_fd thanks to dup2(). So we just
    // read whatever happens to be there:
    ssize_t read_size = read(testcase_fd, buffer, sizeof(buffer));
    // Do some actual fuzzing:
    fuzz_one(buffer, read_size);
}

What happens in here that first the position in the file is changed to the beginning. A new data is written over the old one. The file length is changed to correspond to the new data length. The position is set to the beginning of the file again. Then the fuzz target reads from this file descriptor and runs the just written data through the actual program logic.

When passing data to the program through a named file following types of operations happen:

// afl-fuzz part:
{
    // Remove the old output file.
    unlink(out_file);
    // Create a new file with the same name.
    int out_fd = open(out_file, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (out_fd < 0) { abort(); }
    // Write the test data into the created file.
    write(out_fd, data, size);
    // Close the created file so that the data is available to other
    // processes.
    close(out_fd);
}
// Fuzz target part:
{
    // Open the file created by afl-fuzz.
    int in_fd = open(out_file, O_RDONLY);
    if (in_fd < 0) { abort(); }
    // Read enough data from the opened file descriptor for fuzzing.
    ssize_t read_size = read(in_fd, buffer, sizeof(buffer));
    // Close the opened file descriptor so that we don't leak
    // resources.
    close(in_fd);
    // Do some actual fuzzing:
    fuzz_one(buffer, read_size);
}

Basically old input file is removed and a new file is created, data is written to it and it’s closed. Then the fuzz target opens the just created file, reads the data out of it and closes the opened file descriptor closes it, and runs the data through the actual program logic.

You can already see that there are more file system related functions used when the fuzz data file is recreated. Then these functions are also more heavy than the ones used in the standard input version. When you call unlink() and open(), Linux kernel needs to do pathname lookup to figure out what exact file objects are accessed. When you only have the file descriptor to manipulate in the standard input case, you avoid these pathname lookups and hopefully manipulate purely numeric data. Also when you open a new file, it has to actually create the corresponding inodes and their data structures to the file system. This has a certain amount of overhead.

So looking at the called functions, it would feel like there is going to be a fuzzing overhead difference between these two input data creation approaches.

Benchmarking

Theoretical speculation is always nice, but the real hard data comes from the benchmarking. This section looks at the fuzzing overhead from both afl-fuzz perspective with low and high level filesystem access functions and from raw file system access perspective. Also the section about data writes tries to find an answer to the question that how damaging fuzzing can actually be to a solid state drive with a relatively limited write endurance.

I did these benchmarks with several major general purpose file systems found from Debian provided Linux kernels 4.14.13-1 and 4.15.11-1. These can show up as differences in numbers between tables, but numbers inside the same table are done with the same system. Also there is small variance between fuzzing sessions that I have tried to eliminate running the fuzzers for long enough that system maintenance and other short running tasks don’t have too big of an impact. But the relative mileage may vary between kernel versions, operating system libraries, and between the actual physical machines.

Executions/second

General purpose instrumentation guided fuzzers like american fuzzy lop and libFuzzer get their power from the fact that they can repeatedly execute the fuzz target in quick succession. Speed is one factor, but also the program stability from one execution to another is a second one. Having stable program ensures that the issues that fuzzing finds can be easily replicated. This basically leads into a compromise of selecting the appropriate fuzzer and fuzz target execution strategy.

It can be that the program has a global state that needs program restart or other tricks between runs. For these types of situations the default forkserver mode in afl-fuzz is appropriate. On the other hand if everything can be functionally wrapped inside a function that does not leak its state outside, we can use the much faster persistent mode in afl-fuzz. From this it should be actually quite easy to port the fuzz target to libFuzzer.

In this case libFuzzer shows 735 k executions/second with the sample target when the data is not passed between process boundaries. It is also possible to simulate in-memory file streams with fmemopen() function where libFuzzer achieved with this example program 550 k executions/second. This is 15-20 times lower fuzzing overhead than with afl-fuzz. But in this article we focus on american fuzzy lop’s persistent mode and leave the fuzzing engine selection for some other time.

A typical american fuzzy lop’s persistent mode fuzz target would generally have some generic configuration in the beginning and then the actual data dependent program execution would reside inside a while (__AFL_LOOP(40000)) { ... } loop. The number 40000 is just the number of iterations that the program does before it restarts again and can be lower or higher depending on how confident you are that the program does not badly misbehave.

Using the standard input with the persistent mode makes the data reading code look like following:

generic_configuration(argc, argv);
int input_fd = fileno(stdin);
while (__AFL_LOOP(40000)) {
    ssize_t read_size = read(input_fd, buffer, sizeof(buffer));
    if (read_size < 0) { abort(); }
    // Do the actual fuzzing.
    fuzz_one(buffer, read_size, &values);
}

Using named files with the persistent mode on the other hand requires opening a new file every iteration:

generic_configuration(argc, argv);
while (__AFL_LOOP(40000)) {
    int input_fd = open(argv[1], O_RDONLY);
    ssize_t read_size = read(input_fd, buffer, sizeof(buffer));
    if (read_size < 0) { abort(); }
    close(input_fd);
    // Do the actual fuzzing.
    fuzz_one(buffer, read_size, &values);
}

The example fuzz target

I directly used the code from my Taking a look at python-afl article and modified it to support the file and stdin reader variants. This code is available from target-simple.cpp for the prying eyes. Also the later on introduced C and C++ standard library variants are available as target-fread.cpp and target-ifstream.cpp.

File system dependent results

The benchmarking was done with a loopback device that fully resides in memory. Each of these file systems was created and mounted with their default options and the performance test was run for 2 minutes. Only a single instance of afl-fuzz was running. You can see from the table 1 the difference between the same workload on different file systems. The shared memory virtual file system (tmpfs) was the most efficient in both of these cases.

execs/second	stdin	file
btrfs	20.5 k	14.6 k
ext2	31.6 k	19.2 k
ext3	30.0 k	18.4 k
ext4	28.0 k	18.2 k
JFS	31.7 k	17.1 k
ReiserFS	28.0 k	14.5 k
tmpfs	34.6 k	25.5 k
XFS	21.1 k	16.8 k

Table 1: Average number of fuzz target executions/second over 2 minute measuring period with different file systems.

This result also shows an interesting history between ext2, ext3, and ext4 file systems. Their performance seems to decrease the newer the file system is, likely due to larger amount of data safety operations that the newer file systems do. There could be difference when parallel access to a file system is concerned.

Execution speed with different standard libraries

I also wanted to see how the use of more portable file system calls in both C and C++ standard libraries affects the fuzzing speed. Standard libraries like glibc, libstdc++, and libc++ provide a portable file system interface across operating systems. They also provide a certain amount of buffering so that every small read does not result in a system call and suffer from context switch overhead. These higher level library functions also aim to be thread safe so that concurrent manipulation of the same file descriptor would be a little bit less surprising.

When doing fuzzing, we are usually just reading small amount of data once and then either close the input file or expect the afl-fuzz to reset the input stream for us. Any buffering information must also be discarded and the stream position must be reset so that the input data for the fuzz target is the same what afl-fuzz expects it to be. If it’s not, it will show as a lower than 100 % stability value on AFL status screen.

Following is the C code used to benchmark how much overhead the standard C library functions bring into this. Standard input is put into unbuffered mode with setvbuf() function call, so buffer manipulation should not have a meaningful amount of overhead.

// C version for glibc benchmarking with standard file manipulation functions.

FILE* input_fd = stdin;
setvbuf(input_fd, NULL, _IONBF, 0);
while (__AFL_LOOP(40000)) {
    size_t read_size = fread(buffer, 1, sizeof(buffer), input_fd);
    fuzz_one(buffer, read_size, &values);
}

// We get the fuzz data filename from argv[1].
while (__AFL_LOOP(40000)) {
    FILE* input_fd = fopen(argv[1], "rb");
    size_t read_size = fread(buffer, 1, sizeof(buffer), input_fd);
    fclose(input_fd);
    fuzz_one(buffer, read_size, &values);
}

C++ streams don’t have a working unbuffered reads that would behave reliably. The closest equivalent to this is to call istream::seekg() and after that istream::clear() function. This basically is theoretically equivalent to what rewind() function does. Other solution of trying to set istream::rdbuf() to a zero sized one had no real effect on increasing stability.

C++ code for these fuzzing benchmarks looks like following:

// C++ version for libstdc++ and libc++ benchmarking with the standard
// C++11 file manipulation functions.

auto input_fd = std::cin;
while (__AFL_LOOP(40000)) {
    // Resetting stream state is 2 function calls.
    input_fd.seekg(0, input_fd.beg);
    input_fd.clear();
    input_fd.read(buffer, sizeof(buffer));
    size_t read_size = input_fd.gcount();
    fuzz_one(buffer, read_size, &values);
}

// We get the fuzz data filename from argv[1].
while (__AFL_LOOP(40000)) {
    std::ifstream input_fd(
        argv[1], std::ifstream::in | std::ifstream::binary);
    input_fd.read(buffer, sizeof(buffer));
    size_t read_size = input_fd.gcount();
    input_fd.close();
    fuzz_one(buffer, read_size, &values);
}

C++ code is a little bit more verbose than C code, as there are extra layers of abstraction that need to be removed so that afl-fuzz style data processing is possible. Table 2 then shows what is the speed difference of these different implementations and standard libraries.

execs/second	syscalls	glibc	libstdc++	libc++
stdin	33.5 k	32.4 k (98 %)	31.2 k (93 %)	30.7 k (92 %)*
file	24.2 k	22.8 k (94 %)	22.0 k (91 %)	20.7 k (86 %)

Table 2: Fuzz target executions/second over 2 minute period with different standard libraries on tmpfs.

* The stability of libc++ stdin session was not 100 %.

We can see that the pure combination of open() and read() functions is the fastest one. Portable C standard library implementation comes quite close with the unbuffered standard input implementation, but everywhere else there is a clear performance loss.

Stability is an important measurement of how reliably fuzzing can continue. Unfortunately in this tests everything was not 100 % stable. The standard input reading and resetting for libc++ does not fully work as expected and leads into reading old data.

The conclusion here is that try to stay with the low level unbuffered file manipulation functions or as close to their equivalents. More portable higher level functions for file access may provide speed benefits through buffering for regular file access patterns, but in this case they just add extra layers that slow down the data passing from afl-fuzz to the fuzz target. Unless the fuzz target itself relies on using those higher level functions.

Data writes

In the background section I described how it generally is a bad idea to write data to a physical disk, as modern solid state drives have a limited write durability. But one could think that the write caching done by Linux would prevent a situation from happening where data is constantly written to the disk if the lifetime of a file contents is really short.

The write caching towards block devices is handled by the page cache in the kernel. But as different file systems may write every new file or file part into a new location, this cache is not necessarily as effective as it could be. Even if the writes happen to the same file with very little data.

In this section I let afl-fuzz to run for 30 minutes and looked at the kilobytes written by iostat to a block device that was a file in memory that was mounted on a loopback device. This way the device should have looked like a real block device without the risk of shortening the lifespan of any real hardware. Estimated monthly values for disk writes for various file systems can be seen from table 3 for a single afl-fuzz instance.

writes/month	stdin	file
btrfs	3.052 TiB	0.020 TiB
ext2	0.004 TiB	0.005 TiB
ext3	0.020 TiB	0.028 TiB
ext4	0.035 TiB	0.021 TiB
JFS	9.74 TiB	69.8 TiB
ReiserFS	0.021 TiB	0.024 TiB
tmpfs	-	-
XFS	201 TiB	0.004 TiB

Table 3: Estimated monthly data writes for a single fuzzer on different file systems.

The numbers for the amount of data that would be written to the device is an extrapolation of the 30 minute run and show values are in terabytes (2⁴⁰ bytes). These numbers are for the persistent mode and for an extremely light algorithm. They still give some estimate on the amount of data writes to the disk even for algorithms that are 10-100 times slower. Nowadays a typical desktop computer can easily have 8-16 threads and the order of typically processed data per fuzzer iteration is closer to 1 kilobyte than to 128 bytes as in this case.

When looking at these numbers, btrfs, JFS, and XFS file systems are pretty dangerous ones to accidentally use, as the faster stdin mode where data is fed to always open file handle actually causes a meaningful amount of data to be written to a disk per month. Especially with XFS it’s 4 kilobytes of data written/fuzzing iteration. Taking into account that many solid state disk drives have write endurance only in hundreds of terabytes, accidentally running even one fuzzer for one month or more makes a significant dent in its lifetime for these file systems.

I also wanted to see what happens when you run multiple fuzzers in parallel on the same machine. The most trivial assumption for multiple fuzzers would be that the file system writes would increase about linearly with the number of fuzzers. The results on table 4 show that this is indeed the case for most of the file systems, except for ext2. Fortunately ext2 is not the worst data writer of the measured file systems and you really need to go extra mile to use it nowadays, as newer ext3 and ext4 file systems have replaced it by default.

writes/month	stdin x4	file x4
btrfs	9.53 TiB (78 %)	0.052 TiB (65 %)
ext2	0.072 TiB (450 %)	0.120 TiB (600 %)
ext3	0.075 TiB (94 %)	0.147 TiB (130 %)
ext4	0.090 TiB (64 %)	0.088 TiB (100 %)
JFS	44.4 TiB (110 %)	339 TiB (120 %)
ReiserFS	0.069 TiB (82 %)	0.072 TiB (75 %)
tmpfs	-	-
XFS	786 TiB (98 %)	0.035 TiB (220 %)

Table 4: Monthly data writes for a 4 parallel fuzzers on different file systems and the percentage to an extrapolated number based on one afl-fuzz instance.

The worrying part here is that two quite actively and widely used file systems, btrfs and XFS, suffer greatly from the writes with the standard input data generation pattern. As this is the faster mode of fuzzing from executions/second perspective, it’s quite nasty if you put the output directory for fuzzer findings on such file system.

Theoretical limits

I also wanted to see what would be the theoretical limits for the type of file system access pattern that afl-fuzz does. This is quite a synthetic benchmark, as it does not involve any context switches between processes, signal delivery, shared memory manipulation, file descriptor duplication, and other types of data processing that afl-fuzz does.

sequences/second	stdin	file
btrfs	87.5 k	40.7 k
ext2	446.1 k	66.5 k
ext3	305.4 k	60.3 k
ext4	211.5 k	56.4 k
JFS	737.7 k	59.5 k
ReiserFS	594.1 k	40.4 k
tmpfs	682.5 k	138.0 k
XFS	101.9 k	48.5 k

Table 5: Theoretical file system performance numbers with `afl-fuzz` like access pattern.

The difference between file systems in table 5 is many times larger for standard input based data generation than for afl-fuzz (table 1). When opening and closing files, the overhead is bigger and the difference between different file systems is also smaller.

If you also compare the overhead of pure file system accesses, it is higher than the overhead of libFuzzer’s 735 k executions/second basically for all cases. On top of this there is the overhead of everything else that happens inside afl-fuzz program.

End notes

You should use tmpfs that fully resides in memory for your fuzzing data generation with afl-fuzz. Usually this is available on /dev/shm/ or /run/shm/ globally and on systemd based systems also available under /run/user/$UID/. It’s faster and less straining to the hardware than trusting that the Linux block layer does not too often write to the physical device. Especially if you happen to use btrfs or XFS file systems.

In the end, this is mostly Linux specific information and for example OS X based systems need specific tricks to use memory based file systems. Same with FreeBSD where tmpfs based file system is not mounted by default.