I have been using american fuzzy lop to fuzz various C and C++ programs and libraries. It is a wonderfully fast fuzzer that is generally easy to get started with and it usually finds new bugs in programs that have not been fuzzed previously. Support for american fuzzy lop instrumentation has also been added for other languages and I decided to try out how it works with Python. More specifically with the reference CPython implementation of it.
Fuzzing Python programs with american fuzzy lop
American fuzzy lop generally works by running a program that is compiled with american fuzzy lop instrumentation built in. It executes the program with afl-fuzz command that modifies the input data that is fed to the program, monitors how the program behaves, and registers everything that causes abnormal program behavior. This works well for natively compiled programs, but causes various issues with interpreted programs.
Python is by default an interpreted language, so to execute Python programs, you need to start a Python interpreter before executing your code. This means that if you would instrument the Python interpreter with american fuzzy lop instrumentation and run the interpreter with afl-fuzz, it would mostly fuzz the inner workings of the interpreter, not the actual Python program.
Fortunately there is python-afl module that enables american fuzzy lop instrumentation for just the Python code instead of instrumenting the Python interpreter. In native programs american fuzzy lop compiler wrapper (afl-gcc, afl-clang, afl-clang-fast) adds the necessary instrumentation and the connection to afl-fuzz. Python-afl is, however, designed in such way that it doesn’t try to wrap the whole program, but requires you to create a wrapper module that initializes fuzzing.
The most simple way to wrap a Python program with python-afl is to
initialize python-afl and then run the program (assuming that
import afl afl.init() import fuzzable_module fuzzable_module.main()
This script, saved to
fuzz-wrapper.py, can be then run with
py-afl-fuzz command that wraps afl-fuzz for Python programs:
py-afl-fuzz -m 400 -i initial-inputs/ -o fuzzing-results/ -- \ python fuzz-wrapper.py @@
Next sections will explain in more details how to make fuzzing these programs more efficient and what pitfalls there could be in Python programs from fuzzing efficiency point of view.
Afl-fuzz modes and their python-afl equivalents
Generally afl-fuzz provides 4 fuzzing modes that differ in how the program execution between different fuzzing inputs behaves:
- Dumb mode that just executes
the program by doing
execv(). This is the slowest mode that does not rely on any fancy tricks to speed up program execution and also does not provide any insights how the program behaves with different inputs.
fork server mode
where the fuzzed binary does all the initialization steps that
happen before calling the
main()function and then program is repeatedly forked from that point on. This also includes instrumentation that is compiled in to the program so there already is some insight on what is happening inside the program when a specific input is processed. There exists QEMU mode for afl-fuzz that technically enables fork server mode for uninstrumented binaries, but with some performance penalty.
- Deferred instrumentation
that works in similar fashion as the basic fork server mode. Instead
forking just before calling
main()function, this enables to move the fork point further down the line and enables heavy program initialization steps to be avoided if they can be executed independently of the input.
- Persistent mode where the fuzzable part of the program is repeatedly executed without resetting the program memory every time the program is called. This only works in practice if the program does not have a modifiable global state that can not be reset to the previous state.
Afl-fuzz generates new inputs and analyzes the program execution results roughly at the same speed regardless of the mode. So these modes are in the order of efficiency in a sense that how much overhead there is for fuzzing one input. They are also in the order of complexity on how easy they are to integrate into an existing program that has not been made to be fuzzed. Especially as the fastest modes require clang to be available as a compiler and the fuzzable program needs to be able to be compiled and linked with it.
Python-afl, fortunately, provides equivalent modes without having to use special tools. These are also very fast to try out, as you don’t need to compile Python programs from scratch.
The dumb mode would be just equivalent of running Python interpreter
directly with afl-fuzz without any instrumentation, so we will skip
over it. The more interesting part is to use the deferred
instrumentation. The code in the introductory section called
afl.init() before the fuzzable module was imported. This is the
most safe approach, as the fuzz target might do something with the
input at import time. But more realistically,
Python programs generally only call
import statements, possibly
conditionally, during the start-up and don’t handle any user provided
data yet. So in this case, we can do imports first and move the
afl.init() function just before where the actual work happens:
import afl, fuzzable_module afl.init() fuzzable_module.main()
We can gain some speed-ups with this by calling
instead of letting Python to exit in the usual fashion where all the
destructors and other functions that are called at exit:
import afl, fuzzable_module, os afl.init() fuzzable_module.main() os._exit(0)
Previous examples assume that the input file generated by the fuzzer comes as the first parameter on the command line. This is quite a good assumption, as many data processing modules for Python include a command line interface where they read and process files given on the command line. But if we can directly call the data processing function, we can instead use the standard input to feed the data:
import afl, fuzzable_module, os, sys afl.init() fuzzable_module.process_data(sys.stdin) os._exit(0)
With Python 3 comes additional complexity. Python 3 processes the
standard input using the encoding specified in the environment. Often
in Unix environments it is UTF-8. As afl-fuzz mostly does bit
manipulation, input is going to end up with broken UTF-8 data and
results in exception when reading from the standard input file
object. To work around this, you can use
sys.stdin.buffer instead of
sys.stdin in Python 3 based programs. Or create a shim that always
results in raw bytes:
import afl, fuzzable_module, os, sys try: # Python 3: stdin_compat = sys.stdin.buffer except AttributeError: # There is no buffer attribute in Python 2: stdin_compat = sys.stdin afl.init() fuzzable_module.process_data(stdin_compat) os._exit(0)
The fastest persistent mode requires that the program should not have a global state where the previous program execution affects the next one. There unfortunately is a surprising amount of global state in Python programs. It is not that uncommon to initialize some specific variables only during the program execution and then re-use the results later. This usually is harmless, but negatively affects the program stability that afl-fuzz is going to show in its status screen.
Persistent mode code for a fuzzable program could look like following including the learnings from the deferred instrumentation and its speedups:
import afl, fuzzable_module, os try: # Python 3: stdin_compat = sys.stdin.buffer except AttributeError: # There is no buffer attribute in Python 2: stdin_compat = sys.stdin while afl.loop(10000): fuzzable_module.process_data(stdin_compat) os._exit(0)
Benchmarking different afl-fuzz modes
I wanted to measure how these different afl-fuzz modes behave with Python. So I created a small fuzz target whose main algorithm does some conditional computation based on the input data and prints out the result:
def fuzz_one(stdin, values): data = stdin.read(128) total = 0 for key in data: # This only includes lowercase ASCII letters: if key not in values: continue value = values[key] if value % 5 == 0: total += value * 5 total += ord(key) elif value % 3 == 0: total += value * 3 total += ord(key) elif value % 2 == 0: total += value * 2 total += ord(key) else: total += value + ord(key) print(total)
This is just to exercise the fuzzer a little bit more than a trivial function that does nothing would do. I also created an equivalent fuzz target in C++ to give numbers to compare what kind of an overhead different fuzzing modes incur for both Python and native applications. The approximate results are summarized in table 1. The actual scripts used to generate this data are available from following links: measure-times.sh, target-simple.template.py, and target-simple.cpp.
|Python 2||Python 3||Native|
What these results show, it is possible to make fuzzable program in Python in such way that the fuzzing start-up overhead is only three to four times larger than for a native one with afl-fuzz. This is an excellent result considering that generally algorithms implemented in Python can be considered 10-100 times slower than ones implemented in C family languages. But if you want to use Python for performance critical tasks, you are anyways using Cython or write performance critical parts in C.
There is a clear performance difference between Python 2.7.14 and Python 3.6.4 especially when all start-up and exit optimization tricks are not in use. This difference is also visible in Python start-up benchmarks at speed.python.org. This difference gets smaller when the persistent mode is used as Python executable is not shut down immediately after processing one input. What can also help Python 3 in the persistent fuzzing mode is the fact that the tracing function that python-afl sets with sys.settrace() is called only half as often with Python 3 is it is called with Python 2 for this fuzz target.
More speed for repeated Python executions
Python enables imports and other type of code loading at any phase of the program execution. This makes it quite possible that program has not fully loaded before it is executed for the first time. This is usually used as a configuration mechanism to configure the program based on the runtime environment and other type of configuration. It also provides the possibility to write plugins, similarly to what dynamically loaded libraries provide.
You can see from table 1 how the fuzzing overhead
decreases a lot when the fuzzing start point moved after
statements for this simple example program. So when I was fuzzing an
old version of flake8, I tried
to see if it had any hidden configuration statements that would
execute and cache their results for repeated calls. And it did!
Initially I used following type of fuzzing wrapper for flake8:
import afl, sys, flake8.run afl.init() flake8.run.check_code(sys.stdin.read())
It is basically a simple wrapper that imports all what it needs and
then fuzzes what is fed from the standard input to the program. But
the performance of this was horrible, around 15 executions/second. So
I tried to see what happens when I change the code a little bit by
flake8.run.check_code() function with an empty string
before setting the fuzzing starting point:
import afl, sys, flake8.run # Make sure that the hidden runtime configuration is executed: flake8.run.check_code("") afl.init() flake8.run.check_code(sys.stdin.read())
This doubled the execution speed to around 30 executions/second. It is still quite slow for the small inputs that afl-fuzz initially creates, but an improvement nonetheless. I looked at what flake8 does when it is executed and a following line popped up:
from pkg_resources import iter_entry_points
There basically is a hidden
import statement in the execution path
whose result is cached after the first time it is encountered. Also
pkg_resources.iter_entry_points() function is used to
configure the program at runtime
and that also adds some extra overhead to the process.
Flake8 also by default tries to execute checks in parallel with
multiprocessing module. This
might be a good idea when you have multiple files to verify at once,
but during fuzzing it just adds unneeded overhead. Also the fact that
it starts a new process makes the fuzzer lose all information about
what is happening in the subprocess. Fortunately in flake8 it was
possible to override the detected multiprocessing support by just
setting one variable into
False and then flake8
will act as there is no multiprocessing support. This increased the
average fuzzing speed of flake8 by threefold.
The final speedup with flake8 came when looking at how flake8 is constructed. It is basically a wrapper around mccabe, pycodestyle, and pyflakes packages. So rather than fuzz flake8, it is much more productive to create a fuzz target for each one of those packages individually. I did this for pycodestyle and ended up finally executing it with around 420 executions/second for trivial data and around 200 executions/second for more realistic. So basic recommendations on how to fuzz native programs also apply for Python.
Monkey patching around fuzzing barriers
As american fuzzy lop does not know almost anything about the input, it can encounter various impediments (see section 13 from afl’s README.txt) when the file format includes any values that depend on the previously encountered data. This is especially problematic with checksums and is also an issue with other mutation based fuzzers. To work around this the C world, you can generally use C preprocessor to turn off checksum verification when such thing is encountered. This also applies for all other types of barriers that might skew fuzzing results, like random number usage.
Unfortunately Python does not have preprocessor support by default, so
this type of conditional compiling is out of the question. Fortunately
Python provides the possibility to do
monkey patching where
you can replace functions or methods at runtime. So to make a library
more fuzzer friendly, you can monkey patch all functions related to
data verification to always return
True, or return some constant
value when checksums are considered.
I used this approach to fuzz python-evxt 0.6.1 library. Python-evxt is a library to parse Windows event log files and is one of the first hits when you search Python Package Index with “pure python parser” keyword. The file format includes CRC32 checksum that will prevent the fuzzer from being able to meaningfully modify the input file, as almost all modifications will create an incorrect checksum in the file.
To monkey patch around this issue, I searched the source code for all functions that potentially have anything to do with checksum generation and made them always to return a constant value:
import Evtx.Evtx checksum_patches = ( (Evtx.Evtx.ChunkHeader, "calculate_header_checksum"), (Evtx.Evtx.ChunkHeader, "header_checksum"), (Evtx.Evtx.ChunkHeader, "calculate_data_checksum"), (Evtx.Evtx.ChunkHeader, "data_checksum"), (Evtx.Evtx.FileHeader, "checksum"), (Evtx.Evtx.FileHeader, "calculate_checksum"), ) for class_obj, method_name in checksum_patches: setattr(class_obj, method_name, lambda *args, **kw: 0)
checksum_patches variable holds all the
functions that you need to overwrite to ignore checksums. You can use
setattr() to overwrite these methods on class level with an
lambda *args, **kw: 0 that
0 and takes any arguments. Taking any number of
arguments in any fashion is enabled with
**kw and this
syntax is explained in
section on Python’s control flow tools manual.
I was impressed on how low fuzzing overhead Python programs can have when compared to C family languages. When looking at how python-afl is implemented, it becomes quite clear that the deferred mode of afl-fuzz has a great part in this where the whole Python environment and program initialization is skipped between different fuzzing runs.
Making a Python module fuzzable was also more easy than in C family
languages. Although american fuzzy lop is already the easiest fuzzing
engine to get started with, the fact that it uses clang to do the more
advanced tricks often gives headaches when trying to get software to
compile. The fact that I did not have to use any modified Python
interpreter to get started but only had to
import afl module made me
to realize how many steps I skipped that I normally have to do when
using american fuzzy lop on new systems.
Thanks to Python’s dynamic execution and monkey patching capabilities
I could also try out fuzz target creation with external libraries
without having to actually modify the original code. Especially
selecting some specific functions to fuzz and overriding checksum
generation would generally require nasty software patches with C
family languages. Especially if there is
main() function to
It was also nice to realize that the
equivalent in standard C library function call
_Exit(0) could help make light native fuzz targets even
faster. The process cleanup in this relatively trivial C++ program
adds almost 30% overhead for repeated program execution. Although it
will likely break any
sanitizer that does any
verification work during the exit, like searching for dynamically
allocated memory that was not freed.