Fuzzing clang test suite to generate crashing inputs

Hi,

I've been playing with afl-fuzz[1] to fuzz the clang test suite. In
the first 11 hours I have discovered 34 distinct assertion failures in
clang -std=c++11 and at least one segmentation fault (I didn't yet do
anything to tell different SEGVs apart), all on a recent HEAD. I
thought I'd share the methodology and the initial findings with you. I
haven't reported any bugs yet; feel free to do so if you are so
inclined. I might do that later.

For the 35 minimized test cases and their outputs, see [2], or
download a tarball from [3].

The git versions I'm running the tests against are:

  llvm 1610d6e Add missing FP build attribute tests.
  clang 64a12ad Driver: Objective-C should respect -fno-exceptions
  compiler-rt d6e5390 [ASan, LSan] Improve tracking of thread creation.

Some quick notes:

* These were generated with afl-fuzz 0.85b, using the clang test
  suite as input.

* The test suite was first minimized using afl-fuzz's
  minimize_corpus.sh.

* Finding the first 35 crashes (34 distinct assertion failures and at
  least one segfault as I didn't yet do anything to tell them apart)
  took about 11 hours on a Core i7-2600.

* afl-fuzz is a coverage-guided ("directed") fuzzing tool. When it
  discovers an input that exercises new edges in the binary, it adds
  it to the queue as a new input for fuzzing. After 11 hours the
  fuzzer is at most 0.05% through its queue. A small cluster would be
  nice for this. I suspect it would keep finding new crashes even
  after weeks of fuzzing on a binary of this complexity.

* If you wish to try it yourself, note the following:

  * Increase MAP_SIZE_POW2 in afl-fuzz's config.h.
    * 17 is not sufficient, 18 has looked good for the first 11 hours...

  * I built LLVM with -DLLVM_ENABLE_ASSERTIONS=ON
    -DLLVM_ENABLE_THREADS=OFF -DLLVM_ENABLE_BACKTRACES=OFF
    -DLLVM_ENABLE_CRASH_OVERRIDES=OFF (and of course using afl-clang as
    the compiler).

  * Running clang -cc1 directly on the input gives me about
    50-60 executions per second on each of the 8 instances (for a
    total of 400-480/s) instead of 5-6/s.

  * Use something like -ferror-limit 5 -Werror.
    * You might want to experiment with an even smaller error limit.

  * Using ASAN might make sense too (afl supports it quite nicely, at
    least on 32-bit).

* I consider it hideous how well afl-fuzz works for a tool which does
  essentially sed-level magic on compiler-generated assembly to
  instrument the program.
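The queue behaviour described in the notes above (an input that reaches
new edges is kept and fuzzed further) can be pictured with a toy loop
like the following. This is only a sketch under assumptions, not
afl-fuzz's actual code: run_target, KEYWORD and the per-block ids are
made up for illustration, the map is far smaller than the
MAP_SIZE_POW2 of 18 mentioned above, and the real tool also tracks hit
counts and, as far as I understand, walks its queue in order rather
than picking entries at random.

```python
import random

MAP_SIZE_POW2 = 8               # toy size; the post needed 18 for clang
MAP_SIZE = 1 << MAP_SIZE_POW2
KEYWORD = b"int x;"             # hypothetical prefix the toy target "parses"


def run_target(data):
    """Hypothetical instrumented target: return the edge-map slots it hit."""
    slots = set()
    prev = 0
    for i, expected in enumerate(KEYWORD):
        if i >= len(data) or data[i] != expected:
            break
        cur = (i + 1) * 37 % MAP_SIZE   # stand-in for a per-block id
        slots.add(cur ^ prev)           # one slot per (previous, current) edge
        prev = cur >> 1
    return frozenset(slots)


def mutate(data, rng):
    """Overwrite one random byte, or append one (always if input is empty)."""
    buf = bytearray(data)
    if not buf or rng.random() < 0.2:
        buf.append(rng.randrange(256))
    else:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(seeds, iterations, seed=0):
    """Keep every mutated input that reaches a map slot not seen before."""
    rng = random.Random(seed)
    queue = list(seeds)
    seen = set()
    for s in queue:
        seen |= run_target(s)
    for _ in range(iterations):
        child = mutate(rng.choice(queue), rng)
        slots = run_target(child)
        if not slots <= seen:           # at least one new edge slot
            seen |= slots
            queue.append(child)
    return queue, seen


if __name__ == "__main__":
    queue, seen = fuzz([b"i"], 20000)
    print(len(queue), "queue entries,", len(seen), "map slots hit")
```

With a map that is too small, unrelated edges start sharing slots and
new coverage goes unnoticed, which is presumably why raising
MAP_SIZE_POW2 matters for a binary the size of clang.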

  Sami

[1] http://lcamtuf.coredump.cx/afl/
[2] http://sliedes.kapsi.fi/clang-fuzz/
[3] http://sliedes.kapsi.fi/clang-fuzz.tar.gz

> * Using ASAN might make sense too (afl supports it quite nicely, at
>   least on 32-bit).

Yes, please!!! (And MSAN too)
-DLLVM_USE_SANITIZER=Address and (in a separate build)
-DLLVM_USE_SANITIZER=Memory

Fuzz testing is an interesting area, and afl-fuzz seems to be an interesting tool. Its strong points are its speed and its focus on finding test cases that improve test (path) coverage. However, its low-level alteration of the input seems to make it useful mostly for finding errors in the input handling of parsers/transcoders.

Of course, LLVM has many "parser" interfaces. I tried it on llvm-dis, llc, and opt with a minimal bitcode file (a main that returns 0), and it found 100 crashing paths (about 20 unique endpoints/assertions) in 45 seconds (on a single core). Impressive! For my minimal bitcode file, the set of crashes seems to be very similar in llvm-dis, llc, and opt. (I had to start afl-fuzz with '-m 70' to increase the memory limit.)

For our fuzz testing (testing our internal backend), I think we will continue to use only Csmith and llvm-stress, together with option randomization, which find more "deep/semantic" bugs. However, afl-fuzz would perhaps be more interesting if it gained knowledge about the structure of input files, for example in the way C-Reduce alters C files.
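
One very rough way to picture "knowledge about the structure of input files" is to mutate at token granularity instead of byte granularity. The sketch below is purely illustrative: TOKEN_RE, tokenize and mutate_tokens are made-up names, the tokenizer is a crude approximation of C lexing, and this is neither how afl-fuzz mutates inputs nor how C-Reduce transforms programs.

```python
import random
import re

# Crude C-like lexer: whitespace runs, identifiers, integers, or any single char.
TOKEN_RE = re.compile(r"\s+|[A-Za-z_]\w*|\d+|.", re.S)


def tokenize(src):
    """Split src into tokens; joining them gives back src unchanged."""
    return TOKEN_RE.findall(src)


def mutate_tokens(src, rng):
    """Delete, duplicate, or swap whole non-whitespace tokens."""
    toks = tokenize(src)
    idx = [i for i, t in enumerate(toks) if not t.isspace()]
    if not idx:
        return src
    op = rng.choice(("delete", "duplicate", "swap"))
    i = rng.choice(idx)
    if op == "delete":
        del toks[i]
    elif op == "duplicate":
        toks.insert(i, toks[i])
    else:
        j = rng.choice(idx)
        toks[i], toks[j] = toks[j], toks[i]
    return "".join(toks)


if __name__ == "__main__":
    rng = random.Random(1)
    for _ in range(3):
        print(mutate_tokens("int main() { return 0; }", rng))
```

A mutation like this keeps identifiers and literals intact, so more of the mutated inputs get past the lexer, at the cost of never producing the malformed byte sequences that a byte-level fuzzer exercises.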

After 15 hours, afl-fuzz found 460 paths with crashes in llvm-dis (on the same minimal test case as described above). Mostly SIGABRT, and a few SIGSEGV. See the attached list for the abort messages. Only a single line is retained for each abort (about 50 unique).
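
Reducing many crashing inputs to one line per distinct abort can be done by bucketing on the assertion text. The following is a minimal sketch, assuming the stderr of each crashing run has already been captured as a string and that assertion failures use the glibc "Assertion `...' failed." format. bucket_key and bucket_crashes are hypothetical helper names, and the sample messages in the usage part are invented, not real LLVM output. Crashes without an assertion line (for example SIGSEGV) all land in one bucket here; telling those apart would need something like a stack trace.

```python
import re
from collections import defaultdict

# glibc prints: prog: file.cpp:123: <function>: Assertion `cond' failed.
ASSERT_RE = re.compile(r"Assertion [`'](.+)' failed\.")


def bucket_key(stderr_text):
    """Return a key identifying the failure: the first assertion condition found."""
    for line in stderr_text.splitlines():
        match = ASSERT_RE.search(line)
        if match:
            return "assert: " + match.group(1)
    return "no-assertion"      # e.g. a plain segmentation fault


def bucket_crashes(outputs):
    """Group test-case names by failure key. outputs maps name -> stderr text."""
    buckets = defaultdict(list)
    for name, text in outputs.items():
        buckets[bucket_key(text)].append(name)
    return dict(buckets)


if __name__ == "__main__":
    # Invented sample outputs, for illustration only.
    samples = {
        "case-001": "tool: f.cpp:1: void g(): Assertion `x > 0' failed.\n",
        "case-002": "tool: f.cpp:1: void g(): Assertion `x > 0' failed.\nAborted\n",
        "case-003": "Segmentation fault\n",
    }
    for key, names in bucket_crashes(samples).items():
        print(key, names)
```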

/Patrik Hägglund

llvm-dis-crashes.txt (10.7 KB)