Fuzzing the bitcode reader

Hi all,

The blog entry [1] suggests that one of the buildbots constantly fuzzes
clang and clang-format. However, the actual bot [2] only tests the
fuzzer itself against a well-known set of bugs in standard software
(e.g. Heartbleed [3] seems to be among them). Has there actually ever
been a buildbot that fuzzes clang/LLVM itself?

Another (obvious?) fuzzing candidate would be LLVM's bitcode reader. I
ran afl-fuzz on it, and it found lots of failed assertions within
seconds. Isn't fuzzing done on a regular basis, as [1] suggests it
should be? Should I report the crashes it found?
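
For concreteness, a minimal libFuzzer-style harness over the bitcode
reader could look like the sketch below. It assumes LLVM's C++ API of
this era (parseBitcodeFile from llvm/Bitcode/BitcodeReader.h); treat it
as an illustration rather than a ready-to-build target, and note that an
afl-fuzz setup would instead drive an equivalent small main() that reads
the input from a file:

// Sketch of a fuzz target for the bitcode reader. parseBitcodeFile and
// the headers are real LLVM interfaces; everything else is boilerplate.
#include "llvm/Bitcode/BitcodeReader.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/Error.h"
#include "llvm/Support/MemoryBuffer.h"
#include <cstddef>
#include <cstdint>

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  llvm::LLVMContext Context;
  // Wrap the raw fuzzer input in a MemoryBuffer without copying it.
  std::unique_ptr<llvm::MemoryBuffer> Buf = llvm::MemoryBuffer::getMemBuffer(
      llvm::StringRef(reinterpret_cast<const char *>(Data), Size),
      "fuzz-input", /*RequiresNullTerminator=*/false);
  // A malformed input should come back as an llvm::Error; the failed
  // assertions mentioned above are the cases where it does not.
  llvm::Expected<std::unique_ptr<llvm::Module>> ModOrErr =
      llvm::parseBitcodeFile(Buf->getMemBufferRef(), Context);
  if (!ModOrErr)
    llvm::consumeError(ModOrErr.takeError());
  return 0;
}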

Michael

[1] "Simple guided fuzzing for libraries using LLVM's new libFuzzer", The LLVM Project Blog
[2] http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fuzzer/
[3] openssl-1.0.1f in google/fuzzer-test-suite on GitHub

>> Hi all,
>>
>> The blog entry [1] suggests that one of the buildbots constantly fuzzes
>> clang and clang-format. However, the actual bot [2] only tests the
>> fuzzer itself against a well-known set of bugs in standard software
>> (e.g. Heartbleed [3] seems to be among them).
>
> Isn't it this stage? http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fuzzer/builds/2755/steps/stage2%2Fasan%2Bassertions%20check-fuzzer/logs/stdio
>
>> Has there actually ever been a buildbot that fuzzes clang/LLVM itself?
>>
>> Another (obvious?) fuzzing candidate would be LLVM's bitcode reader. I
>> ran afl-fuzz on it, and it found lots of failed assertions within
>> seconds. Isn't fuzzing done on a regular basis, as [1] suggests it
>> should be? Should I report the crashes it found?
>
> The bitcode reader is known to not be robust against malformed inputs.

To me it looks like just the compilation and the unit+regression tests
("ninja check-fuzzer", not even depending on clang). It also completes
in only 10 minutes, which is not a lot for fuzzing.

Michael

Yes, I believe you’re right!

>> Hi all,
>>
>> The blog entry [1] suggests that one of the buildbots constantly fuzzes
>> clang and clang-format. However, the actual bot [2] only tests the
>> fuzzer itself against a well-known set of bugs in standard software
>> (e.g. Heartbleed [3] seems to be among them).
>
> Isn't it this stage? http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fuzzer/builds/2755/steps/stage2%2Fasan%2Bassertions%20check-fuzzer/logs/stdio
>
>> Has there actually ever been a buildbot that fuzzes clang/LLVM itself?

Yes, I used to run clang-fuzzer and clang-format-fuzzer on this bot, but
not any more.
The reason is simple -- the bot was always red (well, orange) and the bugs
were never fixed.

Currently we run clang-fuzzer (but not clang-format-fuzzer) on our
internal fuzzing infra, and Richard has fixed at least one bug found
this way:
http://llvm.org/viewvc/llvm-project?view=revision&revision=291030

My LLVM fuzzing bot was pretty naive and simple. If we want proper
continuous fuzzing for parts of LLVM, we either need to build a separate
"real" continuous fuzzing process or use an existing one. Luckily, there
is one :)
As a pilot I've recently added the cxa_demangler_fuzzer to OSS-Fuzz
(https://github.com/google/oss-fuzz):
https://github.com/google/oss-fuzz/tree/master/projects/llvm_libcxxabi
It even found one bug, which Mehdi has already fixed:
http://llvm.org/viewvc/llvm-project?view=revision&revision=293330
The bug report itself will become public in ~4 days (issue 370 in the
OSS-Fuzz Monorail tracker).
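
For flavor, a __cxa_demangle fuzz target can be as small as the sketch
below (illustrative; the actual cxa_demangler_fuzzer in the OSS-Fuzz
repository may differ in details):

// Fuzz target for the Itanium C++ demangler via the cxxabi.h entry
// point; abi::__cxa_demangle is the standard ABI interface.
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  // __cxa_demangle requires a NUL-terminated mangled name, so copy the
  // input into a std::string first.
  std::string MangledName(reinterpret_cast<const char *>(Data), Size);
  int Status = 0;
  char *Demangled =
      abi::__cxa_demangle(MangledName.c_str(), nullptr, nullptr, &Status);
  std::free(Demangled);  // the demangler malloc()s the result on success
  return 0;
}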

If we want to run some more LLVM fuzzers on OSS-Fuzz, I'd be happy to
(help) set them up.

>> Another (obvious?) fuzzing candidate would be LLVM's bitcode reader. I
>> ran afl-fuzz on it, and it found lots of failed assertions within
>> seconds. Isn't fuzzing done on a regular basis, as [1] suggests it
>> should be? Should I report the crashes it found?
>
> The bitcode reader is known to not be robust against malformed inputs.

Yes, I'm afraid the bitcode reader (like some other parts of LLVM) is
not robust enough to withstand fuzzing. :(
Note that if we want to use libFuzzer (which is an in-process fuzzer),
the target should not assert/abort/exit on any input (if it's not a
bug).
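
To make that requirement concrete, here is a sketch of the two failure
modes (Record, Cursor, NumKnownCodes, and parseKnownRecord are
hypothetical stand-ins, not real bitcode-reader API; only
llvm::Error/llvm::StringError are actual LLVM interfaces):

#include "llvm/Support/Error.h"

// Hypothetical stand-ins for illustration only:
struct Record { /* ... */ };
struct Cursor { unsigned readCode(); };
extern const unsigned NumKnownCodes;
llvm::Expected<Record> parseKnownRecord(Cursor &C, unsigned Code);

llvm::Expected<Record> readRecord(Cursor &C) {
  unsigned Code = C.readCode();
  // Fuzz-hostile: aborts the whole in-process fuzzer on the first
  // malformed input, even though a bad input file is not a compiler bug:
  //   assert(Code < NumKnownCodes && "invalid record code");
  // Fuzz-friendly: malformed input becomes a recoverable error that the
  // caller (and the fuzz target) can consume and move on from:
  if (Code >= NumKnownCodes)
    return llvm::make_error<llvm::StringError>(
        "invalid record code", llvm::inconvertibleErrorCode());
  return parseKnownRecord(C, Code);
}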

--kcc

>>> Hi all,
>>>
>>> The blog entry [1] suggests that one of the buildbots constantly
>>> fuzzes clang and clang-format. However, the actual bot [2] only tests
>>> the fuzzer itself against a well-known set of bugs in standard
>>> software (e.g. Heartbleed [3] seems to be among them).
>>
>> Isn't it this stage? http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fuzzer/builds/2755/steps/stage2%2Fasan%2Bassertions%20check-fuzzer/logs/stdio
>
> To me it looks like just the compilation and the unit+regression tests
> ("ninja check-fuzzer", not even depending on clang). It also completes
> in only 10 minutes, which is not a lot for fuzzing.

> Yes, I believe you're right!
>
> Right now lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fuzzer
> tests 'check-fuzzer', which is a regression test suite for libFuzzer
> (a set of synthetic puzzles), and also runs some of the fuzzing
> benchmarks from https://github.com/google/fuzzer-test-suite/.
> It does not fuzz anything from LLVM any more.

> [snip]

Thanks for the explanation.

>> The bitcode reader is known to not be robust against malformed inputs.
>
> Yes, I'm afraid the bitcode reader (like some other parts of LLVM) is
> not robust enough to withstand fuzzing. :(
> Note that if we want to use libFuzzer (which is an in-process fuzzer),
> the target should not assert/abort/exit on any input (if it's not a
> bug).

Is there any incentive to change that? A Google Summer of Code project maybe?

Michael

> [snip]

> Thanks for the explanation.

>> The bitcode reader is known to not be robust against malformed inputs.
>
> Yes, I'm afraid the bitcode reader (like some other parts of LLVM) is
> not robust enough to withstand fuzzing. :(
> Note that if we want to use libFuzzer (which is an in-process fuzzer),
> the target should not assert/abort/exit on any input (if it's not a
> bug).

> Is there any incentive to change that?

Not that I know of.

> A Google Summer of Code project maybe?

Maybe.
The bottleneck is not bug finding, but bug fixing, which sometimes may
require large changes.
And doing code review for such changes might be more work than just making
them.

--kcc

For the bitcode reader, for example, I wouldn't expect the changes to be
large or complicated to review. However, these are still tedious bugs to
fix.
About GSoC, my own personal opinion is that we should try to give
interesting/fun projects to students and not use them as cheap labor to
fix the small bugs and issues we're not able to prioritize ourselves.

My 2 cents :)

> [snip]

> Thanks for the explanation.

>> The bitcode reader is known to not be robust against malformed inputs.
>
> Yes, I'm afraid the bitcode reader (like some other parts of LLVM) is
> not robust enough to withstand fuzzing. :(
> Note that if we want to use libFuzzer (which is an in-process fuzzer),
> the target should not assert/abort/exit on any input (if it's not a
> bug).

>> Is there any incentive to change that?
>
> Not that I know of.
>
>> A Google Summer of Code project maybe?
>
> Maybe.
> The bottleneck is not bug finding, but bug fixing, which sometimes may
> require large changes.
> And doing code review for such changes might be more work than just
> making them.

> For the bitcode reader, for example, I wouldn't expect the changes to
> be large or complicated to review. However, these are still tedious
> bugs to fix.
> About GSoC, my own personal opinion is that we should try to give
> interesting/fun projects to students and not use them as cheap labor
> to fix the small bugs and issues we're not able to prioritize
> ourselves.

I got started on LLVM in college working on "small bugs and issues we're
not able to prioritize ourselves" (e.g. refactoring TableGen). Yes, it's
not flashy, but people in the community do appreciate it. Also, IMO most
of the "hard part" of learning to work on LLVM for real (and OSS in
general) is learning the development workflow, interacting with the
community, etc., and for that, small fixes are actually just as good (if
not better) preparation than working on some flashy thing.

My experience is that most of the work (in terms of time) to be done on
real software projects is bug fixes and small issues (i.e. maintenance),
so it's good to be comfortable doing that and treating it as a normal
thing rather than a "chore"; and you can only get that kind of
experience working on a real project that has maintenance to do :)

-- Sean Silva

I totally agree with all of this, and such bugs (fuzzer-found crashes,
or even writing new fuzz targets) should be listed on a getting-started
page (CC Vassil), because that's a good way to start.

My previous answer was only in the context of a GSoC project.

I'd see this from a different perspective. GSoC says its focus is
"bringing more student developers into open source software
development" [1]. For such a goal, maintenance work is more purposeful
than an interesting side project whose future is uncertain after the
program ends. At least more interaction with the community and the code
base is assured. Moreover, the student gets paid and is free not to
apply for this kind of work; a luxury we employees usually do not have.

Michael

[1] https://summerofcode.withgoogle.com/

> About GSoC, my own personal opinion is that we should try to give
> interesting/fun projects to students and not use them as cheap labor
> to fix the small bugs and issues we're not able to prioritize
> ourselves.

> I'd see this from a different perspective. GSoC says its focus is
> "bringing more student developers into open source software
> development" [1]. For such a goal, maintenance work is more purposeful
> than an interesting side project whose future is uncertain after the
> program ends. At least more interaction with the community and the
> code base is assured. Moreover, the student gets paid and is free not
> to apply for this kind of work; a luxury we employees usually do not
> have.

We are trying to do similar fuzzing work in FreeType with GSoC; see the
"Ideas for Google Summer of Code" page (it will have more details soon).

So GSoC *may* work for LLVM fuzzing as well if we have interested hosts.
I'd be happy to be a co-host on the fuzzing side, but someone else will
need to be a co-host on the side of the bitcode reader (or whatever else
we want to fuzz in LLVM).

A side note: I hope to get to structured fuzzing for some parts of LLVM
(firstly, clang) using protobufs
(https://github.com/google/libprotobuf-mutator).
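
To sketch the idea (hedged: function.proto, fuzz::Function,
FunctionToSource, and CompileSource are hypothetical; only the
DEFINE_PROTO_FUZZER macro is real libprotobuf-mutator API, from
src/libfuzzer/libfuzzer_macro.h):

// Structured fuzz target using libprotobuf-mutator's libFuzzer glue.
#include <string>

#include "function.pb.h"  // hypothetical generated message for a toy program
#include "src/libfuzzer/libfuzzer_macro.h"

// Hypothetical helpers: render the message as source text and feed it
// to the compiler entry point under test.
std::string FunctionToSource(const fuzz::Function &F);
void CompileSource(const std::string &Source);

DEFINE_PROTO_FUZZER(const fuzz::Function &F) {
  // Mutations happen on the protobuf message, so every input handed to
  // the compiler is syntactically well-formed by construction.
  CompileSource(FunctionToSource(F));
}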
But there is enough interesting work to give to a student in the summer
(again, if there are interested co-hosts).

--kcc