libfuzzer questions

First off, thanks – this is a pretty great library and it feels like I’m learning a lot. I’m getting some more experience with libfuzzer and finding that I have a couple of questions:

  • How does libfuzzer decide to write a new test file? What distinguishes this one from all the other cases for which new test inputs were not written? Must be something about the path taken through the code?

  • Can I use afl-cmin or is there something similar for libFuzzer? I find that sometimes I get an enormous amount of tests and it becomes unmanageable.

  • Sometimes my process being tested appears to deadlock. A common feature seems to be that AlarmCallback is allocating memory and as a consequence the ASan code is pending on a lock. I’ll speculate that this is because the alarm expired while the lock was already held. Is this expected? I can share specific call stacks if it helps. I can just extend the timeout but I think it’s probably appropriate.

  • AFL has a curses based display where a bunch of different stats are shown. I’ll be honest, I don’t know how to read those yet. :wink: But I’d like to find some way to determine whether I’m seeing diminishing returns with libfuzzer. Is there a good strategy?

  • Can anyone share tips for how libFuzzer has been used with some success – anything beyond what’s already available in http://llvm.org/docs/LibFuzzer.html ?

+Kostya, Fuzzer of Sanity

First off, thanks -- this is a pretty great library and it feels like I'm
learning a lot.

Thanks!

I'm getting some more experience with libfuzzer and finding that I have a
couple of questions:

- How does libfuzzer decide to write a new test file? What distinguishes
this one from all the other cases for which new test inputs were not
written? Must be something about the path taken through the code?

Exactly.
It uses SanitizerCoverage to figure out if
any new edge in the control flow graph has been discovered with the given
input.
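
To make the "new edge" rule concrete, here is a minimal sketch of the decision, not libFuzzer's actual implementation. `toy_target` and `select_new_units` are hypothetical names, and the edges are hand-written stand-ins for what SanitizerCoverage instrumentation would report.

```python
from typing import Callable, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (from_block, to_block), a stand-in for a CFG edge


def toy_target(data: bytes) -> Set[Edge]:
    """Hypothetical instrumented target: returns the edges this input hit."""
    edges = {("entry", "check_len")}
    if len(data) >= 1 and data[0] == ord("F"):
        edges.add(("check_len", "saw_F"))
        if len(data) >= 2 and data[1] == ord("U"):
            edges.add(("saw_F", "saw_U"))
    else:
        edges.add(("check_len", "reject"))
    return edges


def select_new_units(inputs: Iterable[bytes],
                     run: Callable[[bytes], Set[Edge]]) -> List[bytes]:
    """Keep an input only if it exercises at least one edge not seen before."""
    seen: Set[Edge] = set()
    corpus: List[bytes] = []
    for data in inputs:
        edges = run(data)
        if not edges <= seen:  # some edge is new, so this input is worth saving
            seen |= edges
            corpus.append(data)
    return corpus


kept = select_new_units([b"", b"A", b"F", b"FX", b"FU", b"FUZZ"], toy_target)
print(kept)  # [b'', b'F', b'FU']
```

Inputs such as `b"A"` or `b"FX"` run fine but add no edge, so no file would be written for them.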

- Can I use afl-cmin or is there something similar for libFuzzer?

I've never tried that. I'd expect you can.
libFuzzer and afl both use plain files to store the corpus.

I find that sometimes I get an enormous amount of tests and it becomes
unmanageable.

libFuzzer has an option to minimize the corpus.
It's not perfect, but very simple.

First off, thanks -- this is a pretty great library and it feels like I'm
learning a lot.

Thanks!

I'm getting some more experience with libfuzzer and finding that I have a
couple of questions:

- How does libfuzzer decide to write a new test file? What distinguishes
this one from all the other cases for which new test inputs were not
written? Must be something about the path taken through the code?

Exactly.
It uses SanitizerCoverage to figure out
if any new edge in the control flow graph has been discovered with the
given input.

So if I'm seeing tens of thousands of distinct test files, that represents
tens of thousands of distinct edges? Does the CFG span functions/methods
or are they scoped more sanely?

- Can I use afl-cmin or is there something similar for libFuzzer?

I've never tried that. I'd expect you can.
libFuzzer and afl both use plain files to store the corpus.

I think afl-cmin uses some afl-specific behavior.

I find that sometimes I get an enormous amount of tests and it becomes
unmanageable.

libFuzzer has an option to minimize the corpus.
It's not perfect, but very simple.
-------------
save_minimized_corpus 0 If 1, the minimized corpus is
saved into the first input directory
-------------

Ohh, ok. I think I misunderstood this to be trying to minimize the size of
the test case while still reproducing a crash. Similar to how afl-tmin
works, I was thinking. I'll give this a try.

Should I only use this option periodically or can I run it this way all the
time? Do we end up spending more execution time minimizing the corpus?
Will it delete redundant test cases, including ones that were there before
this test run started?

- sometimes my process being tested appears to deadlock. A common
feature seems to be that AlarmCallback is allocating memory and as a
consequence the ASan code is pending on a lock. I'll speculate that this
is because the alarm expired while the lock was already held. Is this
expected? I can share specific call stacks if it helps. I can just extend
the timeout but I think it's probably appropriate.

Yes, please give more details.

Traces attached. Not sure if the mailing list will preserve the
attachments, though.

- AFL has a curses based display where a bunch of different stats are
shown. I'll be honest, I don't know how to read those yet. :wink: But I'd
like to find some way to determine whether I'm seeing diminishing returns
with libfuzzer. Is there a good strategy?

libFuzzer just dumps stats to stderr.
As long as you periodically see lines like
#325 NEW cov 11985 bits 14108 units 113 exec/s 325 ...
you are good.

Once you stop getting those, you may start playing with the flags.
(e.g. increase the max_len).
Unlike AFL which knows it all, libFuzzer still relies on a bit of user
help. :slight_smile:
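
One way to watch for diminishing returns is to count how many executions have passed since the last NEW line. The sketch below assumes only that each status line starts with "#<count> <EVENT>" as in the line quoted above; the function name and the sample lines are illustrative, and the numbers in the second sample line are made up.

```python
import re
from typing import Iterable, Optional, Tuple

# Matches the start of a status line, e.g. "#325 NEW ..."
EVENT_RE = re.compile(r"^#(\d+)\s+(\w+)")


def executions_since_last_new(
        lines: Iterable[str]) -> Tuple[Optional[int], Optional[int]]:
    """Return (count at the last NEW line, count at the latest status line)."""
    last_new: Optional[int] = None
    latest: Optional[int] = None
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if match is None:
            continue  # not a status line (e.g. sanitizer output)
        count = int(match.group(1))
        latest = count
        if match.group(2) == "NEW":
            last_new = count
    return last_new, latest


sample = [
    "#325 NEW cov 11985 bits 14108 units 113 exec/s 325",
    "#1024 pulse cov 11985 bits 14108 units 113 exec/s 341",  # made-up line
    "not a status line",
]
last_new, latest = executions_since_last_new(sample)
print(latest - last_new)  # 699 executions without new coverage
```

If that gap keeps growing across a long run, it may be time to try different flags, as suggested above.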

Ok, that's good advice.

trace.txt (6.59 KB)

trace2.txt (6.58 KB)

trace3.txt (13 KB)

First off, thanks -- this is a pretty great library and it feels like
I'm learning a lot.

Thanks!

I'm getting some more experience with libfuzzer and finding that I have
a couple of questions:

- How does libfuzzer decide to write a new test file? What
distinguishes this one from all the other cases for which new test inputs
were not written? Must be something about the path taken through the code?

Exactly.
It uses SanitizerCoverage to figure out
if any new edge in the control flow graph has been discovered with the
given input.

So if I'm seeing tens of thousands of distinct test files, that represents
tens of thousands of distinct edges?

In the extreme case -- yes.
However usually a single file covers more than one unique edge.
Also, if you are running the fuzzer in parallel (-jobs=N) some edges can be
discovered many times.

Does the CFG span functions/methods or are they scoped more sanely?

Hm? What do you mean?
A control flow edge is a regular edge between basic blocks in a function.
With -fsanitize-coverage=indirect-calls it will also track indir call edges
(uniq pairs of caller-callee).

- Can I use afl-cmin or is there something similar for libFuzzer?

I've never tried that. I'd expect you can.
libFuzzer and afl both use plain files to store the corpus.

I think afl-cmin uses some afl-specific behavior.

I find that sometimes I get an enormous amount of tests and it becomes
unmanageable.

libFuzzer has an option to minimize the corpus.
It's not perfect, but very simple.
-------------
save_minimized_corpus 0 If 1, the minimized corpus is
saved into the first input directory
-------------

Ohh, ok. I think I misunderstood this to be trying to minimize the size of
the test case while still reproducing a crash. Similar to how afl-tmin
works, I was thinking. I'll give this a try.

Should I only use this option periodically or can I run it this way all
the time? Do we end up spending more execution time minimizing the
corpus? Will it delete redundant test cases, including ones that were
there before this test run started?

You should only use this option if you want to store the minimized corpus
somewhere,
or if the initial stage (between "#0 READ" and "#1331 INITED")
takes too long.
Otherwise you should not bother since libFuzzer minimizes the corpus in
memory on every run.
(minimization is done with a trivial greedy algorithm, not even close to
really minimal solution, but good enough).
The output looks like this:

#0 READ cov 0 bits 0 units 1331 exec/s 0
...
#1024 pulse cov 8043 bits 13474 units 1331 exec/s 256
#1331 INITED cov 8050 bits 13689 units 594 exec/s 221
#2048 pulse cov 8050 bits 13689 units 594 exec/s 341

This means that the corpus on disk had 1331 units, they were read,
shuffled, executed, and those that added coverage were chosen.
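
The "trivial greedy algorithm" can be sketched as a single pass that keeps a unit only when it adds coverage. This is an illustration under assumptions, not libFuzzer's code: `greedy_minimize` and the coverage table are hypothetical, and the order is passed in explicitly (libFuzzer shuffles) so the result is reproducible.

```python
from typing import Dict, List, Set

# Hypothetical coverage per unit: unit name -> set of covered edge ids.
COVERAGE: Dict[str, Set[int]] = {
    "a": {1, 2},
    "b": {2},
    "c": {1, 2, 3},
    "d": {3},
}


def greedy_minimize(order: List[str],
                    coverage: Dict[str, Set[int]]) -> List[str]:
    """Walk units in the given order; keep those that add new coverage."""
    covered: Set[int] = set()
    kept: List[str] = []
    for unit in order:
        if coverage[unit] - covered:  # this unit contributes something new
            covered |= coverage[unit]
            kept.append(unit)
    return kept


lucky = greedy_minimize(["c", "a", "b", "d"], COVERAGE)
unlucky = greedy_minimize(["b", "a", "d", "c"], COVERAGE)
print(lucky)    # ['c']
print(unlucky)  # ['b', 'a', 'd']
```

Both results cover the same edges, but the second keeps three units where one would do, which is why the result depends on the order and is not a true minimum.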

- sometimes my process being tested appears to deadlock. A common
feature seems to be that AlarmCallback is allocating memory and as a
consequence the ASan code is pending on a lock. I'll speculate that this
is because the alarm expired while the lock was already held. Is this
expected? I can share specific call stacks if it helps. I can just extend
the timeout but I think it's probably appropriate.

Yes, please give more details.

Traces attached. Not sure if the mailing list will preserve the
attachments, though.

Aha, of course.
I run non-async-signal-safe code in the signal handler, bummer.
Let me try to fix this (no promises for a quick fix, I'll be out for a
while).

...

So if I'm seeing tens of thousands of distinct test files, that
represents tens of thousands of distinct edges?

In the extreme case -- yes.
However usually a single file covers more than one unique edge.
Also, if you are running the fuzzer in parallel (-jobs=N) some edges can
be discovered many times.
...

With -fsanitize-coverage=indirect-calls it will also track indir call
edges (uniq pairs of caller-callee).

Ok, I think the parallel jobs and unique caller/callee pairs must be where
it got amped up a bit. I'm using "bb,indirect-calls,8bit-counters".

save_minimized_corpus 0 If 1, the minimized corpus is
saved into the first input directory
-------------

Ohh, ok. I think I misunderstood this to be trying to minimize the size of
the test case while still reproducing a crash. Similar to how afl-tmin
works, I was thinking. I'll give this a try.

Should I only use this option periodically or can I run it this way all
the time? Do we end up spending more execution time minimizing the
corpus? Will it delete redundant test cases, including ones that were
there before this test run started?

You should only use this option if you want to store the minimized corpus
somewhere,

Ok, so I can do it periodically to prune the test corpus. That's great.

or if the initial stage (between "#0 READ" and "#1331 INITED")
takes too long.
Otherwise you should not bother since libFuzzer minimizes the corpus in
memory on every run.
(minimization is done with a trivial greedy algorithm, not even close to
really minimal solution, but good enough).
The output looks like this:

#0 READ cov 0 bits 0 units 1331 exec/s 0
...
#1024 pulse cov 8043 bits 13474 units 1331 exec/s 256
#1331 INITED cov 8050 bits 13689 units 594 exec/s 221
#2048 pulse cov 8050 bits 13689 units 594 exec/s 341

This means that the corpus on disk had 1331 units, they were read,
shuffled, executed, and those that added coverage were chosen.

Hah! this means I've been misreading this line all along. My eyes zoomed
in on "N exec/s" and I assumed that was the throughput (and I just ignored
the "suffix" entry). So it's "cov X" / "bits Y" / "units Z" / "exec/s W"
? I guess the PCRE2 use case in the docs explains some of this -- I
should've read this more closely.

...
Aha, of course.

I run non-async-signal-safe code in the signal handler, bummer.
Let me try to fix this (no promises for a quick fix, I'll be out for a
while).

Ok, no problem. If you don't get around to it I can probably come up w/a
fix.

Traces attached. Not sure if the mailing list will preserve the
attachments, though.

Aha, of course.
I run non-async-signal-safe code in the signal handler, bummer.
Let me try to fix this (no promises for a quick fix, I'll be out for a
while).

I couldn't easily reproduce this behavior on a simple test.
Do you always see fuzzer::Fuzzer::WriteToCrash in the stack?
If yes, r244707 *may* help.
There are other things that may potentially cause similar deadlocks --
please send me more stack traces if you see these again.

Or, send the repro instructions if they are reasonably simple.

...

So if I'm seeing tens of thousands of distinct test files, that
represents tens of thousands of distinct edges?

In the extreme case -- yes.
However usually a single file covers more than one unique edge.
Also, if you are running the fuzzer in parallel (-jobs=N) some edges can
be discovered many times.
...

With -fsanitize-coverage=indirect-calls it will also track indir call
edges (uniq pairs of caller-callee).

Ok, I think the parallel jobs and unique caller/callee pairs must be where
it got amped up a bit. I'm using "bb,indirect-calls,8bit-counters".

With 8bit-counters you may get up to 8 test cases for every edge.
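
A rough way to see where "up to 8" comes from: if each edge's hit count is folded into one of eight buckets, then every (edge, bucket) pair can count as new coverage once. The bucket ranges below are an assumption modelled on AFL-style bucketing and are not taken from this thread; `counter_bucket` is a hypothetical helper.

```python
# Assumed upper bounds of buckets 0..6; anything larger falls in bucket 7.
# Ranges: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+
BUCKET_UPPER_BOUNDS = [1, 2, 3, 7, 15, 31, 127]


def counter_bucket(count: int) -> int:
    """Map an edge hit count to a bucket index 0..7, or -1 if never hit."""
    if count <= 0:
        return -1
    for index, upper in enumerate(BUCKET_UPPER_BOUNDS):
        if count <= upper:
            return index
    return 7


# The same edge hit 1, 2, 5 and 200 times lands in four different buckets,
# so four different inputs could each be kept for that one edge.
features = {("edge_42", counter_bucket(hits)) for hits in (1, 2, 5, 200)}
print(len(features))  # 4
```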

save_minimized_corpus 0 If 1, the minimized corpus is
saved into the first input directory
-------------

Ohh, ok. I think I misunderstood this to be trying to minimize the size of
the test case while still reproducing a crash. Similar to how afl-tmin
works, I was thinking. I'll give this a try.

Should I only use this option periodically or can I run it this way all
the time? Do we end up spending more execution time minimizing the
corpus? Will it delete redundant test cases, including ones that were
there before this test run started?

You should only use this option if you want to store the minimized corpus
somewhere,

Ok, so I can do it periodically to prune the test corpus. That's great.

or if the initial stage (between "#0 READ" and "#1331 INITED")
takes too long.
Otherwise you should not bother since libFuzzer minimizes the corpus in
memory on every run.
(minimization is done with a trivial greedy algorithm, not even close to
really minimal solution, but good enough).
The output looks like this:

#0 READ cov 0 bits 0 units 1331 exec/s 0
...
#1024 pulse cov 8043 bits 13474 units 1331 exec/s 256
#1331 INITED cov 8050 bits 13689 units 594 exec/s 221
#2048 pulse cov 8050 bits 13689 units 594 exec/s 341

This means that the corpus on disk had 1331 units, they were read,
shuffled, executed, and those that added coverage were chosen.

Hah! this means I've been misreading this line all along. My eyes zoomed
in on "N exec/s" and I assumed that was the throughput (and I just ignored
the "suffix" entry). So it's "cov X" / "bits Y" / "units Z" / "exec/s W"
? I guess the PCRE2 use case in the docs explains some of this -- I
should've read this more closely.

Yep.
Hm... Maybe I should put ":" there, like
#2048 pulse cov: 8050 bits: 13689 units: 594 exec/s: 341
??

Kostya Serebryany via llvm-dev <llvm-dev@lists.llvm.org> writes:

From: llvm-dev [mailto:llvm-dev-bounces@lists.llvm.org]
> Hm... Maybe I should put ":" there, like
> #2048 pulse cov: 8050 bits: 13689 units: 594 exec/s: 341

Please do. The ":" makes this much clearer, IMHO. It might even be worth
it to throw some "," in there like

  #2048 pulse cov: 8050, bits: 13689, units: 594, exec/s: 341

I'd recommend a semicolon over a comma, since the latter might confuse European numeric scanners (thousands separator).

  #2048 pulse cov: 8050; bits: 13689; units: 594; exec/s: 341

- Chuck

> From: llvm-dev [mailto:llvm-dev-bounces@lists.llvm.org]
> > Hm... Maybe I should put ":" there, like
> > #2048 pulse cov: 8050 bits: 13689 units: 594 exec/s: 341

> Please do. The ":" makes this much clearer, IMHO. It might even be worth
> it to throw some "," in there like

> #2048 pulse cov: 8050, bits: 13689, units: 594, exec/s: 341

I'd recommend a semicolon over a comma, since the latter might confuse
European numeric scanners (thousands separator).

I've added ":" yesterday.

Comma will add more stuff in there and complicate tiny awk-like scripts that
one may use to look at the logs.
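
As an illustration of the kind of small script meant here, the sketch below reads one status line in the "key: value" form. It assumes exactly the format quoted in this thread; `parse_status` and the `execs`/`event` key names are hypothetical.

```python
import re
from typing import Dict, Optional, Union

HEAD_RE = re.compile(r"#(\d+)\s+(\w+)")     # "#2048 pulse"
FIELD_RE = re.compile(r"(\S+):\s+(\d+)")    # "cov: 8050", "exec/s: 341"


def parse_status(line: str) -> Optional[Dict[str, Union[int, str]]]:
    """Parse one status line into a dict, or return None if it is not one."""
    head = HEAD_RE.match(line.strip())
    if head is None:
        return None
    row: Dict[str, Union[int, str]] = {
        "execs": int(head.group(1)),  # number after '#'
        "event": head.group(2),       # READ, INITED, NEW, pulse, ...
    }
    for key, value in FIELD_RE.findall(line):
        row[key] = int(value)
    return row


row = parse_status("#2048 pulse cov: 8050 bits: 13689 units: 594 exec/s: 341")
print(row["cov"], row["units"], row["exec/s"])  # 8050 594 341
```

With only ":" as decoration, splitting the line on whitespace also works and yields clean numbers, whereas a trailing "," would stick to each value.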