Proposal for the inclusion of LLCov code

Hello all,

below is a proposal to include LLCov, a simple but helpful little tool
based on LLVM/Clang, into the main LLVM code.

TL;DR: It's a module pass that instruments basic blocks with calls to an
external function, and it can be used for various things, including
(live!) basic block coverage.

I'm looking forward to hearing opinions on this :)



=== Problem description ===

Code coverage has always been considered an important aspect of testing.
For automated testing in particular (e.g. fuzzing), coverage is a
requirement for success. Some recent fuzzing research is moving toward
genetic algorithms, where coverage can be part of the fitness function.
However, applying all of this to a large codebase in a practical way is
a complex endeavor. Popular code coverage tools like GCov are not
designed to report coverage while the program is still running. Since we
want to make decisions based on coverage without terminating the program
(mainly for performance reasons, but depending on the type of fuzzing
also because one would like to alter the mutation strategy mid-fuzzing),
we need live coverage feedback as it happens. Furthermore, we are often
not interested in all of the coverage: usually a particular portion of
the code is targeted, and the rest (which is the majority) would only
slow us down if instrumented.

=== Proposed solution ===
I propose to include LLCov in the main LLVM tree. LLCov is implemented
as a module pass and allows selective instrumentation of code portions
for basic block coverage measurement (or any other task that should be
performed per basic block). What gets instrumented is controlled by a
combination of blacklist and whitelist entries that match on files,
lines or functions. The instrumented code calls an arbitrary external
function in every instrumented basic block (that is, per control flow
node). This external function can do whatever the tester wants it to do.
The simplest task would be to output coverage information on stderr and
have the fuzzer collect it there. It could also provide the information
over a network socket.
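
To make the "external function" idea concrete, here is a minimal sketch
of what such a runtime hook could look like. The name coverage_hook and
its (file, line) parameters are assumptions made for illustration only;
they are not necessarily the interface LLCov actually uses. The sketch
reports each block on stderr the first time it is reached:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <set>
#include <string>
#include <utility>

// Blocks already reported, keyed by (file, line).
static std::set<std::pair<std::string, uint32_t>> seenBlocks;

// Hypothetical runtime hook: name and signature are illustrative
// assumptions, not LLCov's actual interface.
extern "C" void coverage_hook(const char *file, uint32_t line) {
    assert(file != nullptr);
    // emplace() returns {iterator, inserted}; only report new blocks.
    if (seenBlocks.emplace(file, line).second) {
        std::fprintf(stderr, "COV %s:%u\n", file,
                     static_cast<unsigned>(line));
    }
}
```

A fuzzer reading the target's stderr would then see one line per newly
covered block while the target keeps running. Note that this sketch is
not thread-safe; a real runtime would need locking or per-thread state.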

=== Current status of the tool ===
The current LLCov code is maintained on GitHub (choller/LLCov) and
consists of the main LLCov.cpp file, implementing the module pass,
as well as two patches (one integrating the LLVM pass, the other
patching the Clang frontend to support the necessary compiler flag and
to link the runtime). Over time, the module pass itself has required
only minor adjustments (e.g. some includes changed), but rebasing the
frontend patches has typically required manual work.

=== Alternatives ===
One alternative would be to add an interface so that the changes
required to integrate this and other passes (especially into the Clang
frontend) can be made dynamically. I'm not sure if this is possible, though.
Another alternative would be to add this functionality to the GCov pass,
but I am not sure if that is easily doable given the way GCov typically

[I missed this message, thanks to glider for pointing me to it]

Do you want a general mechanism or just coverage?
A general mechanism could make sense for general-purpose use (just like gcc’s -finstrument-functions, but at the BB level),
but it is less suitable for coverage for performance reasons: calling a function for every block is expensive.
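
To illustrate the cost difference, here is a minimal sketch of a cheaper
scheme, where each block bumps a counter inline instead of making a call.
The names (kNumBlocks, markBlock, coveredBlocks) and the fixed-size table
are hypothetical choices for illustration; this is not the actual
implementation in asan or LLCov.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout: one 8-bit counter per instrumented basic block.
constexpr std::size_t kNumBlocks = 1024;
static std::array<uint8_t, kNumBlocks> blockCounters{};

// What the instrumentation would emit inline at the start of block `id`
// instead of a call. The counter saturates at 255 rather than wrapping.
inline void markBlock(std::size_t id) {
    assert(id < kNumBlocks);
    uint8_t &counter = blockCounters[id];
    if (counter != UINT8_MAX) {
        ++counter;
    }
}

// Called by the fuzzer whenever it wants a live snapshot of coverage.
std::size_t coveredBlocks() {
    std::size_t covered = 0;
    for (uint8_t counter : blockCounters) {
        covered += (counter != 0) ? 1 : 0;
    }
    return covered;
}
```

The per-block cost here is a load, a compare and a store, and the fuzzer
can still read the table while the target is running.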

If you want fuzzing with genetic algorithms take a look here:

We are already doing this kind of thing with asan :)
And asan already supports blacklists.