Regarding fuzzing LLVM IR passes

Hi llvm people,

I have been contributing to clang for a while. I am now looking for something to work on in llvm-core.

In the list of open projects, I found LLVM IR fuzzing interesting. I saw the GSoC page for LLVM and browsed through the mailing list, and it seems that no one is actively working on it at the moment.

Is anyone else working on it right now? I plan to start on the prerequisite reading once I have a better view of what's going on in this area, and of whether I should pursue something else instead.

Many thanks,
Saurabh

Seems viable. (+Kostya, who may be able to loop in anyone on his team, or anyone else he's worked with, who might be interested in collaborating on this use of fuzzing, or provide other general pointers.)

A bit of prior work to be aware of:

There's something running under OSS-Fuzz already. I'm not super clear on what it is or how it works operationally, but it's definitely something to be aware of.

llvm-stress is an in-tree tool for generating random IR. I'm not sure it has been actively maintained at all, though.
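For anyone trying this out, a single test case can be produced by piping llvm-stress' output into opt. The sketch below assumes both tools are on PATH and uses only the llvm-stress options `-seed` and `-size` and the opt options `-passes=` and `-disable-output`; the helper names and the default size of 100 are illustrative choices, not an existing script.

```python
import shutil
import subprocess


def stress_cmd(seed: int, size: int = 100) -> list[str]:
    """Command asking llvm-stress for one random module (textual IR on stdout)."""
    return ["llvm-stress", f"-seed={seed}", f"-size={size}"]


def opt_cmd(pass_name: str) -> list[str]:
    """Command running one new-pass-manager pass over IR read from stdin."""
    return ["opt", f"-passes={pass_name}", "-disable-output"]


def run_one(seed: int, pass_name: str) -> int:
    """Generate IR for `seed`, feed it to opt, and return opt's exit code.

    A non-zero code (on POSIX, a crash shows up as a negative code, i.e. the
    signal number) marks a candidate bug worth reducing.
    """
    if shutil.which("llvm-stress") is None or shutil.which("opt") is None:
        raise FileNotFoundError("llvm-stress and opt must be on PATH")
    ir = subprocess.run(stress_cmd(seed), check=True, capture_output=True).stdout
    return subprocess.run(opt_cmd(pass_name), input=ir, capture_output=True).returncode
```

For example, `run_one(42, "instcombine")` would generate the module for seed 42 and run InstCombine over it.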

If you’re going to use a coverage guided fuzzer, you want to give some thought to your corpus choice. Will your corpus be IR? Bitcode? A random seed for llvm-stress? A random buffer replacing llvm-stress’ RNG? Each has tradeoffs and will exercise different parts of the infrastructure.
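To make the last option concrete: instead of seeding a PRNG, the generator can draw every decision directly from the fuzzer's buffer, so mutations of the buffer map onto changes in the generated IR. The class below is a hypothetical Python stand-in for such a byte-driven RNG, not llvm-stress' actual interface; returning 0 once the buffer runs out is an assumed policy that keeps generation deterministic and finite.

```python
class ByteStreamRandom:
    """Hypothetical 'random' source that replays fuzzer-provided bytes.

    Every choice the generator makes is read from the input buffer, so a
    coverage-guided fuzzer mutating the buffer is steering generation.
    Once the buffer is exhausted, every draw returns 0.
    """

    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0

    def next_byte(self) -> int:
        if self._pos >= len(self._data):
            return 0  # exhausted: deterministic fallback
        b = self._data[self._pos]
        self._pos += 1
        return b

    def next_u32(self) -> int:
        # Assemble four bytes little-endian.
        value = 0
        for shift in (0, 8, 16, 24):
            value |= self.next_byte() << shift
        return value

    def choose(self, n: int) -> int:
        """Pick an index in [0, n); n must be positive."""
        if n <= 0:
            raise ValueError("n must be positive")
        return self.next_u32() % n
```

The `% n` in `choose` introduces a slight bias toward small indices, which is usually harmless for fuzzing.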

It's also worth noting that bugpoint's reduction strategy tends to work as a very effective mutation fuzzer in practice.
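As a narrow illustration of a reduction-style edit used as a mutation, the function below deletes one randomly chosen `store` from textual IR. This is a hypothetical sketch, not bugpoint's code: bugpoint works on the in-memory module and can delete instructions that have uses by first replacing those uses, whereas a store defines no value, so dropping one cannot leave a dangling reference.

```python
import random


def drop_random_store(ir_text: str, rng: random.Random) -> str:
    """Reduction-style mutation: delete one `store` instruction.

    A store defines no SSA value, so removing it leaves no dangling use.
    Returns the input unchanged if it contains no stores.
    """
    lines = ir_text.splitlines(keepends=True)
    candidates = [i for i, line in enumerate(lines)
                  if line.lstrip().startswith("store ")]
    if not candidates:
        return ir_text
    victim = rng.choice(candidates)
    return "".join(lines[:victim] + lines[victim + 1:])
```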

Personally, I’d approach it with something like the following:

  • Start with a corpus of random seeds for llvm-stress plus a pass identifier. This should be easy to stand up and run with any fuzz driver; make sure it works, and fix the obvious problems to get a reasonable fuzz rate.

  • Then extend your llvm-stress seed corpus into a random buffer corpus. Extract llvm-stress into a library that consumes a string of random bytes. Have the first byte of the buffer select the pass under test and the rest serve as the llvm-stress input.

  • Once that is running successfully, extend it. There's lots of room to improve llvm-stress' generator.

  • Another extension would be to add mutation transforms after generation but before the pass of interest. (Extracting the bugpoint/llvm-bisect transforms to use for the mutation would work pretty well.) Basically, you extend your input buffer to allow a set of transform identifiers following the buffer passed to llvm-stress.
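One way the combined input buffer from the steps above could be laid out is sketched below. The pass table, the `FuzzInput` type and the convention that the last byte counts the trailing transform identifiers are all assumptions made for illustration; any self-delimiting encoding would do.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical pass table: the first input byte selects an entry (modulo length).
PASSES = ["instcombine", "gvn", "licm", "simplifycfg"]


@dataclass
class FuzzInput:
    pass_name: str
    generator_bytes: bytes   # fed to the library-ified IR generator
    transform_ids: list[int]  # mutations applied after generation, in order


def decode(buf: bytes) -> Optional[FuzzInput]:
    """Split one fuzzer buffer into pass / generator bytes / transform ids.

    Assumed layout:
        [pass byte][generator bytes ...][transform ids ...][transform count]
    Buffers too short to hold the two fixed bytes are rejected with None.
    """
    if len(buf) < 2:
        return None
    pass_name = PASSES[buf[0] % len(PASSES)]
    body = buf[1:-1]
    count = min(buf[-1], len(body))  # clamp so a large count cannot overrun
    split = len(body) - count
    return FuzzInput(pass_name, body[:split], list(body[split:]))
```

Clamping the count to the bytes actually available means every buffer of length two or more decodes to something, which matters because a mutation fuzzer will produce arbitrary lengths.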

The preceding is not super well thought out, just what occurred to me in the moment.

Philip

Thanks for the replies, David and Philip. I am still finding my way in this area, so I am starting with some background reading.

The first thing I will do is go through llvm-stress and see how it broadly works. I will then go through Philip’s bulleted list and try to follow his suggestions.

Cheers,
Saurabh