Regarding fuzzing llvm-ir passes

SaurabhJha · July 19, 2021, 7:05pm

Hi llvm people,

I have been contributing to Clang for a while and am now looking for something to work on in LLVM core.

In the list of open projects, I found LLVM IR fuzzing interesting. I saw the GSoC page for LLVM and browsed through the mailing list, and it seems that no one is actively working on it at the moment.

Is anyone working on it right now? I plan to start on the prerequisite reading once I have a better view of what's going on in this area, or of whether I should pursue something else.

Many thanks,
Saurabh

dblaikie · July 19, 2021, 7:12pm

Seems viable (+Kostya, maybe he can +anyone else on his team/he's worked with who might be interested in collaborating on this use of fuzzing, or provide other general pointers, etc.)

preames · July 19, 2021, 8:31pm

A bit of prior work to be aware of:

There's something running under OSS-Fuzz already. I'm not super clear on what it is or how it works operationally, but it's definitely something to be aware of.

llvm-stress is an in-tree tool for generating random IR. I'm not sure it has been actively maintained at all, though.

If you’re going to use a coverage guided fuzzer, you want to give some thought to your corpus choice. Will your corpus be IR? Bitcode? A random seed for llvm-stress? A random buffer replacing llvm-stress’ RNG? Each has tradeoffs and will exercise different parts of the infrastructure.
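To make the "random buffer replacing llvm-stress' RNG" option concrete, here is a minimal C++ sketch. It assumes a hypothetical `ByteSource` class that hands out values from the fuzzer-provided buffer wherever a generator would otherwise call its PRNG, plus a toy `generateFunction` that stands in for llvm-stress' real generator (it only emits a chain of i32 `add`/`sub`/`mul`/`xor` instructions). Neither name exists in LLVM; both are placeholders for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical replacement for llvm-stress' RNG: each "random" choice is
// read from the fuzzer-provided buffer, so the fuzzer's mutations steer
// generation directly. Once the buffer is exhausted it returns 0, which
// keeps generation deterministic for every input.
class ByteSource {
public:
  ByteSource(const uint8_t *Data, size_t Size) : Data(Data), Size(Size) {}

  // Returns a value in [0, Bound). One byte is consumed per choice, so
  // Bound should be between 1 and 256.
  unsigned pick(unsigned Bound) {
    if (Pos >= Size)
      return 0;
    return Data[Pos++] % Bound;
  }

private:
  const uint8_t *Data;
  size_t Size;
  size_t Pos = 0;
};

// Toy generator, NOT llvm-stress: emits a straight-line i32 function whose
// opcodes and operands are all chosen by the byte source.
std::string generateFunction(ByteSource &Src, unsigned NumInsts) {
  static const char *const Ops[] = {"add", "sub", "mul", "xor"};
  std::vector<std::string> Values = {"%a", "%b"};  // values usable as operands
  std::string IR = "define i32 @f(i32 %a, i32 %b) {\nentry:\n";
  for (unsigned I = 0; I < NumInsts; ++I) {
    // Separate statements fix the order in which bytes are consumed.
    std::string Op = Ops[Src.pick(4)];
    std::string LHS = Values[Src.pick(static_cast<unsigned>(Values.size()))];
    std::string RHS = Values[Src.pick(static_cast<unsigned>(Values.size()))];
    std::string Name = "%v" + std::to_string(I);
    IR += "  " + Name + " = " + Op + " i32 " + LHS + ", " + RHS + "\n";
    Values.push_back(Name);
  }
  IR += "  ret i32 " + Values.back() + "\n}\n";
  return IR;
}
```

Because the generator only reads its choices through `pick`, byte-level mutations from the fuzzer map directly onto different opcode and operand choices, which a corpus of bare seeds cannot offer. The tradeoff is that the meaning of each saved input is tied to the generator's internals, so changing the generator presumably invalidates much of an existing corpus.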

It’s also worth commenting that bugpoint’s reduction strategy tends to be a very effective mutation fuzzer in practice.
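To illustrate why a reduction step doubles as a mutation, here is a toy C++ sketch of chunk deletion, the basic step of delta-debugging-style reducers like bugpoint, applied to a module modeled as a flat list of instruction strings. `removeChunk` is a hypothetical helper; a real IR version would also have to replace uses of deleted values (for example with `undef` or another in-scope value) to keep the module valid.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Toy reduction-style mutation: delete the contiguous run of items
// [Start, Start + Length). Start wraps modulo the size and the run is
// truncated at the end, so any pair of fuzzer-chosen numbers is accepted.
// Unlike a real IR reducer, this does not repair uses of deleted values.
std::vector<std::string> removeChunk(const std::vector<std::string> &Items,
                                     size_t Start, size_t Length) {
  if (Items.empty())
    return Items;
  Start %= Items.size();
  size_t End = Start + std::min(Length, Items.size() - Start);
  std::vector<std::string> Out(
      Items.begin(), Items.begin() + static_cast<std::ptrdiff_t>(Start));
  Out.insert(Out.end(), Items.begin() + static_cast<std::ptrdiff_t>(End),
             Items.end());
  return Out;
}
```

Fed with `Start` and `Length` drawn from the fuzz input, this acts as a mutation; a reducer instead tries progressively smaller chunks and keeps a deletion only while the failure still reproduces.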

Personally, I’d approach it with something like the following:

- Start with a corpus of random seeds to llvm-stress plus a pass identifier. This should be easy to stand up and run with any fuzz driver; make sure it works, and fix the obvious problems to get a reasonable fuzz rate.
- Then extend your llvm-stress seed corpus into a random buffer corpus: extract llvm-stress into a library that consumes a string of random bytes, and have the first byte of the buffer map to the pass under test and the rest serve as the llvm-stress input.
- Once that is running successfully, extend it. There's lots of room to improve llvm-stress' generator.
- Another extension would be to add mutation transforms after generation but before the pass of interest. (Extracting the bugpoint/llvm-bisect transforms to use for the mutation would work pretty well.) Basically, you extend your input buffer to allow a set of transform identifiers following the buffer passed to llvm-stress.
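One way the buffer layout from the last two steps could be decoded is sketched below in C++. The layout itself (a pass byte up front, generator bytes, then transform identifiers, then a trailing count byte saying how many transform identifiers precede it) is an assumption made for illustration, as are the `FuzzInput` type and `decodeFuzzInput` function; none of this is an existing LLVM format or API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Decoded form of one fuzzer input (a hypothetical type, for illustration).
struct FuzzInput {
  unsigned PassIndex = 0;               // index of the pass under test
  std::vector<uint8_t> GeneratorBytes;  // handed to the IR generator
  std::vector<unsigned> TransformIds;   // mutations run before the pass
};

// Assumed layout (not an existing LLVM format):
//   [pass byte][generator bytes...][transform ids...][transform count]
// The trailing count byte says how many of the bytes just before it are
// transform identifiers; everything in between goes to the generator.
bool decodeFuzzInput(const uint8_t *Data, size_t Size, unsigned NumPasses,
                     unsigned NumTransforms, FuzzInput &Out) {
  if (Size < 2 || NumPasses == 0)
    return false;  // needs at least a pass byte and a count byte
  Out.PassIndex = Data[0] % NumPasses;

  size_t Available = Size - 2;  // bytes between the pass and count bytes
  size_t Count = NumTransforms == 0 ? 0 : Data[Size - 1];
  if (Count > Available)
    Count = Available;  // clamp instead of rejecting, so most inputs decode
  size_t GenEnd = 1 + (Available - Count);

  Out.GeneratorBytes.assign(Data + 1, Data + GenEnd);
  Out.TransformIds.clear();
  for (size_t I = GenEnd; I < Size - 1; ++I)
    Out.TransformIds.push_back(Data[I] % NumTransforms);
  return true;
}
```

In a libFuzzer harness, `LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size)` could call this, build IR from `GeneratorBytes`, apply the listed transforms, and then run the selected pass. The stage without mutation transforms corresponds to passing `NumTransforms == 0`, in which case the trailing byte is simply ignored. Clamping rather than rejecting out-of-range counts is a common harness choice, since rejected inputs spend fuzzing time without exercising the passes.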

The preceding is not super well thought out, just what occurred to me in the moment.

Philip

SaurabhJha · July 20, 2021, 7:57am

Thanks for the replies, David and Philip. I am still finding my way in this area, so I am starting with some background reading.

The first thing I will do is go through llvm-stress and see how it broadly works. I will then go through Philip’s bulleted list and try to follow his suggestions.

Cheers,
Saurabh
