Applying for GSoC 2021 (Fuzzing LLVM-IR Passes)

Hi LLVM developers,

I am a junior majoring in Computer Science at ShanghaiTech University. While browsing the LLVM Compiler Infrastructure Project, I found out that LLVM is participating in Google Summer of Code 2021, and I wanted to inquire about it and get in touch early.

One project idea really caught my eye, ‘Fuzzing LLVM-IR Passes,’ as it strikes a chord with my experience. Currently, I am working with Professor Hao Chen (contact author of the fuzzer Angora) and his graduate students on fuzzing research. My main contribution lies in the experimental evaluation. I’ve written many scripts to automate the benchmarking of different fuzzers, e.g., building libraries with fuzzer instrumentation (afl-clang-fast…), running fuzzers, and triaging and analyzing fuzzing results with afl-cov. Our most recent paper is under review at USENIX Security 2021. I’ve also written a dozen LLVM passes as practice while following UCSD’s advanced compilers course (code and notes are open-sourced on GitHub at chibinz/CSE231), and I contributed to the basic block stubbing pass for coverage feedback in the prior research project. I see this as a golden opportunity to exercise what I’ve learned about LLVM and fuzzing in a real-world application and at LLVM scale. This is also a chance for me, as an LLVM user, to contribute back to LLVM, following the FOSS spirit.

Is this project already taken, or is it still available? Are there any ‘good first issues’ that I can start working on, or code of interest worth reading? I am aware that the project description directs me to reach out to mentor Johannes Doerfert on IRC, but every client I have tried for connecting to the LLVM IRC channel reports that the server refused the connection. (Sorry, I’m not familiar with IRC; this is my first time trying it.) Is there any other way to get in touch with the mentor? I am really excited and hope to hear from you soon.

Sincerely,

Chibin Zhang

2021.3.1

+Johannes Doerfert

Hi Chibin,

Johannes will give you more information, but you can always start by familiarizing yourself with the Attributor.

Stefan

Hi Chibin,

Apologies for the late reply.

Multiple people have expressed interest in this project,
though it is open ended and we might want to tackle it from different angles.

Truth be told, anything towards testing LLVM would be OK with me. To name a
few areas: mutation testing inputs (C/C++/LLVM-IR/...), reordering or skipping
passes, creating an IR database for testing but maybe also other purposes.

I'd also be interested in fuzzing the OpenMP frontend and runtime (both on the
host and GPU) if that is something you might want to do. I think there is a
plethora of crashes and hangs to be found, not to mention the correctness issues
if we manage to do test generation for which we can verify the result.

Given that you have experience building and extending LLVM already, I think a
good next step would be to narrow it down so you can start looking at the infrastructure
we want to test, e.g., the pass manager builder if we want to swap passes to find
hidden dependences between them. The existing fuzzer capabilities and the C++
mutation testing developed outside of LLVM (https://github.com/mull-project/mull)
are also good places to take a look.
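
To make the pass-swapping idea a bit more concrete, here is a minimal sketch, assuming we drive `opt` through its textual `-passes=` pipeline rather than the pass manager builder itself. The baseline pass list, the helper names, and `input.ll` are all just placeholders for illustration; the script only builds command lines and does not run anything.

```python
import random

# Assumed baseline pipeline. These are function-pass names accepted by
# `opt -passes=...`, but the selection and order here is only an example.
BASELINE = ["sroa", "instcombine", "simplifycfg", "gvn", "dce"]


def mutate_pipeline(passes, rng, skip_prob=0.2):
    """Return a variant of `passes`: some entries skipped, the rest reordered."""
    kept = [p for p in passes if rng.random() >= skip_prob]
    if not kept:
        # Never produce an empty pipeline; keep one pass at random.
        kept = [rng.choice(passes)]
    rng.shuffle(kept)
    return kept


def opt_command(passes, input_file="input.ll"):
    """Build the argv list for one `opt` run (hypothetical input file name)."""
    return ["opt", "-S", "-passes=" + ",".join(passes), input_file]


def candidate_pipelines(passes, count, seed=0):
    """Generate `count` reproducible pipeline variants from a fixed seed."""
    rng = random.Random(seed)
    return [mutate_pipeline(passes, rng) for _ in range(count)]


if __name__ == "__main__":
    for pipeline in candidate_pipelines(BASELINE, 3):
        print(" ".join(opt_command(pipeline)))
```

Each generated command could then be run on the same input, with a crash, or an output that behaves differently from the baseline pipeline's output, hinting at a hidden dependence between passes. A fixed seed keeps any such finding reproducible.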

Let me know which of the above areas, or areas of your choosing, might be most
interesting to you. We can also schedule a 30-minute chat, just send me an email with
times that would work for you.

~ Johannes

Hi folks, an angle related to IR fuzzing that I would be happy to help out with is using Alive2 as a test oracle.

Using Alive2 incurs a set of problems (not all IR features supported, can be very slow) but has corresponding advantages (considers all inputs at once, handles UB gracefully).

John

https://llvm.org/docs/FuzzingLLVM.html looks like good background (if you haven’t already seen it).

If anyone’s interested in combining LLVM’s libFuzzer & Alive2, I’ve put up D96654, “[NOT-TO-BE-MERGED] Add alive2-based fuzzer”, which uses Alive2 to verify candidates generated by fuzzing. It works out quite well, but I think there’s lots of potential to improve the ‘interestingness’ of the IR generated by libFuzzer.

Cheers,
Florian

Having Alive2 as an oracle would certainly be great.

Some rough ideas that can be worked on in parallel if we have multiple GSoC students:
- mutation rules we know are sound, e.g., remove guarantees, add 1-iteration loops, etc.
- input generation, equivalence checking (alive, partial evaluation, ...)
- fragment extraction from larger codes + input tracking -> reproducer splitting, faster equivalence checking, ...
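
As a rough illustration of the first item, a "1-iteration loop" mutation could look like the sketch below. It works on C source text and assumes the caller passes a single expression statement (not a declaration, whose scope would change). The function name and the counter name `fuzz_i` are made up for this example, and the textual checks are deliberately conservative rather than a real parser.

```python
import re

# Keywords whose meaning changes once the statement sits inside a new loop.
_UNSAFE = re.compile(r"\b(break|continue|case|default)\b")


def wrap_in_one_iteration_loop(stmt, counter="fuzz_i"):
    """Wrap a C expression statement in a loop that runs exactly once.

    Returns the mutated source text, or None when the rewrite might not
    preserve semantics under the simple textual checks used here.
    """
    stmt = stmt.strip()
    if not stmt.endswith(";"):
        return None  # not a complete statement
    if _UNSAFE.search(stmt):
        return None  # would bind to the new loop or break a switch
    if re.search(r"\b" + re.escape(counter) + r"\b", stmt):
        return None  # the loop counter would shadow a name used in stmt
    return "for (int {c} = 0; {c} < 1; ++{c}) {{ {s} }}".format(c=counter, s=stmt)


if __name__ == "__main__":
    print(wrap_in_one_iteration_loop("x = x + 1;"))
```

The original and the mutated program should behave identically, so any difference after optimization points at a miscompile, and an equivalence checker can serve as the oracle where one is available.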

We certainly can come up with more things.

Would either or both of you (or anyone else) be interested in co-mentoring students?
We have multiple interested ones already, even though my project description is lacking any detail.

~ Johannes

I would be happy to co-mentor a student doing this sort of work.

John

I’d be happy to help out.

Cheers,
Florian