Applying for GSoC 2021 (Fuzzing LLVM-IR Passes)

Hi Johannes,

Glad to hear from you! I understand that the title listed on the LLVM GSoC 2021 webpage serves as a general guideline, but a project proposal might need to limit its scope and focus on the deliverables. The proposed ideas all seem quite appealing and relevant to me. I’ve recently been browsing through llvm.org/docs/FuzzingLLVM.html and llvm-project/*/tools/*-fuzzer, as well as the YouTube video that you mentioned on the GSoC site. The following are some questions I’ve accumulated (forgive me if they are too naïve…).

  1. Truth be told, I’ve used OpenMP before for my course project, but I haven’t looked into its inner workings, e.g. how it actually instruments programs decorated with #pragma, and how it interacts with the OS’s threading. If LLVM’s OpenMP implementation hasn’t been fuzzed before, then it surely is a valuable fuzz target. Could you give me some clues on how we could fuzz OpenMP? For example, by writing a parser for the fuzzer input and calling OpenMP library functions in the LLVMFuzzerTestOneInput function? Or do we fuzz it through clang? I’ll look into llvm-project/openmp some more.

  2. About the custom mutator idea: my understanding is that there are currently two kinds of mutators, the generic one that ships with libFuzzer (bit flipping, splicing, etc.) and a structural mutator. Is the structural mutator related to IRMutator.cpp in the FuzzMutate folder?

  3. Most of the bugs found by fuzzers are crashes or hangs. Correctness testing is interesting but, from my limited knowledge, hard to achieve. I wonder if this is related to the ‘Alive’ tool mentioned by Florian? The fuzzer provides input to some LLVM pass, and ‘Alive’ verifies that the transformation is valid. Please correct me if my understanding is wrong…

To be honest, the LLVM passes I wrote previously are out-of-tree passes. I’ve just set up my machine, built LLVM configured with fuzzer support, and started fiddling around. I have a rough picture of what each idea is about, but it would take some preparation work for me to split them into incremental steps and deliverables. Since it’s still early in the application process, I wonder if you could give me some time to research the ideas that you proposed and ask questions before I finally decide on my project proposal? :blush:

I live in Shanghai, in the GMT+8 time zone. How about 15:00 tomorrow (March 10), or 13:30 on Friday afternoon (March 12)? I am not sure which time zone you are in, so feel free to propose another time slot if these two are not convenient for you (later that day or on the weekend are both fine). I hope to have a chat with you soon.

Cheers,

Chibin Zhang

2021.3.9

3. Most of the bugs found by fuzzers are crashes or hangs. Correctness testing is interesting but, from my limited knowledge, hard to achieve. I wonder if this is related to the ‘Alive’ tool mentioned by Florian? The fuzzer provides input to some LLVM pass, and ‘Alive’ verifies that the transformation is valid. Please correct me if my understanding is wrong…

Yes, exactly. Currently what we do is run the LLVM test suite with Alive2 watching every transformation and looking for problems -- this has found a number of issues.

A similar process, but with inputs supplied by a random IR generator, should work quite well.

John

Hi Chibin,

Hi Johannes,
        Glad to hear from you! I understand that the title listed on the LLVM GSoC 2021 webpage serves as a general guideline, but a project proposal might need to limit its scope and focus on the deliverables.

Yes, students will write the actual proposal which should contain more details and scope discussion.

  The proposed ideas all seem quite appealing and relevant to me. I’ve recently been browsing through llvm.org/docs/FuzzingLLVM.html and llvm-project/*/tools/*-fuzzer, as well as the YouTube video that you mentioned on the GSoC site. The following are some questions I’ve accumulated (forgive me if they are too naïve…).

Questions are always good.

1. Truth be told, I’ve used OpenMP before for my course project, but I haven’t looked into its inner workings, e.g. how it actually instruments programs decorated with #pragma, and how it interacts with the OS’s threading. If LLVM’s OpenMP implementation hasn’t been fuzzed before, then it surely is a valuable fuzz target. Could you give me some clues on how we could fuzz OpenMP? For example, by writing a parser for the fuzzer input and calling OpenMP library functions in the LLVMFuzzerTestOneInput function? Or do we fuzz it through clang? I’ll look into llvm-project/openmp some more.

So the OpenMP runtime has an "internal" and an external part. The internal part is full of undocumented dependences, so I doubt we can fuzz it without breaking at least one of them in each test. The external one is fuzzable, however. That said, generating OpenMP programs to be fed to clang seems like a good thing to do. OpenMP has its own set of "documented" dependences, e.g., nesting restrictions, but that is not necessarily a problem.
If we generate an invalid OpenMP program we should fail gracefully, in most cases. If we don't, we have good test cases for an OpenMP sanitizer later on. We could also embed knowledge about nesting and other OpenMP restrictions into the fuzzer/mutation tester/test generator. Long story short, generating a large corpus of OpenMP inputs is certainly something I'm interested in; we can start with "random" programs and evolve towards more targeted approaches.

2. About the custom mutator idea: my understanding is that there are currently two kinds of mutators, the generic one that ships with libFuzzer (bit flipping, splicing, etc.) and a structural mutator. Is the structural mutator related to IRMutator.cpp in the FuzzMutate folder?

I'm not sure myself. I think "structural" here means it fuzzes a well-defined structure, here protobuf. I might be wrong.

What I was looking for, among other things, is a way to do CFG transformations and less obvious IR transformations, maybe:
- Add a "while-loop" with one iteration around a (set of) block(s) (various ways to "hide" the one iteration part)
- Add a "do-loop" with zero iterations around a (set of) block(s) (various ways to "hide" the zero iterations part)
- Add a call to a function SCC which effectively does nothing but write to new buffers passed to it or allocated within it.
- Add branches that will not be taken with various targets, unreachable, some arbitrary block in the function, etc.
- Add arguments to functions that are effectively useless.
- ...

We would do those and record if and how the change impacted passes or the entire O3 pipeline. Learn about our heuristics and cutoffs and such, build a database, etc.
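
To make a few of the listed mutations concrete, here is a source-level sketch of what they could look like. This is only one reading of the list: the real mutator would rewrite LLVM IR, not C++, and all names (`original`, `hiddenTrue`, `hiddenFalse`, and so on) are hypothetical. The "hiding" uses conditions that are always true or always false but need some bit reasoning to prove, and the do-loop case is interpreted as a loop whose back edge is never taken.

```cpp
#include <cassert>

// Stand-in for "a (set of) block(s)" in the function being mutated.
int original(int X) { return (X & 0xFF) + 1; }

// Always true: the lowest bit is forced to 1, so the value is never zero.
bool hiddenTrue(int X) { return (static_cast<unsigned>(X) | 1u) != 0u; }
// Always false: masking with 1 yields 0 or 1, never more than 1.
bool hiddenFalse(int X) { return (static_cast<unsigned>(X) & 1u) > 1u; }

// While-loop that executes its body exactly once.
int wrappedInWhile(int X) {
  int Result = 0;
  unsigned Remaining = hiddenTrue(X) ? 1u : 0u;
  while (Remaining != 0u) {
    Result = (X & 0xFF) + 1;
    --Remaining;
  }
  return Result;
}

// Do-loop whose back edge is taken zero times.
int wrappedInDo(int X) {
  int Result = 0;
  do {
    Result = (X & 0xFF) + 1;
  } while (hiddenFalse(X));
  return Result;
}

// Branch that is never taken.
int withDeadBranch(int X) {
  if (hiddenFalse(X))
    return -1;
  return (X & 0xFF) + 1;
}

// Extra argument that has no effect on the result.
int withUselessArg(int X, int Unused) {
  (void)Unused;
  return (X & 0xFF) + 1;
}

// All variants must agree with the original for the mutation to be valid.
void checkEquivalent(int X) {
  assert(wrappedInWhile(X) == original(X));
  assert(wrappedInDo(X) == original(X));
  assert(withDeadBranch(X) == original(X));
  assert(withUselessArg(X, 12345) == original(X));
}
```

Comparing the optimized output for `original` against each variant would show whether a given pass still sees through the added structure, which is the kind of observation that could go into the database.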

3. Most of the bugs found by fuzzers are crashes or hangs. Correctness testing is interesting but, from my limited knowledge, hard to achieve. I wonder if this is related to the ‘Alive’ tool mentioned by Florian? The fuzzer provides input to some LLVM pass, and ‘Alive’ verifies that the transformation is valid. Please correct me if my understanding is wrong…

Yes, that is the idea. If we fuzz blindly, as opposed to guided test mutation or synthesis, we will generate a lot of garbage inputs which can only be used to detect crashes and hangs. However, given Alive, we can verify whether the output of the compiler is an implementation of the input, for some cases.
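
The shape of such a check can be sketched with a toy that is not Alive: it replaces the solver-based reasoning with brute-force enumeration over all 8-bit inputs and uses plain equality, so it ignores undefined behavior and refinement entirely. All names (`findCounterexample`, `srcMulTwo`, `tgtShiftLeft`, `tgtBroken`) are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

using Fn8 = uint8_t (*)(uint8_t);

// Returns the first 8-bit input on which Src and Tgt disagree, or -1 if the
// two functions agree on every input.
int findCounterexample(Fn8 Src, Fn8 Tgt) {
  assert(Src && Tgt && "both functions must be provided");
  for (int V = 0; V <= 0xFF; ++V) {
    uint8_t In = static_cast<uint8_t>(V);
    if (Src(In) != Tgt(In))
      return V;
  }
  return -1;
}

// "Input" to the compiler.
uint8_t srcMulTwo(uint8_t X) { return static_cast<uint8_t>(X * 2); }
// A valid "output": multiplication by two rewritten as a shift.
uint8_t tgtShiftLeft(uint8_t X) { return static_cast<uint8_t>(X << 1); }
// An invalid "output" that is wrong for a single input value only.
uint8_t tgtBroken(uint8_t X) {
  return static_cast<uint8_t>(X + X + (X == 200));
}
```

Here `findCounterexample(srcMulTwo, tgtShiftLeft)` returns -1, while `findCounterexample(srcMulTwo, tgtBroken)` returns 200. A miscompilation of this kind causes neither a crash nor a hang, so it stays invisible to a fuzzer that has no correctness oracle.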

To be honest, the LLVM passes I wrote previously are out-of-tree passes. I’ve just set up my machine, built LLVM configured with fuzzer support, and started fiddling around. I have a rough picture of what each idea is about, but it would take some preparation work for me to split them into incremental steps and deliverables. Since it’s still early in the application process, I wonder if you could give me some time to research the ideas that you proposed and ask questions before I finally decide on my project proposal? :blush:

As mentioned, students write the proposal. You should determine which of the "areas" you like best and then do some research towards that. We can stay in contact while you start writing up what you want to do.

I live in Shanghai, in the GMT+8 time zone. How about 15:00 tomorrow (March 10), or 13:30 on Friday afternoon (March 12)? I am not sure which time zone you are in, so feel free to propose another time slot if these two are not convenient for you (later that day or on the weekend are both fine). I hope to have a chat with you soon.

This week is full, I'll get back to you.

~ Johannes