Fuzzing MLIR Infrastructure

I am currently fuzz testing the MLIR infrastructure. I have found some crashes in the latest version and filed issues, but I rarely receive confirmation or responses. Would anyone be interested in these findings?
Here are some of the crashes I have already submitted.

1 Like

Fuzzing is really nice; however, the problem we have is that a lot of the test passes in MLIR aren’t designed to be resilient against fuzzing. Another issue is that a lot of the fuzzer test cases are crashes on inputs that are so artificial that no one considers them really high priority to fix (I think it’s not unusual for fuzzer initiatives to hit this issue?).

Also, is it possible to tag all the issues reported by a fuzzer so that they are identified as fuzzing bugs?

1 Like

Thank you very much for your response and suggestions. I understand your point. I will tag the issues found by the fuzzer and try to avoid reporting crashes that are only triggered by artificial inputs.

Thank you for trying to make our codebase more robust. We’ve had a lot of success with fuzzer-generated bugs filed against the SPIR-V dialect and related conversions. In hindsight, these were often relatively simple oversights in our test coverage that the fuzzer helped close.

Like @mehdi_amini outlined, fuzzer-generated bugs have different degrees of usefulness. You can expect much more engagement from the maintainers if you file bugs against load-bearing conversion passes that are supposed to be resilient to any valid input IR, and less if you target test passes that have some loosely defined set of supported inputs.

I think that having official fuzzing guidelines like those outlined here: [RFC] Guidelines for fuzzer-generated issues by @nikic would help. We could further define MLIR-specific considerations on top of them.

2 Likes

To be clear: this is not intentional; I actually consider this a bug: passes should be conservative and resilient against any input IR!
However, I’m describing the current state of things, sad as it is. This makes it hard to distinguish signal from noise in the fuzzer-filed issues. @kuhar provided some interesting advice though!

One more thing: the crash reports are often titled after an assertion in a low-level piece of the infra, for example: [Mlir] --canonicalize crashes in Casting.h:656, while the information actually relevant to the crash comes from frame 15 (SimplifyAffineOp<mlir::affine::AffineApplyOp>) and frame 14 (mlir::simplifyAffineMap).

So a better description could be: AffineApplyOp canonicalize pattern triggers crash in mlir::simplifyAffineMap().
Having a good description would help triage and get the issue in front of the right people.
(I know it’s not easy for someone who isn’t familiar with the system; I’m just pointing out why these bugs are hard to triage and prioritize.)
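For anyone new to this workflow, here is a minimal sketch of how one might go from a fuzzer-found input to a more descriptive issue title; `repro.mlir` is a hypothetical reduced test case, and an assertions-enabled build of `mlir-opt` is assumed.

```
# Reproduce the crash with an assertions-enabled build of mlir-opt
# (repro.mlir is a hypothetical reduced input from the fuzzer).
mlir-opt --canonicalize repro.mlir

# In the printed backtrace, skip the generic frames (Casting.h, assertion
# handlers, the pattern rewrite driver) and look for the first MLIR-specific
# frames, e.g. SimplifyAffineOp<mlir::affine::AffineApplyOp> and
# mlir::simplifyAffineMap -- those name the pattern and function worth
# putting in the issue title.
```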

1 Like

Thank you for your response. I now understand the importance of accurately categorizing crashes. Although it is challenging, I will do my best to clarify and describe the root cause in the title.