[discussion] Fuzzing Pattern Application

matthias-springer · January 24, 2023, 11:18am

I have seen many cases of “incorrect pattern usage” in the last months in downstream projects; sometimes also in MLIR itself. E.g.:

Faulty patterns crash when applied in a different order. (Only works when patterns are applied in a certain order.)
matchAndRewrite return value is inconsistent with what the pattern is doing. (E.g., modifying IR but returning failure.)
Pattern modified IR without going through the rewriter.

As a first step, I built a fuzzer that randomizes the worklist of the GreedyPatternRewriteDriver (⚙ D142447 [mlir][WIP] GreedyPatternRewriteDriver: Worklist fuzzer). I found various cases of crash/“incorrect result computed” in multiple downstream projects. MLIR seems to be fine, but there are many “false positives”, where test cases check too aggressively for exact IR. (Most (all?) of these could be resolved with CHECK-DAG or proper variable/block name capturing.)

I wanted to get people’s opinion/thoughts on this. Is this useful? Also, how could such fuzzers/expensive checks can be controlled/enabled. Some options that come to mind:

Add a flag to mlir-opt. Downside: It is difficult to propagate this flag to the right place (e.g., GreedyPatternRewriterDriver.cpp or PatternApplicator.cpp).
Compile time flag, e.g., -DMLIR_ENABLE_FUZZER=ON.
A Gist on Github that users can patch in if they want to run the Fuzzer/other expensive checks.

nicolasvasilache · January 24, 2023, 11:49am

Nice, having a way to root out unhygienic patterns is very appealing to me (see some concerns I have in a previous comment).

However, I think adding significantly more CHECK-DAG could be problematic as we also want reproducibility in the compiler: the default order we apply things in should always return the same results.

I’d find it useful to use this without performing the regex part but only checking for failures, errors, crashes and infinite loops in a first approximation.

There is also the problem of pattern interference / orthogonality that could indeed benefit from more CHECK-DAG but that I don’t see as clear bugs. It would still be useful to have offline signal on those and maybe something low-tech as a local sed s/CHECK:/CHECK-DAG:/g would be enough to capture cases where materially different IR is generated.

Am I reading correctly from your post that you ran fuzzing and did not find crashes / timeouts ?
That would mean the upstream MLIR patterns are pretty clean already, despite the API footguns, which would be quite encouraging for the overall health of the project.

matthias-springer · January 24, 2023, 12:45pm

I was just fuzzing the worklist order. Upstream MLIR patterns are pretty resilient to that. I was not checking for invalid matchAndRewrite return values or direct IR manipulation (bypassing the rewriter) yet; I expect a few broken patterns in upstream MLIR (but not too many as we’ve already fixed a few with the recent changes to the GreedyPatternRewriter).

mehdi_amini · January 24, 2023, 7:05pm

That would seem like a good way to land this to me.

I don’t think we have a mlir-config.h generated by cmake yet, but I felt like I needed it a couple of times recently!

jpienaar · January 24, 2023, 7:27pm

A config would be good indeed (in general, not specific to this).

I’d also recommend a different enable flag name. There are soooo many different fuzzing forms and we already have a fuzzer tool upstream while this would be at a different level. With a config file we could be pretty explicit and then even have classes of these.

That would be a nice fix to make the test focus more on the structure [I had that old change upstream to randomize SSA IDs printed to also identify other such hardcoded cases and we discussed adding a verify config for stuff like that, but that was more just to help clean up core at that point that I made it, although it is a useful thing to identify incidental checks]

nicolasvasilache · January 26, 2023, 10:06am

Haven’t followed how we use config files, could you please point to an example?

Strong +1 on naming deconflation, I would personally run the fuzzer on all my local presubmits so I’d love if it were easy to enable, -DMLIR_ENABLE_GREEDY_REWRITER_WORKLIST_FUZZER=ON would work for me.

kuhar · January 26, 2023, 2:24pm

On the subject of cmake flag and variable naming/comments, I’m not sure if I would consider this fuzzing. The WIP patch does not seem to keep track of any internal state apart from the RNG, so to me it seems more like more general ‘random testing’. How about either something like -DMLIR_RANDOMIZE_GREEDY_REWRITER_WORKLIST + a new command line flag --mlir-rng-seed for reproducibility, or limit the scope a bit and plug into -DLLVM_REVERSE_ITERATION?

mehdi_amini · January 27, 2023, 1:27am

The way it work is based on the make “configure_file” feature, see here: https://github.com/llvm/llvm-project/blob/main/llvm/CMakeLists.txt#L1024-L1026

configure_file(
  ${LLVM_MAIN_INCLUDE_DIR}/llvm/Config/llvm-config.h.cmake
  ${LLVM_INCLUDE_DIR}/llvm/Config/llvm-config.h)

Where the llvm-config.h.cmake contains configuration flags like this:

/* Define if threads enabled */
#cmakedefine01 LLVM_ENABLE_THREADS

and in your build folder you get a header “configured” based on the CMake variables, for example mine right now:

$ cat build/include/llvm/Config/llvm-config.h
...
/* Target triple LLVM will generate code for by default */
/* Doesn't use `cmakedefine` because it is allowed to be empty. */
#define LLVM_DEFAULT_TARGET_TRIPLE "arm64-apple-darwin22.3.0"

/* Define if threads enabled */
#define LLVM_ENABLE_THREADS 1

/* Has gcc/MSVC atomic intrinsics */
#define LLVM_HAS_ATOMICS 1

/* Host triple LLVM will be executed on */
#define LLVM_HOST_TRIPLE "arm64-apple-darwin22.3.0"
...

Topic		Replies	Views
Fuzzing MLIR Infrastructure MLIR mlir	6	272	March 3, 2025
[RFC] MLIR Pattern Matching for library and acceleration instruction rewriting MLIR	10	1232	November 10, 2023
RFC: MLIR "pattern catalog" MLIR	16	563	August 26, 2025
Guidance for what Patterns can/should be upstreamed MLIR	18	585	June 5, 2025
Finding Rewrite patterns MLIR mlir	9	431	August 26, 2025

[discussion] Fuzzing Pattern Application

Related topics