MLIR diffing

Hey everyone,

When working on transformations (understanding or changing them), I commonly look at IR diffs. So far, I’ve been using wdiff for this, but it’s not a great experience, because diffs tend to be quite noisy.

Here’s my wish list:

  • Diff MLIR at the graph level, ignoring names and ordering. Should detect moved nodes and inserted or deleted intermediate nodes as well. Output can be textual. Maybe bonus points for Graphviz output - this might be useful for visualizing moves.
  • Diff diffs: does this pass that previously did something now do something else? This doesn’t necessarily need to be a core functionality of the tool, but the output should support this use case.
  • Analyze pipelines. Again, not necessarily a core functionality of the tool. Supporting this could mean providing a simple web UI.

Applications of this that I’m interested in are learning, debugging and benchmarking (e.g. only run benchmarks that are really affected or flag likely noisy results).

Are there any existing solutions for this?

Thanks,
Johannes

1 Like

Unfortunately nothing fancy here (e.g., folks using custom regex rules to avoid frivolous matches), and I agree this would be very useful for many folks. We have an open project that would have been useful here (llvm-canon kind of tools for MLIR) to reduce some of that. The tree-sitter grammar I added was as I was trying some semantic diffing tools that use such grammar, but unfortunately the results weren’t as good as hoped (which could also have been due to how the grammar was defined though …). Would make for a great contribution (and potentially multiple projects possible here).

– Jacques

Hey Jacques, thanks for the response and sorry for forgetting about this. I ended up going with naive text diffs + some normalization (slightly more than regexes, but not much), which sort of worked. But ultimately, for complex IR, I found none of this too helpful. I ended up building an interpreter instead, plus a couple of tools on top of it for figuring out which pass is responsible and for minimizing the resulting IR. Will update with the link when it’s open sourced if I remember.

How does the minimizer relate to mlir-reduce? (what does it do different).

For text diffing, I recommend Meld Diff. You can set some regex to ignore things like SSA variable names, but it’s not a graph diff, so will blow up if your graphs are out of order in text.

As a tool, it should be reasonable to create an IR reader and walk both at the same time and see if the operations are the same, with the same operands and the same semantics. The hard part is to know when things are “equal” again, but there probably are several known implementations by now, at least for text, that is more complicated.

The other issue is how to show this. Text is simple because you can do line-by-line, but graphs, you might need to reorder the output in order to show that the graphs are equal. But again, it gets complicated when things are not equal.

This sounds like an interesting GSOC project…

Yes, would be a really nice GSOC project agreed.

The tool is based on the interpreter, so it doesn’t need to lower all the way to LLVM. It’s probably similar to mlir-reduce otherwise (but there’s not that much logic in this part anyway - the whole thing is less than 400 lines, rules and driver). My current workflow is:

  1. Have a broken test. Run it and record the IR after each pass (there’s an instrumentation for that) as well as the inputs to the module
  2. Replay the IR after each pass. The tool will print changed results (say bufferization has a bug - the tool will tell you that the results changed after the bufferization pass)
  3. Feed the last correct IR and a pass pipeline to the reduce tool. The tool runs a bunch of reduction rules (similar to mlir-reduce) + canonicalization + a configurable pass pipeline and verifies the results stay different.

Currently there’s just a handful of rules (adapted to my needs):

  1. Truncate a function
  2. Replace an op with its results
  3. Replace an op with one of its parameters
  4. Replace an scf.while loop with {before;after;before}
  5. Reduce the loop bounds of a gml_st loop

Together with canonicalization, you still end up with pretty good minimization in practice.

I looked a bit into graph diffing and built something for XLA HLO graphs (simply merge the two graphs and apply some simple color coding). Works kind of OK in practice, but I found the utility somewhat meh. Also not sure how to make it work on bufferized IR.

Neither does mlir-reduce, it is effectively a rewriter with multiple rules (and ability to add additional patterns, e.g., it can use canonicalization and regular patterns) along with exploration & tracking.

What guides the exploration?

EDIT: Glancing at the code, it seems mlir-reduce just applies reductions greedily. Then my tool is very different - it only applies reductions that still trigger a miscompile.

A user provided metric extraction controls exploration, path where the condition being minimized is found is followed while rollback on any minimization where it isn’t triggered. In this case the provided script would be detecting miscompile or not.

I see how that could be useful, but the interesting part is really the script to detect the miscompile, not the driver or the simplification rules. And for the cases I’m interested in, I don’t know of a way to write such a script without interpretation or a reference compiler (which I don’t have).

I have been experimenting with the available MLIR Source Location information to create a utility to help with MLIR Diff. I think this information, paired with LocationSnapshot is useful to derive some context for the changes that occurred after a given MLIR Pass.

I created a prototype to demonstrate this here- Utility to create a list MLIR Mutations after a transform. I am interested in getting some early feedback from the community and can send a more formal proposal if there is interest.