Rewrite Patterns Expressed in MLIR - GSoC 2020

Hello!

I am an undergraduate in my third year studying Computer Science/Mathematics in Canada. I am very interested in the rewrite patterns open project and have gone through the “Toy” tutorial chapters to learn more about MLIR and rewrite patterns specifically. Could you clarify what needs work and what additional resources would be useful to read up on?

Hey,

Sure. What we are currently using is TableGen source and target DAGs:

def : Pat<(X_Op Tensor:$x, $y), (Y_Op $y, (Y_Const 20), $x)>;

(just written, not tested, possibly doesn’t parse :wink: )

But we’ve always wanted to go beyond that. Especially with the work that was done for dynamic patterns last year, it seems like expressing these rewrites in MLIR itself could be more concise, easier to read (folks may disagree ;-)), dynamically added, etc. For example, we could have something like (no attempt to make pretty or consider caveats)

rewrite.pattern({
  ^bb0(%za : tensor<?xi32>, %zb : tensor<?xi32>):
    %x = x.add %za, %zb : tensor<?xi32>
    return %x : tensor<?xi32>
}, {
  ^bb0(%za : tensor<?xi32>, %zb : tensor<?xi32>):
    %a = y.blah %za : tensor<?xi32>
    %b = y.foo %zb, %a : tensor<?xi32>
    return %b : tensor<?xi32>
})

instead (where we have equality between source and target pattern under constraints). The rewrite would then be expressed as source/target MLIR “snippets”. There is a lot of space in this design scope though. Think of DRR as a frontend that generates C++ today, and maybe the dynamic rewrite patterns in future. The rewrite dialect is too low level for humans to write in general, but we do want rewrites specified at a higher level, and one of the options is to express them in MLIR directly.
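For concreteness, the C++ that a DRR pattern like the TableGen one above expands to today looks roughly like this (hand-written sketch, not actual generated code; X_Op, Y_Op, and Y_ConstOp are the hypothetical ops from that example and their builders are assumptions, but the OpRewritePattern/matchAndRewrite shape is the real MLIR C++ API):

#include "mlir/IR/PatternMatch.h"
#include "mlir/IR/StandardTypes.h"

struct XToYPattern : public mlir::OpRewritePattern<X_Op> {
  using OpRewritePattern<X_Op>::OpRewritePattern;

  mlir::LogicalResult
  matchAndRewrite(X_Op op, mlir::PatternRewriter &rewriter) const override {
    // Match: the first operand must be a tensor (the Tensor:$x constraint).
    if (!op->getOperand(0).getType().isa<mlir::TensorType>())
      return mlir::failure();
    // Rewrite: build (Y_Op $y, (Y_Const 20), $x) and replace the root.
    auto cst = rewriter.create<Y_ConstOp>(op.getLoc(), /*value=*/20);
    rewriter.replaceOpWithNewOp<Y_Op>(op, op->getOperand(1), cst,
                                      op->getOperand(0));
    return mlir::success();
  }
};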

There are many open questions (e.g., syntax obviously, type/shape polymorphism, constraints, benefit computations, etc.). I think the current DRR, LLVM’s GlobalISel, and other pattern description languages would be useful to look at.

Best,

Jacques

Have we also considered keeping TableGen as the source of truth for the patterns, but also having an MLIR dialect that can be used to transform and optimize these patterns?

For instance, if there is a rule that transforms A to B and a rule that transforms B to C, then we should optimize the first rule to automatically transform A to C to avoid creating an intermediate operation.
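As a rough sketch of that composition step (purely illustrative; Rule and composeOnce are made-up names, and patterns are reduced to abstract labels):

#include <set>
#include <string>
#include <utility>

// A rule maps a source pattern label to a target pattern label.
using Rule = std::pair<std::string, std::string>;

// One composition step: whenever A->B and B->C exist, derive A->C.
// Iterating this to a fixpoint would give the full transitive closure.
std::set<Rule> composeOnce(const std::set<Rule> &rules) {
  std::set<Rule> out = rules;
  for (const Rule &r1 : rules)
    for (const Rule &r2 : rules)
      if (r1.second == r2.first && r1.first != r2.second)
        out.insert({r1.first, r2.second});
  return out;
}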

Yes, though it depends on the level: for example, the work Jeff did already allows optimizing patterns, but it doesn’t combine them as you mentioned. I’d consider the dialect above to be at a higher level and more amenable to such transformations, but TBD. So yes, once we have them in dialects we can do more optimizations on them. And as mentioned, DRR/TableGen would then be one of the frontends (well, I mentioned it as a frontend of the lower-level pattern rewrite description, and wasn’t explicit about it being for both).

Regarding syntax, I think special attention should be paid to the constraints/traits side. Type-specific constraints can be represented in the operation itself (addf instead of addi, for example), but other properties of the patterns need to be accurate to guarantee the transformation’s semantics.

If we have constraints, they should be cheap to check, so that any (heuristic) engine that uses them can avoid trying to apply the rule on irrelevant sequences. Once we have a list of rules we can apply, the rewrite engine can do its work, followed by a final check to make sure the shapes and types of inputs and results are identical.
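A minimal sketch of that cheap pre-filter idea (RuleEntry and applyFirstMatching are made-up names, not an existing MLIR API):

#include <functional>
#include <vector>

#include "mlir/IR/Operation.h"

struct RuleEntry {
  // Cheap structural check: op name, operand count, result types, etc.
  std::function<bool(mlir::Operation *)> cheapFilter;
  // Full match + rewrite; only run if the cheap filter passed.
  std::function<bool(mlir::Operation *)> matchAndApply;
};

bool applyFirstMatching(mlir::Operation *op,
                        const std::vector<RuleEntry> &rules) {
  for (const RuleEntry &rule : rules) {
    if (!rule.cheapFilter(op))
      continue; // skip irrelevant sequences without a full match attempt
    if (rule.matchAndApply(op))
      return true;
  }
  return false;
}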

About combining rules, I think they should be different rules entirely. There is no guarantee that, for a particular case in a particular hardware, the application of all possible rules will be the most profitable scenario.

For example, if I have the following rules:

  • A->B
  • B->C
  • B->D
  • C->D

A is bad everywhere, C is good on CPU, D is good on GPU, and B is good on FPGA. Once you combine the chains A->B->D and A->B->C->D into a direct A->D, there’s no path that leads to B, and FPGAs will never get good codegen.
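To make this concrete, here is a toy cost table for that example (the numbers are made up):

#include <cstdio>

enum State { A, B, C, D };
enum Target { CPU, GPU, FPGA };

// Hypothetical per-target cost of each state; lower is better.
int cost(State s, Target t) {
  static const int table[4][3] = {
      /*A*/ {9, 9, 9}, // bad everywhere
      /*B*/ {5, 5, 1}, // best on FPGA
      /*C*/ {1, 4, 6}, // best on CPU
      /*D*/ {4, 1, 6}, // best on GPU
  };
  return table[s][t];
}

int main() {
  // With the individual rules A->B, B->C, B->D, C->D every state is
  // reachable from A, so each target can stop at its own minimum.
  // Pre-combining the chains into A->D removes B as a stopping point.
  for (Target t : {CPU, GPU, FPGA}) {
    State best = A;
    for (State s : {B, C, D})
      if (cost(s, t) < cost(best, t))
        best = s;
    std::printf("target %d: best state %d\n", (int)t, (int)best);
  }
}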

Adding the combined rules into the pool can speed up searches (short-cuts) if we know that those states are generally better for the target hardware, but it can slow them down (increase the number of searches) if not.

In summary, having static rewrite rules for static (always-good) code generation is good enough. But if we want an actual rewrite dialect, it has to consider dynamic codegen (based on cost models, profile info, etc.) for search engines like LIFT/RISE (https://rise-lang.org/).

Agreed, I was thinking A->B, B->C, and A->C would be separate rules (so one can combine, but need not replace). Of course, if A->B and B->C would always trigger for the same target, then maybe combine for that target, but then you’d need to know uniformly that it is always better for all variants (e.g., if the size of the inputs to B is smaller than [X,Y], then don’t replace with C).

I agree the rewrites have to consider these (also, I’m a fan of LIFT/RISE and have been talking with them for a couple of years, so I’m really excited to see what they come up with). But the rewrite engine could also be what considers these, independently of the pattern specification. E.g., A->B can be expressed as a rewrite, and then whether to apply it or not is left up to the driver. That’s why I always like to think of these patterns as expressing equality under constraints rather than a rewrite, as “rewrite” implies action to me (e.g., A must be transformed to B, rather than A is equivalent to B, so if A is not desirable/legal then replace it with B). But “rewrite” is shorter :slight_smile: (“pattern” is probably as short). These should be target aware, I agree (and that includes the cost model side).

Syntax is indeed very important; there are other parts such as location of ops, how variadic operands are handled, optional attributes, uses of an op outside the pattern, etc. The “example” I gave had fully constrained element types. We’d need a few iterations of that.

Exactly, but even for smaller output, it’s not always a benefit. Suppose we have two sub-trees (a, b) that are transformed into a’ and b’ by independent rewrites, increasing their output sizes. On their own, those would be bad moves, but only in the (a’, b’) form does the parent tree match a pattern that converts all of it to a single instruction, with no intermediaries and small output.

Sure, most compiler passes rely on pass ordering and static heuristics to assume benefit, but those are well suited to static rewrites.

My point is that a rewrite dialect, while leaving open the possibility for static engines to use it, should focus on dynamic engines, where no rule or state is assumed better by static construction.

Combining rewrites dynamically would be an interesting research topic, if we can use the cost model to predict, at startup time, for the specific hardware/configuration/dialect, which rules are clearly better and thus reduce the search space.

Interesting. This essentially creates two “rewrites”: every pattern is an equivalence pair, and for whichever side matches, you take the cost of both (for the specific hardware/config) and replace if beneficial.
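A minimal sketch of that “equivalence pair plus cost” driver (all names hypothetical; this is engine-side logic, not pattern specification):

#include <functional>

#include "mlir/IR/Operation.h"

struct Equivalence {
  // Matches either side; returns which side matched, or no match.
  enum Side { None, Lhs, Rhs };
  std::function<Side(mlir::Operation *)> match;
  // Target-specific cost of the given form rooted at the op.
  std::function<int(mlir::Operation *, Side)> cost;
  // Rewrites the op into the given form.
  std::function<void(mlir::Operation *, Side)> rewriteTo;
};

void applyIfBeneficial(mlir::Operation *op, const Equivalence &eq) {
  Equivalence::Side matched = eq.match(op);
  if (matched == Equivalence::None)
    return;
  Equivalence::Side other =
      matched == Equivalence::Lhs ? Equivalence::Rhs : Equivalence::Lhs;
  // Replace only when the other form is strictly cheaper on this target.
  if (eq.cost(op, other) < eq.cost(op, matched))
    eq.rewriteTo(op, other);
}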

If we add constraints to the rewrite pattern, we need to make sure we have all nodes covered, on both sides. It shouldn’t be too hard to do that, though.

From this discussion it is probably clear that there are many different aspects, and there could even be a few different GSoC proposals :slight_smile: Folks interested should send proposals and we can always coordinate between the different groups.
