[RFC] Add arithmetic, logical and comparison expressions into PDLL

Introduction

We propose to introduce arithmetic, logical and comparison expressions into PDLL
to enable writing patterns like

Pattern {
  let root = op<my_dialect.foo> {shift = val: Attr} (inputOp: Value);
  let doubled = val * attr<"2 : i32">;
  doubled >= attr<"10 : i32">;

  rewrite root with {
    let newOp = op<my_dialect.bar> {upper_bound = -doubled} (inputOp) -> ();
    replace root with newOp;
  }
}

Motivation

We use PDLL a lot for pattern matching.
One of the use cases is matching parts of the IR into hand-optimized
kernels and configuring them appropriately.
We are generally happy with its precise language, and we have made heavy use of
defining our own native constraints/rewrites to extend the capabilities of the language, e.g. when checking and transforming attributes.
As of a few weeks ago, native constraints can return results, which makes it possible to break up monolithic native constraints (like isSumGreaterThan) into more atomic expressions (like add and greater_than).

We would like to move those into the core PDL language to make writing PDLL
patterns easier, improve readability and make them less dependent on a large
set of custom native constraints/rewrites.

Proposal

We propose to add arithmetic, logical and comparison operators with their usual representation into PDLL. As an example, let’s discuss the addition operator +.

The expression a + b requires a and b to be of type Attr.
It produces a result of type FailureOr<Attr>.

When the expression is evaluated at runtime, we perform the following legality checks:

  • both a and b need to be IntegerAttr, or both need to be FloatAttr
  • their types (as given by Attribute.getType()) need to be the same.

Then we perform the addition, failing if an overflow occurs.
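The evaluation semantics above can be sketched in Python. This is an illustrative model only: attributes are represented as (kind, type, value) tuples, a failed FailureOr is represented as None, and only i32 overflow is modeled. None of these names are actual PDLL or MLIR APIs.

```python
# Illustrative model of the proposed runtime semantics for `a + b`.
# Attributes are (kind, type, value) tuples; None signals FailureOr failure.

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def eval_add(a, b):
    """Return the sum attribute, or None if a legality check or the addition fails."""
    kind_a, type_a, val_a = a
    kind_b, type_b, val_b = b
    # Legality check 1: both operands are IntegerAttr, or both are FloatAttr.
    if kind_a != kind_b or kind_a not in ("integer", "float"):
        return None
    # Legality check 2: their types must be the same.
    if type_a != type_b:
        return None
    result = val_a + val_b
    # The addition itself fails on overflow (modeled here for i32 only).
    if kind_a == "integer" and type_a == "i32" and not (INT32_MIN <= result <= INT32_MAX):
        return None
    return (kind_a, type_a, result)
```

For example, adding two i32 attributes succeeds, while mixing kinds, mixing widths, or overflowing all yield a failure.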

If any of the legality checks or the addition itself fails,

  • and the expression is in a constraint section, the constraint fails to match.
  • and the expression is in a rewrite section, the rewrite is aborted and rolled back (if possible).

We would like to disallow arithmetic operations as toplevel expressions, i.e. make

Pattern {
  ...
  val + attr<"2 : i32">; // arithmetic as toplevel expression - disallowed

syntactically illegal, and only allow comparison and logical operators there:

Pattern {
  ...
  let a = val + attr<"2 : i32">; // `let` as toplevel expression - allowed
  val < attr<"2 : i32">; // comparison operator - allowed
  boolval && attr<"1 : i1">; // logical operator - allowed

When comparison and logical operators (which produce FailureOr<IntegerAttr> of type i1) appear as toplevel expressions in a constraint section, the constraint only matches if both the operation succeeded and the returned attribute is 1 : i1 (i.e. true).
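This toplevel matching rule can be modeled with a short sketch. As before, this is purely illustrative: attributes are (kind, type, value) tuples, a failed FailureOr is None, and the function name is made up, not an actual PDLL or MLIR API.

```python
# Illustrative model: a toplevel comparison/logical expression in a constraint
# section matches iff evaluation succeeded AND the result is `1 : i1` (true).

def toplevel_constraint_matches(result):
    """`result` is None on failure, else an attribute tuple like ("integer", "i1", 1)."""
    return result is not None and result == ("integer", "i1", 1)
```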

Design alternatives / options

Type promotion

Instead of requiring the types of the left-hand side and right-hand side of binary operators to match, we could instead go for type promotions. At this stage we don’t see much benefit from it, and developers would need to learn the promotion rules to understand constraint matching correctly.
This could be lifted at a later time if necessary.

Allow arithmetic operations on toplevel

We could make

Pattern {
  ...
  val + attr<"2 : i32">;

legal by discarding the result of the addition and make the constraint match if val is an IntegerAttr of type i32 and the addition doesn’t overflow.
To avoid the ambiguity with the interpretation in the next section, we prefer making this illegal and requiring let unused = val + attr<"2 : i32">; instead.

Convert Attr values on toplevel into match failures

We could give

Pattern {
  ...
  val + attr<"2 : i32">;

the meaning of “match whenever val + 2 is not zero”.
More formally, when the toplevel expression’s result type in a constraint section is FailureOr<Attr>, we could consider the constraint to match when the result is not a failure and the attribute value, converted to boolean, is true.
To avoid the ambiguity with the interpretation in the previous section, we prefer making this illegal and require an explicit comparison, i.e. val + attr<"2 : i32"> != attr<"0 : i32">; here.

Implementation

On the PDLL parser side, the implementation of this proposal is fairly straightforward.

For the representation of the operators in PDL, we see the following options; we have prototyped option (A) (see here and here).

A) Builtin native calls

We can make use of the native constraint/rewrite PDL ops and turn operations like + into
calls to builtin native functions, e.g. pdl.apply_native_constraint "__builtin_addConstraint"(%a, %b).
This allows us to keep the PDL dialect and its consumers (e.g. the bytecode interpreter) unchanged.
MLIR can ship with those builtin native calls, which are automatically declared in the PDLL parser and have their definitions automatically available in the PDL bytecode interpreter.
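Sketched in PDL IR, the multiplication from the introductory example might then be encoded roughly as follows. The exact printed syntax of pdl.apply_native_constraint with results, and the __builtin_mulConstraint name (chosen by analogy with __builtin_addConstraint above), are assumptions here, not a definitive encoding.

```mlir
// Hypothetical option (A) encoding of `let doubled = val * attr<"2 : i32">;`.
pdl.pattern : benefit(1) {
  %val = pdl.attribute
  %two = pdl.attribute = 2 : i32
  // The expression becomes a call to a builtin native constraint that
  // returns the computed attribute as a result.
  %doubled = pdl.apply_native_constraint "__builtin_mulConstraint"(%val, %two : !pdl.attribute, !pdl.attribute) : !pdl.attribute
  // ... match and rewrite as usual ...
}
```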

B) Define new PDL ops

Introduce new ops into the PDL dialect, either
one per operation (like pdl.add %a, %b) or one for all expressions (like pdl.attr_expr add, %a, %b), and introduce the same ops into the PDLInterp dialect.
The PDLL parser would emit the corresponding PDL op for each expression,
the PDLToPDLInterp pass lowers those into pdl_interp. The bytecode
generator turns those into new bytecode ops, and the bytecode interpreter learns
to interpret those.
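For illustration, the two sub-variants of (B) might look roughly like this in PDL IR. Neither op exists today; both names and the syntax are hypothetical.

```mlir
// Variant 1: one op per operation.
%sum_a = pdl.add %a, %b : !pdl.attribute

// Variant 2: one generic expression op, parameterized by the operator.
%sum_b = pdl.attr_expr add, %a, %b : !pdl.attribute
```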

(B) requires more changes and we don’t
see much gain from it, assuming that the compiler will likely handle those as
constraints/rewrites in an opaque way.
Thus we think that the tradeoffs are in favor of (A).

Future work

Other PDLL extensions we have prototyped/looked into are syntax for creating
and manipulating ArrayAttr and DictionaryAttr. Those follow a similar
implementation, but are not part of this RFC.

Request for comments

What do you think? We’d love to hear your comments.

I’m currently at EuroLLVM and would love to talk to you there, too!

For visibility: @Mogball @martin.luecke

I don’t have any skin in the game as to the RFC, but I would like to get more familiar with PDL and PDLL. Could you point me to any examples of your use of PDLL so I could study them? I’d also be interested in eventually getting involved in the development of PDLL so it can finally support region-holding ops.

Thank you for this RFC and for pushing on these ideas.

To me it makes sense to follow option A and have this as part of the PDLL frontend. The PDL dialect models only the logic of a rewrite, so operations such as pdl.add modeling arithmetic do not really fit.

Another option (C) could be to use existing dialects to model the computation and interpret them as part of the PDL pattern, for instance mixing the arith dialect into a PDL pattern to model the multiplication in your example.
This reminds me of the interpretation approach that was presented as part of Mojo at an LLVM dev meeting, and also of xDSL’s approach to interpreting various dialects from Python. This would open up many questions in terms of how to align the typing, which dialects would be supported, what the precise semantics of the ops are, etc. It would be a general solution, but depending on the ops supported in patterns maybe a bit too powerful. It would also require quite an amount of additional infrastructure.

I think option A is more sensible and in scope for this.

What type of operations do you imagine in the PDLL standard library, just standard arithmetic and comparison functions?

While we are chatting about this, I think it would also be great if we could identify a path towards making TableGen definitions of types and constraints usable from PDLL/PDL, e.g.

let root = op<my_dialect.foo> (inputOp: TosaInt8);

As far as I know they can already be imported and emitted from PDLL, but support for linking to the mangled definitions TableGen produces is challenging.

First let me state that I support all the PDLL-related part of the proposal. I think none of the design alternatives are better than the core proposal.

Now about the PDL part. One of the things I do not like about native calls is that the linking is usually a bit brittle. In contrast to a fixed set of operations, it’s difficult for a backend consumer to be aware that they may be missing the implementation for some logic.

I use PDL as a consumer for custom backends (which do not go through the PDL-Interp pipeline). While an intrinsic-like approach could be taken to make native calls more portable, it’s still difficult to discover and handle properly in my opinion.

This is my favorite option so far. It is somewhat limited in the scope of what gets added to PDL itself (one would only need a few operations to extract attributes, for example), and we can then reuse the lowering logic of other dialects when needed. I’m not sure how this would fit into the PDL-Interp bytecode story though. Arguably, PDL-Interp could itself be lowered to LLVM and run via the MLIR ExecutionEngine to allow reusing other dialects that can lower to LLVM, which if I remember properly should not be too hard to do.

I unfortunately have yet to find a public user of PDLL or PDL. For a long time before meeting Matthias at EuroLLVM I thought no-one was using it at all.

I mean, it’s been used publicly in the TensorFlow repo for years (those patterns were ported to StableHLO maybe more than a year ago too), I saw ByteDance folks using it in their compiler repo, and IREE has been actively using it for a couple of months too. But indeed not as publicly as would have been good.

I think that’s not much different from which dialects are registered: you need to know where you are “deploying” to. These builtins become like an STL’esque base set that is more commonly there, and the mechanism just avoids changing the bytecode/PDL interpreter. I think that’s sensible, as changes to PDLL and PDL can then be done independently, this can be fully evaluated, and there is little cost to making it more builtin later. At that point it can be evaluated with more data and without affecting any existing users (it’s a pure implementation change).

I prefer option (A), especially as it doesn’t inhibit option (B), which could always be considered later without breaking any existing users. Well, it would really be option (A’), as I think the registration of these arithmetic & logical patterns should still be optional, but populated with some simple helpers/linked in via a target. If someone wants to avoid them, then it should be possible (PDLL is not the only way to produce PDL).

When I look at this, it’s unfortunate we didn’t choose {...} for construction, as the less-than operator now reads less well (and without spaces, or with extra spaces in different places, would make one do a double take).

With this we are making the integer and floating-point types special to PDLL. They are probably common enough here that that is warranted.

Great to see that you are interested in PDLL. Unfortunately none of our PDLL code is public at this time.
If you’d like to chat about region-holding ops, you can also find me on Discord as mgehre-amd.

Thanks a lot for the RFC!

We had a round table at EuroLLVM, and what came up is that there are two distinct problems: how to represent this in PDLL, and how to make this fit “nicely” into PDL. While we can try to find a better option than pdl.apply_native_constraint to represent this, we can always do that later and fall back on the native calls for now.

Otherwise, as a second step, I would be really interested to see if we could have option (B), and add operations for new matchers/rewriters similar to the transform dialect, using an interface for instance. I am also wondering how much reuse there could be between the transform dialect and the PDL dialect here.

Drive-by comment: PDL is currently housed in MLIR core, which gives it a pretty high bar that must be cleared in order to have a lot of the fancy integrations one might expect of such things. This stems from it being at the root of the dependency DAG vs. a leaf. PDL seems to be finding its own in some new use cases, which is great - maybe it was just ahead of its time. However, I would expect that if it is to grow very far beyond what it currently is, it would need to be decoupled from MLIR core, becoming a leaf dependency that has some more freedom to evolve. Otherwise, the design space of what it can do without getting into weird dependency situations is pretty constrained. I’ve not looked closely at what this would take, but I’m just noting the friction/opportunity.

The main motivation behind the development of PDL is to be able to optimize the combination of matching across multiple patterns. Isn’t the concept of having an optimizing compiler for PDL at odds with the “opaque” way you’re seeing the addition of arithmetic operations and such?

PDL has, IMHO, many components; having the runtime & bytecode be simple/basic while having optimizations above it isn’t at odds. One then just adds ops (or reuses arith or whatnot) where it is cheap to add, while keeping the interpreter side bare. That’s at least how I’m splitting this in my head.

Yes, I also see the potential tension.

There are optimizations that could be done around attribute arithmetic, e.g. exploiting the relationship between a < attr<"10 : i32"> and a < attr<"20 : i32">. On the other hand, a bunch of other “obvious” relationships cannot be easily exploited, because a > attr<"0 : i32"> and a <= attr<"0 : i32"> might both be false when a has the wrong type.

I talked a bit about it with Alex, and we were wondering at which scale (patterns x ops) even the existing lowering from PDL to PDL_Interp starts to pay off. Do you have any data on this?

On the other side of the equation, it took us a really long time to land the ability for native constraints to return results, because the convert-pdl-to-pdl-interp lowering is non-trivial - so it comes at least with an evolution & maintenance cost.

I see two ways forward here:
a) the convert-pdl-to-pdl-interp pass could learn the semantics of some relevant builtins to be able to optimize them
b) (preferred) once we have figured out that the performance win vs. implementation cost is worthwhile, we promote the relevant builtins to their own PDL ops.

In terms of design iteration vs. the bar to entry into MLIR core, maybe we can separate PDLL from PDL itself. PDL is an infrastructure for optimizing pattern matching. PDLL is already a layer on top of it, providing C++ constraint support. I don’t see any reason why we can’t build arithmetic and logical operators into PDLL that emit native constraints. We can progressively graduate these PDLL language constructs to be more “first class” and optimizable at the PDL level as they mature.

A lot of this boils down to syntax sugar at the PDLL level. I think that it’s within the design space of the PDLL frontend to have things like that and decouple it from the PDL “backend”.
