MLIR Dialect to represent Trees

Hi

As a learning experience, I’m trying to represent lisp-esque expressions as a MLIR dialect.

For example, the type of expression I’m working with could be:

( + 1 counter )

or

(map 
  (lambda (x) (+ x 1)) 
  (1 2 3 4 5)
)

If I were creating a custom IR for this type of expression, a tree would be the logical choice before lowering to SSA form. But since the point of MLIR is that we don’t need a custom IR, I’m wondering whether I can represent trees in a “nice to work with” manner in MLIR.

My approach so far has been to create three operations in my dialect:

  • A list, which just has a region (graph region, not CFG) with a single block, where I add all the children of the list, and returns nothing
  • Symbols, to represent functions/variables/etc, which return nothing
  • Constants, to represent number literals

In addition, I also had to add a terminator which I call “end”, that is a total no-op.

This leads to the following representation in MLIR (for ( + 1 counter )):

module {
  "lisp.list"() ( {
    "lisp.symbol"() {name = "+"} : () -> ()
    "lisp.constant"() {value = 1 : i32} : () -> ()
    "lisp.symbol"() {name =  "counter"} : () -> ()
    "lisp.end"() : () -> ()
  }) : () -> ()
}

However, I’m not sure that this is a good approach for the following reasons:

  1. For one, it is still necessary to add a terminator to every block, which leads to an awkward “end” symbol being necessary.

  2. The second thing I don’t like about this is that I’m missing out on the SSA representing the flow of data - all the operations are “void” operations.

  3. The third concern I have is performance of optimisation passes over this. Is it a problem if I have very deeply nested regions? Was MLIR built expecting this, or is nesting them like this an abuse of regions?

What are your opinions on this? Is this a good way to represent trees in MLIR? Is there a generally accepted way to solve this problem?

I’m still very new to MLIR, and any help or advice would be greatly appreciated!

  1. In simple cases like this you can use Traits - MLIR and elide the terminator.
  2. This is exactly the right thinking… how do we represent what you want to represent? The great thing about MLIR is that at different points in a tool you can represent the same behavior in different ways, so you don’t have to focus early on in selecting the ‘right’ representation, because this will always have some tradeoffs. What you’ve shown is highly syntactic, making it easy to generate from the parse of your lisp expressions. How would this representation get transformed into a different representation that represents the flow of data? What would the result of that transformation look like?
  3. Regions are fundamental in MLIR for representing hierarchy. Traversing hierarchy is similar in cost to traversing from one operation to another within the same region.

Thank you, your reply is extremely helpful.

  1. I didn’t know about that, and seems perfect for this use case

  2. My concern is that working with my dialect I might have to “hack” some things because the framework isn’t really expecting an IR structured in this way. For instance, I tried writing a canonicalisation pass on + symbols that does constant folding (ie (+ 1 2) becomes (3)). I struggled to find a straightforward way to access the constants 1 and 2 using the pattern rewriter, because MLIR expects any inputs to + to be “wired” in as actual inputs to the operation - where I’ve just implicitly got them next to one another. Perhaps this means, as you say, that I don’t have a good representation for what I’m trying to do, and should lower to something else before constant folding.

  3. That’s good to know!

In any case, thank you for your reply and I will continue playing around with this!

About SingleBlockImplicitTerminator - Unfortunately I only seem to be able to add a single Operation type as an implicit terminator. Eg defining:

def ListOp: LispDialect_Op<"list", [DeclareOpInterfaceMethods<RegionKindInterface>, SingleBlockImplicitTerminator<"SymbolOp">, SingleBlockImplicitTerminator<"ConstantOp">]>

the verifier will always either complain like this:

error: 'lisp.list' op expects regions to end with 'lisp.constant', found 'lisp.symbol'

I’d like to have every operation implicitly terminate a block.

Do you know any workaround to this?

You can mark each of the op as being a terminator.
(but they would have to terminate blocks)

The easiest thing is to just have a dummy ‘end’ operation. This is the ‘implicit’ terminator with no successors.

Yes, unfortunately that won’t work for me here (I need operations to optionally terminate). I’ll just continue using the dummy terminator, like @stephenneuendorffer says.