[RFC] Working with CFG loops

Right now MLIR has great support for structured control flow in the form of loop.for and the affine dialect. There has been some talk about adding support for loop-carried dependencies.

I am working on an MLIR backend to a frontend that doesn’t have structured control-flow at the point in the pipeline where it makes sense for me to transition to MLIR. This sparked my interest in looking at how one would do LICM and other loop-based optimizations in MLIR, for CFG loops.

My main motivation is that I have more knowledge and finer/better semantics that I can represent in MLIR, but they get invalidated/lost in the translation to LLVM.

I am proposing adding a new operation implementing the LoopLikeInterface, called loop.natural. loop.natural takes a set of operands that map to the block arguments of the entry block of the region. The first parenthesized list gives the names used inside the region and the second the operands; of course that syntax can be changed to whatever people prefer. Alongside the main loop op I would also introduce two terminators (which may be folded into other terminators if appropriate): loop.natural.next is the terminator for the back edge and loop.natural.return is the terminator for the loop exit.

 %ci0   = constant 0 : index
 %ci10 = constant 10 : index
 %ci1   = constant 1 : index
 %last = loop.natural (%i) (%ci0 : index) {
         %i1 = addi %i, %ci1: index
         %cond = cmpi "sge", %i1, %ci10 : index
         cond_br %cond, ^exiting, ^latch

      ^latch:
          loop.natural.next %i1 : index

      ^exiting:
         loop.natural.return %i1 : index
  } : index

Right now I am primarily interested in representing loops with a single unique exit-block.
In essence these operations embed the results of LLVM's LoopInfo analysis, and my hope is that we can work on extending the LoopLikeInterface so that it becomes feasible to write generic loop optimizations. As an example, @wsmoses and I added an isGuaranteedToExecute predicate to the LoopLikeInterface to enable LICM in the presence of more complicated control flow.
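To illustrate the intuition behind such a predicate (this is a plain-Python sketch with a hypothetical data layout, not the actual MLIR interface): an operation in block b is guaranteed to execute on every iteration if b dominates every block that can leave the loop, which is what makes hoisting it out of the loop safe for LICM.

```python
def is_guaranteed_to_execute(block, exiting_blocks, dom):
    """dom[x] is the set of blocks dominating x (including x itself).

    An op in `block` runs on every iteration iff `block` dominates
    every block from which the loop can be exited.
    """
    return all(block in dom[e] for e in exiting_blocks)

# Hypothetical two-block loop body: "body" dominates the single exiting
# block, "guarded" sits behind a branch and may be skipped.
dom = {"body": {"body"}, "guarded": {"body", "guarded"}}
print(is_guaranteed_to_execute("body", ["body"], dom))     # True
print(is_guaranteed_to_execute("guarded", ["body"], dom))  # False
```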

Q: How does one create such operations, if the given IR is in CFG form?
A: One can use LLVM LoopInfo infrastructure over MLIR blocks to find loops and then use a restructure pass that transforms them into this form. We have an early prototype of that at GitHub - vchuravy/llvm-project at vc/natural_loops
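For illustration, here is a minimal sketch in plain Python (not the actual prototype; the CFG is encoded as a hypothetical successor map) of what that LoopInfo-style discovery does: compute dominators, find back edges (edges whose target dominates their source), and collect each natural-loop body — the blocks a restructure pass would move into loop.natural.

```python
def dominators(cfg, entry):
    """Iterative dataflow: dom[b] = {b} | intersection over preds of b."""
    nodes = set(cfg)
    preds = {n: set() for n in cfg}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)
    dom = {n: set(nodes) for n in cfg}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in cfg:
            if n == entry:
                continue
            new = set(nodes)
            for p in preds[n]:
                new &= dom[p]
            new |= {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

def natural_loops(cfg, entry):
    """Return {header: body_blocks} for each back edge tail -> header."""
    dom = dominators(cfg, entry)
    preds = {n: set() for n in cfg}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)
    loops = {}
    for tail, succs in cfg.items():
        for header in succs:
            if header in dom[tail]:  # back edge: header dominates tail
                body, work = {header, tail}, [tail]
                while work:
                    b = work.pop()
                    for p in preds[b]:
                        if p not in body:
                            body.add(p)
                            work.append(p)
                loops.setdefault(header, set()).update(body)
    return loops

# The shape of the example above: header branches to the latch or the
# exit; the latch branches back to the header.
cfg = {"entry": ["header"], "header": ["latch", "exit"],
       "latch": ["header"], "exit": []}
print({h: sorted(b) for h, b in natural_loops(cfg, "entry").items()})
# {'header': ['header', 'latch']}
```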

Q: What loop forms do you plan to represent?
A: Right now, loops with a single unique exit block. We have been thinking about loops with several exit blocks, and while exit blocks that all share the same set of block arguments could be supported, it starts getting messy, since one would then also want to support multiple exit blocks with arbitrary block arguments.

I started writing a doc a couple of months ago with some colleagues; we need to finish it… I think it is relevant to what you’re doing here, as we explore the possibility of keeping structure in the IR using only custom terminators and no regions.

Here I wonder why we need a “loop.natural” operation, when it really does not handle anything other than identifying the loop body in this case. I’m interested to look into the tradeoff with an alternative representation like:

 %ci0   = constant 0 : index
 %ci10 = constant 10 : index
 %ci1   = constant 1 : index
 loop.natural (%ci0 : index) [^loop_entry, ^loop_exit] // terminator

 ^loop_entry(%i : index):
   %i1 = addi %i, %ci1: index
   %cond = cmpi "sge", %i1, %ci10 : index
   cond_br %cond, ^loop_exit(%i1 : index), ^latch

 ^latch:
   br ^loop_entry(%i1 : index)

 ^loop_exit(%last : index):
  ...

You can match the loop (almost) as conveniently by matching the loop.natural operation, except now it is a terminator, and the loop body is contained by the blocks between loop_entry and loop_exit.

Interesting. My main goal was to enable the LoopLikeInterface so that it would be feasible to write common optimization passes. I think that is still possible with your representation, but it might require the concept of an implicit/virtual region, i.e. a set of basic blocks that are associated with the operation.

I see two other challenges. First, how do you prevent unrelated optimization passes from breaking the structure the loop.natural op requires? This is similar to how the llvm.loop metadata in LLVM can be discarded accidentally. Secondly, we can use the LLVM LoopInfo analysis to discover CFG loops and then create loop.natural ops (both variants, with and without a region). The reason I used new terminators was to prevent LoopInfo from finding the loops again. If one maintains the original structure, the region variant looks like a nested loop, and on a repeated run of the restructure/loop-annotate pass you would find the already-marked loops again and would then need to decide whether the old annotation is still correct or whether you are in a nested loop.

A few questions and comments.

  1. For the single exit block case, aren’t most (or all) of the cases where you are interested in such additional information (over a plain CFG) represented by a loop.for (with return values)? The key difference appears to be an arbitrary latch and an arbitrary guard - is this what you want to deal with?

  2. For the multiple exit block case (as well as others that are equivalent to having continue’s in C-style for loops), how much more value and structure does this representation provide over just a CFG? How much easier does it make it to implement transformations like LICM vis-a-vis implementing over the latter representation?

  3. From the answer to your “Q: What loop forms do you plan to represent”, it looks like you are interested in a subset of natural loops, and not all natural loops. So this isn’t meant to subsume LICM on CFGs but to make it powerful on a subset of them? At this point, the higher-order information it provides appears pretty thin to me, at the expense of not being able to represent all forms that a CFG can capture.

  4. Regarding your example: the syntax doesn’t look right: why is the guard at the end of the body? The first block appears to be the loop body, and you always execute it at least once; so this really only captures do-while loops? I think it’s important to get the syntax clear (at least the high-level structure) to see how this information facilitates better analysis/transformation.

There is a key difference: the use of a custom terminator for starting the loop.natural can make use of the verifier to ensure that the loop structure is not broken. This is not something you can do in LLVM.

I don’t think the verifier can ensure that some block inside the loop.natural doesn’t branch into some successor of ^loop_exit.

Actually, it provides other services:

  • being able to find the parent region in O(1) (finding nearest dominating loop.natural requires walking the domtree potentially all the way to the loop entry)
  • guaranteeing that no value defined inside is used outside

I agree though, it would be nice if we could lean on the CFG structural representation more.
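The second service listed above can be made concrete with a small check. This is a plain-Python sketch over a hypothetical SSA encoding (defs and uses grouped per block); it is what a region verifier gives you for free, and what the terminator-based alternative would have to verify explicitly.

```python
def uses_escape(defs_by_block, uses_by_block, loop_blocks):
    """Return the values defined inside loop_blocks but used outside them."""
    inside = set()
    for b in loop_blocks:
        inside |= defs_by_block.get(b, set())
    escaped = set()
    for b, used in uses_by_block.items():
        if b not in loop_blocks:
            escaped |= inside & used
    return escaped

# Hypothetical encoding of the terminator-based example: %i1 is defined in
# the loop entry block and consumed by ^loop_exit, outside the loop blocks.
defs_by_block = {"loop_entry": {"%i1", "%cond"}, "loop_exit": {"%last"}}
uses_by_block = {"latch": {"%i1"}, "loop_exit": {"%i1"}, "after": {"%last"}}
print(uses_escape(defs_by_block, uses_by_block, {"loop_entry", "latch"}))
# {'%i1'}
```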

I started with the input I tend to see: LLVM itself prefers the do-while form, and the frontend I work with translates while-do into a guarded entry to a do-while loop. The loop.natural operation can support either form, and the CFG inside it can get more complex. If helpful I could write up some more examples.

It basically came for free. The only change we had to make was to teach LICM in MLIR about checking whether a statement isGuaranteedToExecute.

Arbitrary latches and arbitrary guards are generally what I am interested in, yes. Hoisting to loop.for might be possible, but it requires more analysis during the restructuring, since you need to determine the iteration-variable bounds. You might have multiple dependent iteration variables, the CFG within the loop might be more complex, and right now loop.for is restricted to a single block.

The limitation right now is not a single exit block, it is a single unique exit block. The difference is that you can have multiple loop.natural.return statements (to implement C-style continue), as long as they all point to the same exit block from which execution continues.

The limitation exists mostly because I haven’t found a way to properly represent multiple exit blocks within a region op. Thinking about this some more this morning, it might be possible to change loop.natural to be a terminator and then:

 %ci0   = constant 0 : index
 %ci10 = constant 10 : index
 %ci1   = constant 1 : index
 loop.natural (%i) (%ci0 : index) {
         %i1 = addi %i, %ci1 : index
         %cond = cmpi "sge", %i1, %ci10 : index
         cond_br %cond, ^exiting, ^latch

      ^latch:
          loop.natural.next %i1 : index

      ^exiting:
         loop.natural.exit ^exit(%i : index)
  } : ^exit

^exit(%last: index):

But I have to make sure that is implementable in MLIR.