CFG SCCs vs Loops and loop breaking transformations

I ran across an interesting case and wanted to share it. I'm not proposing any particular changes, but the experience seemed interesting to discuss.

First, a bit of background. An LLVM Loop models a specific type of cycle in the CFG. Not all cycles are Loops. Many of our optimization transforms are phrased over loops, which means that a non-loop cycle tends to be less well optimized.

I had some initial IR that had a very complex, oddly written loop. After running this through my pass order, I discovered that there was no longer a Loop representing the loop. One of the transforms - SimplifyCFG is my guess, but I haven't confirmed - had taken something representable as a Loop and converted into a non-Loop SCC. This seems unfortunate and raises a general issue with how we model and canonicalize loops.

Long term, I see a couple of options:
1) Introduce a new notion for SCCs in the CFG, and rephrase select optimizations like LICM over them. A Loop then becomes a particular special case of our more generic SCC concept.
2) Avoid breaking loops until some point late in the optimizer. Essentially, we designate the Loop representable form as being canonical and then lower later.
3) Introduce a SCC to Loop conversion pass which tries to take non-loop SCCs and make them representable as Loops. We probably don't want to go full out here, but catching the "easy" cases might help a lot in practice. Running this at the start of each LoopPassManager pipeline might be an option. I was slightly surprised to notice we didn't already have this.

None of these seem great. Anyone have another idea on how we could approach this?

Philip

p.s. Thankfully, the code I noticed this in is not performance sensitive so I don't need a near term answer here.

It seems like a good idea to track down and understand exactly what is happening here before discussion options. It is entirely possible that the right answer here would be to modify the transformation that is resulting in cycles that are not recognized as loops so that it no longer does so (or alternatively as you say in #2, postpone any transformation that can do this until as late as possible).

Mark

I ran across an interesting case and wanted to share it. I'm not
proposing any particular changes, but the experience seemed interesting
to discuss.

First, a bit of background. An LLVM Loop models a specific type of
cycle in the CFG. Not all cycles are Loops. Many of our optimization
transforms are phrased over loops, which means that a non-loop cycle
tends to be less well optimized.

I had some initial IR that had a very complex, oddly written loop. After
running this through my pass order, I discovered that there was no
longer a Loop representing the loop. One of the transforms -
SimplifyCFG is my guess, but I haven't confirmed - had taken something
representable as a Loop and converted into a non-Loop SCC. This seems
unfortunate and raises a general issue with how we model and
canonicalize loops.

Long term, I see a couple of options:
1) Introduce a new notion for SCCs in the CFG, and rephrase select
optimizations like LICM over them. A Loop then becomes a particular
special case of our more generic SCC concept.
2) Avoid breaking loops until some point late in the optimizer.
Essentially, we designate the Loop representable form as being canonical
and then lower later.
3) Introduce a SCC to Loop conversion pass which tries to take non-loop
SCCs and make them representable as Loops. We probably don't want to go
full out here, but catching the "easy" cases might help a lot in
practice. Running this at the start of each LoopPassManager pipeline
might be an option. I was slightly surprised to notice we didn't
already have this.

None of these seem great. Anyone have another idea on how we could
approach this?

Hi Philip,

how did the loop look like? Could you provide an example? AFAIU LLVM detects all natural loops (cycles with a single entry block that dominates all basic blocks in the cycle). The only cycles that are not natural loops should be irregular control flow. Is this the case for your test case? Or do I miss something?

I would be very surprised if a transformation introduces irregular control flow and would like to understand why this would be needed. I currently work on the assumption that irregular control flow is rare
and likely not worth optimizing. If it is still performance relevant, I would probably just introduce some code duplication to introduce
proper natural loops.

Best,
Tobias

I ran across an interesting case and wanted to share it. I'm not
proposing any particular changes, but the experience seemed interesting
to discuss.

First, a bit of background. An LLVM Loop models a specific type of
cycle in the CFG. Not all cycles are Loops. Many of our optimization
transforms are phrased over loops, which means that a non-loop cycle
tends to be less well optimized.

I had some initial IR that had a very complex, oddly written loop. After
running this through my pass order, I discovered that there was no
longer a Loop representing the loop. One of the transforms -
SimplifyCFG is my guess, but I haven't confirmed - had taken something
representable as a Loop and converted into a non-Loop SCC. This seems
unfortunate and raises a general issue with how we model and
canonicalize loops.

Long term, I see a couple of options:
1) Introduce a new notion for SCCs in the CFG, and rephrase select
optimizations like LICM over them. A Loop then becomes a particular
special case of our more generic SCC concept.
2) Avoid breaking loops until some point late in the optimizer.
Essentially, we designate the Loop representable form as being canonical
and then lower later.
3) Introduce a SCC to Loop conversion pass which tries to take non-loop
SCCs and make them representable as Loops. We probably don't want to go
full out here, but catching the "easy" cases might help a lot in
practice. Running this at the start of each LoopPassManager pipeline
might be an option. I was slightly surprised to notice we didn't
already have this.

None of these seem great. Anyone have another idea on how we could
approach this?

Hi Philip,

how did the loop look like? Could you provide an example? AFAIU LLVM detects all natural loops (cycles with a single entry block that dominates all basic blocks in the cycle). The only cycles that are not natural loops should be irregular control flow. Is this the case for your test case? Or do I miss something?

The particular test case ended up looking something like this:
loop:
   br i1 %c, label %left, label %right

left:
   ;; normal loop stuff here
   br label %left

right:
   ;; do something
   br %left

(Note that that's from memory and I may have gotten something slightly wrong.)

This could easily be converted into a natural loop nest. I suspect most interesting cases could without falling back to the general solution which requires exponential code growth (if I remember correctly).

I would be very surprised if a transformation introduces irregular control flow and would like to understand why this would be needed.

My best guess is either simplify-cfg or jump-threading. The original example was written as a single natural loop, but effectively had another loop inside it. It was using a "reset all variables to starting state and continue" idiom which I suspect one of our passes (correctly) eliminated entirely.

I currently work on the assumption that irregular control flow is rare
and likely not worth optimizing.

I think this is still a reasonable assumption overall.

If it is still performance relevant, I would probably just introduce some code duplication to introduce
proper natural loops.

This is definitely one option. This was what I meant by my original option 3.