SimplifyCFG vs loops

Hello All,

The current implementation of the CFG simplification is loop-agnostic. In the past I have observed it performing transformations that are detrimental to the loop structure. One example that I recall is converting a loop like this:
    while (...) {
       ...
       if (cond) continue;
       ...
    }
into two nested loops. Specifically, the "continue" branch would go back to the loop header, making it appear as if there were two back edges.
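
If I remember the shape correctly (take this as a hand-drawn sketch rather than the exact IR), the "continue" branch ends up threaded straight back to the header instead of going through a common latch block, roughly:

header:
   if (!loop-condition) goto exit;
   ...
   if (cond) goto header;    // back edge #1 (the "continue")
   ...
   goto header;              // back edge #2
exit: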

What are your thoughts about either making the CFG simplification aware of loops in its entirety, or separating the "safe" transformations in a quick pass, and having a more aggressive pass that does preserve loop structure?

-Krzysztof

Well, there really are two backedges there.

It’s not clear what the problem is.

Dan

There is no reason for two back edges. This loop could look like this:

header:
   ...
   if (cond) goto tail; else goto next;
next:
   ...
tail:
   if (loop-condition) goto header;

This way there is one loop with some control flow nested in it, instead of something that looks like two loops.

What problem does that solve? LLVM's concept of a loop is tied to the loop header block, so both forms are a single loop, not two nested loops.

-Chris

There are two back edges that both go to the same header. It was a while ago, but I saw some procedure that tried to eliminate such cases by separating the two back edges into their own loops. We ended up with two loops instead of only having one.

I'd have to dig up the old testcase I was working on, but the code there was getting unnecessarily complicated because SimplifyCFG was making changes that eventually hurt loop optimizations.

A quick search revealed that it was something in Automark in EEMBC. We ended up not unrolling a loop that could have been unrolled (in principle, at least).

Having two back edges does not have an adverse effect on optimisation. Do you have a specific optimisation that you cannot do if there are two back edges?
One thing to keep in mind is that this is a CFG, and it can be unstructured or, after various transformations, not representable in higher-level language structures such as while loops.

Kind Regards

James

Having a specific loop structure with specific requirements, such as a single back edge, can make loop optimizations (and loop nest optimizations in particular) a lot simpler, or even possible at all.

I don't have the specific testcase right now (I worked on it a while ago), but it was a case of loop unrolling giving up after SimplifyCFG.

Another thing to keep in mind is that a compiler can shoot itself in the foot really easily by performing early optimizations that hide opportunities for later transformations. This is my main concern here.

I believe that is the purpose behind loop-simplify, which most loop optimizations require.

I actually submitted a patch to fix this very problem quite a while back, but I don't think it was ever added to the baseline:

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20110711/124136.html

I have also explained why it is important to avoid the creation of new loops in an optimizer that performs analysis on the looping structure.

I now need to keep this patch updated whenever we upgrade LLVM, to ensure that our optimizer (which is now in production) works correctly. Would it be possible to add this patch to the head? I have a version that works with 3.0 but have not yet updated to 3.1 or the head.

Andrew

I'm not quite up for diving all the way into the previous thread, but do you have a test case that demonstrates a missed optimization due to this transformation? That should hopefully be enough to convince people the change is valuable. (I assume this is causing you real problems/missed optimizations, or you wouldn't bother keeping it for your out-of-tree releases.)

I think part of the difficulty in explaining the benefit of this change is that the optimization pass where it is of value to us is also a custom analysis pass. Basically, we're using LLVM as an optimization framework for a SIMD language, and the difficulty with the unwanted loop transformation comes when we have code like this:

while (uniform_and_expensive_condition_with_side_effects())
{
     if (cheap_varying_condition())
         do_this
}

If this gets transformed to something like this (which is similar to the example I posted in the earlier thread):

while (true) // outer loop
{
      while (true) // inner loop
      {
          if (!uniform_and_expensive_condition_with_side_effects())
             return;
          if (cheap_varying_condition())
             break;
      }
      do_this
}

If you think of "uniform_and_expensive_condition_with_side_effects()" as a function that is efficient only when it is run in lockstep across the SIMD array, and "cheap_varying_condition()" as a function that can be run efficiently under conditional masking, the transformation is detrimental to performance: "uniform_and_expensive_condition_with_side_effects()" now has to work under conditional masking, and to do so efficiently. In practice, we have some functions that don't work at all unless they are executed "loop uniformly" as in the source loop, so changing the CFG in this way breaks the optimized code.

I know this explanation may be a little difficult to follow, especially if you are thinking of LLVM entirely in the context of optimizing serial programs for scalar architectures. So I have offered these other reasons to add the patch:

1) Conditional logic is naturally simpler than loops, so a pass designed to simplify the loop structure should not introduce more complexity in the form of additional loops when the CFG is just representing control flow.
2) The patch doesn't degrade the generated assembly; in the most recent test I performed for Cameron, the instruction count was exactly the same or slightly better.

Andrew

What was the resolution to this issue? Was the patch accepted?

I am also running into an issue with this. Basically, I have a fairly simple loop with control flow, and the LoopSimplify pass is turning it into a nested loop in order to remove multiple backedges. This makes the loop more complicated, not simpler, and is causing our backend to miss performance opportunities.

In particular, I have the following loop:

while (n)
{
   if (c)
      A
   else
      B
}

Normally I would expect the loop termination test to be rotated to the end of the loop; our backend would then if-convert blocks A and B and effectively software-pipeline the entire loop.
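
In other words, I'd hope to hand the backend something shaped roughly like this (my hand-written sketch of the rotated form, not actual compiler output):

if (n)
{
   do
   {
      if (c)
         A
      else
         B
   } while (n);
}

With a single back edge at the bottom, the if/else diamond in the body is an obvious if-conversion candidate.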

However, because the loop has two backedges (from A and from B), the LoopSimplify pass will create a nested loop, which I awkwardly write in C as:

while (n)
{
   while (n)
   {
      if (c)
         A
      else
         goto L;
   }
   break;
L:
   B
}

Now the original loop shape is obscured, the backend no longer sees this as an if-then-else if-conversion opportunity, and software pipelining does not occur.

I have not looked at the patch, but I'm also afraid that the patch may not handle this case, as the other examples I've seen in the discussion only have if-then's, not if-then-else's. Will the patch work on the example above?

If not, this issue may simply be an optimization-ordering dilemma: balancing micro-optimizations (presumably the reason blocks A and B don't branch to a common block that then forms a single backedge is to save a branch instruction) against clear loop shapes that enable optimizations downstream.
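
For comparison, the single-backedge shape I have in mind (again just a hand-written sketch, so the exact block layout is only illustrative) simply has A and B rejoin in a common latch:

header:
   if (c) goto a; else goto b;
a:
   A
   goto latch;
b:
   B
latch:
   if (n) goto header;    // the only back edge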

A couple of thoughts:
- How firm is the constraint that all loops must have a single backedge? I don't know how deeply this assumption is baked into the LLVM code, but it really forces some common loops into a funky shape.
- Can we delay whatever transformation removes the common successor block of A and B, possibly until the backend, after we've had a chance to perform other important optimizations?
- Is there a tell-tale sign in the loop structure that would let us predict when to form nested loops versus adding a common loop latch? Or should we always add a common loop latch and, as above, assume that the backend cleans up any unneeded branches?
- I could add a pass that would 'undo' the nested loop. On principle I don't like doing that sort of thing, because it's really just dodging the problem, and it would be hard to determine when to undo it.

I appreciate your thoughts.

Jon

On my side, I am still swamped with my "regularly scheduled programming", so I haven't been able to prepare a convincing testcase showing the detrimental effects of SimplifyCFG. If someone else has such a testcase, that would be great.

We should agree on a specific loop structure that will be maintained throughout the entire optimization phase (at least at the bitcode level). That would both simplify loop optimizations and enable optimizations whose opportunities would otherwise be obscured.
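
As a strawman, I'd suggest roughly the shape that loop-simplify already aims for, so nothing exotic: a dedicated preheader, a single header, a single latch carrying the only back edge, and dedicated exit blocks. Sketched in the same style as before:

preheader:
   goto header;
header:
   ...                     // loop body: arbitrary acyclic control flow
   goto latch;             // every internal path rejoins before the latch
latch:
   if (loop-condition) goto header;    // the only back edge
exit:
   ...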

-Krzysztof

Loops that don't have a single back edge are not easily optimizable: there is no single point that decides whether the loop continues to iterate. Other than introducing an edge from outside into the middle of the loop, this is likely one of the worst loop pessimizations.

-Krzysztof