LoopSimplify pass prevents loop unrolling

Hi All,

In the attached test case, there is an unnested loop with 2 iterations. The loop latch block is terminated by an unconditional branch, so simplifycfg folds the almost empty latch block into its successor, which is the loop header. This results in an additional backedge in the CFG, so when the LoopRotate pass is called, it canonicalizes the loop into a nested loop. However, the loop trip count is now unpredictable, because the BackedgeTakenCount for the outer loop is not loop invariant. As a result, the loop cannot be unrolled.

Is this the intended canonicalization for this loop, or is loopsimplify canonicalizing incorrectly? Should simplifycfg skip folding the latch block into the loop header if this results in additional backedges, and let the empty blocks be folded during CGP? More details in https://bugs.llvm.org/show_bug.cgi?id=33605.

FWIW, this prevents unrolling of a hot loop in spec2017/gcc and also prevents loop-interleave of a loop in spec2017/perlbench. Appreciate any suggestions on how to fix this.

Attached testcase:

$ cat test.c

    #include <stdbool.h>

    void foo();

    bool test(int a, int b, int *c) {
      bool changed = false;
      for (unsigned int i = 2; i--;) {
        int r = a | b;
        if (r != c[i]) {
          c[i] = r;
          foo();
          changed = true;
        }
      }
      return changed;
    }
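To make the extra backedge concrete, here is a hand-written sketch of the same loop with the latch block folded away, written with explicit gotos so that each CFG edge is visible. It is only an illustration of the shape described above, not compiler output. The name `test_folded`, the `foo_calls` counter, and the stub body of `foo` are assumptions added so the snippet is self-contained.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stub standing in for the external foo() of the test case. */
static int foo_calls = 0;
static void foo(void) { foo_calls++; }

/* Model of the loop once the empty latch has been folded into the header:
   both paths through the body jump straight back to the header. */
bool test_folded(int a, int b, int *c) {
    bool changed = false;
    unsigned int i = 2;
    int r;

    assert(c && "c must point to at least two ints");

header:
    if (i-- == 0)
        goto done;
    r = a | b;
    if (r == c[i])
        goto header;   /* backedge 1: nothing to update */
    c[i] = r;
    foo();
    changed = true;
    goto header;       /* backedge 2: taken after the update */

done:
    return changed;
}
```

With a separate latch block, both paths would first meet in the latch and a single edge would return to the header. Here the header has two incoming backedges, which is the situation that gets canonicalized into a nested loop.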

Thanks,

Balaram

Edit. Predecessor → successor.

We have code that’s supposed to prevent this from happening; see . Maybe it also needs to check whether the destination of the branch is a loop header? -Eli

Thanks Eli,

I was looking at this code, which keeps track of loop headers, but is checking whether the destination of the branch is a loop header sufficient? That would also prevent merging empty preheaders into loop headers. Is that a reasonable approach, or do we need to skip only when the original unconditional branch was a backedge and folding it might result in additional backedges?

I made a quick hack that finds the function's backedges and stops simplifycfg from merging the latch block into the loop header when doing so results in an additional backedge. Although it serves my purpose, I am not sure it is the right approach, as finding the backedges looks expensive. I also found a regression with this patch: in a huge switch statement, multiple empty blocks were no longer merged, resulting in bad code.

Instead, should loopsimplify try to unify multiple exit blocks and collapse multiple backedges whenever possible, rather than splitting the loop out into a nested loop?
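For illustration only (this is not the hack or the patch itself), the check can be sketched on a toy CFG: a depth-first search counts edges that point back to a block still on the DFS stack, before and after folding the latch. The `Cfg` type, the block numbering, and all function names below are assumptions made up for this sketch; none of them are LLVM APIs.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLOCKS 8

/* Toy CFG: edge[u][v] is true when block u branches to block v. */
typedef struct {
    int num_blocks;
    bool edge[MAX_BLOCKS][MAX_BLOCKS];
} Cfg;

enum { UNVISITED, ON_STACK, DONE };

/* Depth-first walk; an edge to a block still on the DFS stack is a backedge. */
static int dfs_count(const Cfg *g, int u, int *state) {
    int count = 0;
    state[u] = ON_STACK;
    for (int v = 0; v < g->num_blocks; v++) {
        if (!g->edge[u][v])
            continue;
        if (state[v] == ON_STACK)
            count++;
        else if (state[v] == UNVISITED)
            count += dfs_count(g, v, state);
    }
    state[u] = DONE;
    return count;
}

/* Counts backedges reachable from block 0, which is assumed to be the entry. */
int count_backedges(const Cfg *g) {
    int state[MAX_BLOCKS] = {0};
    assert(g->num_blocks > 0 && g->num_blocks <= MAX_BLOCKS);
    return dfs_count(g, 0, state);
}

/* Assumed numbering: 0 entry, 1 header, 2 body, 3 then, 4 latch, 5 exit. */
Cfg loop_with_latch(void) {
    Cfg g = { .num_blocks = 6 };
    g.edge[0][1] = true;
    g.edge[1][2] = true;  g.edge[1][5] = true;
    g.edge[2][3] = true;  g.edge[2][4] = true;
    g.edge[3][4] = true;
    g.edge[4][1] = true;  /* the only backedge */
    return g;
}

/* Same loop after the latch's predecessors are retargeted to the header. */
Cfg loop_with_latch_folded(void) {
    Cfg g = loop_with_latch();
    g.edge[2][4] = false; g.edge[3][4] = false; g.edge[4][1] = false;
    g.edge[2][1] = true;  g.edge[3][1] = true;
    return g;
}
```

On this model the count goes from one backedge to two once the latch is folded, which is the condition the hack tests for. A full traversal like this on every candidate branch is where the cost concern comes from.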

-Balaram

There isn’t really any reason to collapse preheaders anyway; LoopSimplify will recreate them, and they don’t really block other optimizations as far as I know.

Finding the backedges is not that expensive to calculate if you cache the result, but it is probably tricky to keep the cache up to date, yes.

I’m not sure I follow the issue with the switch statement. Could you give an example?

-Eli

I will try to reduce a test case for the regression I found and will update.

Thanks,

Balaram

Sorry for the long round-trip time on this. The issue I was seeing when we defer simplifying unconditional branches to later passes in the backend was due to a heuristic in CGP::isMergingEmptyBlockProfitable, where we assumed the cost of a branch is the same as the cost of a copy. However, if the destination block is an almost empty latch block, then we can hoist the jump through the backedge, so it is profitable to eliminate the branch. I have posted a patch for review here: https://reviews.llvm.org/D35411.

Please take a look and let me know your comments/suggestions.

Thanks,

Balaram