Loop unroller fails to unroll loop

singh-yashwant · April 7, 2023, 10:27am

Hi, I have run into a loop optimization problem (code linked at the end). I have a function (“gpu_kernel”) that calls another function (“advance”) containing a loop marked with “#pragma unroll”. The loop exit condition is a function parameter, but it is a compile-time constant at each call site (10 and 12 for the two calls, respectively), so the compiler should have flattened the loop completely. Instead, we get 2 partially unrolled loops.
I found that two instances of the FullLoopUnroll pass run on the loop: once before the function is inlined and once after. Before inlining, FullLoopUnroll can’t determine the TripCount variable (LoopUnrollPass.cpp:1231) and ends up partially unrolling the loop while also setting the “llvm.loop.unroll.disable” metadata, which disables any further attempts by the loop unroller.

However, this only happens when “pragma unroll” is set. When it’s removed, the first attempt at FullLoopUnroll bails out (LoopUnrollPass.cpp:computeUnrollCount returns false). After inlining, the loop exit condition, which was a function parameter, is seen as a constant and the unroller fully unrolls the loop.

I don’t see the full picture, so I don’t understand why FullLoopUnroller runs before the inliner. It costs compile time and, at least in this case, produces a sub-optimal result. What changes can be made to fix this issue?

Reproducer: https://godbolt.org/z/16M1Ps1sK
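For readers who cannot open the link, the pattern described above looks roughly like the following. This is a hypothetical sketch and not the code behind the reproducer: only the names “gpu_kernel” and “advance”, the pragma, and the trip counts 10 and 12 come from the description; the loop body and the signatures are assumptions.

```cpp
#include <cassert>

// Hypothetical stand-in for the callee. The trip count `n` is a function
// parameter, so it only becomes a visible constant after inlining.
static inline int advance(const int *data, int n) {
  assert(n >= 0);
  int sum = 0;
#pragma unroll
  for (int i = 0; i < n; ++i)
    sum += data[i] * (i + 1); // assumed loop body, for illustration only
  return sum;
}

// Hypothetical stand-in for the caller. Both trip counts are compile-time
// constants, so full unrolling is possible once advance() has been inlined.
int gpu_kernel(const int *data) {
  return advance(data, 10) + advance(data, 12);
}
```

Calling gpu_kernel() with an array of at least 12 elements exercises both call sites; the interesting part is the generated IR or assembly, where each call should ideally become a fully unrolled straight-line sequence.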

nikic · April 7, 2023, 3:52pm

Assuming your description is correct, this is a bug in the unroller. The entire point of the LoopFullUnrollPass is that it only performs full unrolling (and peeling), but not any kind of runtime/partial unrolling. It sounds like the presence of unroll metadata (from the pragma) forces unrolling in LoopFullUnrollPass even though no full unroll is possible. Instead, this should be delayed to the runtime unroller.

singh-yashwant · April 10, 2023, 3:25pm

LoopFullUnrollPass is doing runtime unrolling forced by ‘pragma unroll’. As both LoopUnrollPass and LoopFullUnrollPass rely on the same unroll function, I’m not entirely sure how it can be solved. I made some changes locally that fixed it by introducing a new parameter to tryToUnrollLoop() [link]. @nikic what do you think about the changes?
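The idea can be illustrated with a toy model of the decision. This is only a sketch: none of the names below are LLVM’s real API, the parameter name `onlyFullUnroll` is an assumption, and the real tryToUnrollLoop() considers many more inputs.

```cpp
#include <cassert>

// Toy model only: these types and names are hypothetical, not LLVM's.
enum class Action { None, FullUnroll, RuntimeUnroll };

struct LoopModel {
  bool hasUnrollPragma;     // loop carries unroll metadata from the pragma
  bool tripCountIsConstant; // true once the bound is visible, e.g. after inlining
  bool unrollDisabled;      // models the "llvm.loop.unroll.disable" metadata
};

// `onlyFullUnroll` models the new parameter: when set, the pass may fully
// unroll but must leave runtime/partial unrolling to a later pass.
Action tryToUnroll(LoopModel &loop, bool onlyFullUnroll) {
  if (loop.unrollDisabled)
    return Action::None;
  if (loop.tripCountIsConstant) {
    loop.unrollDisabled = true; // the loop is gone after a full unroll
    return Action::FullUnroll;
  }
  if (loop.hasUnrollPragma && !onlyFullUnroll) {
    loop.unrollDisabled = true; // blocks every later unrolling attempt
    return Action::RuntimeUnroll;
  }
  return Action::None; // defer; a later pass can try again
}
```

In this model, calling tryToUnroll() with `onlyFullUnroll` set to false before the trip count is known reproduces the reported behaviour (runtime unrolling, then no further attempts), while setting it to true leaves the loop untouched until the bound is constant.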

nikic · April 10, 2023, 3:41pm

Ignoring style nits, the approach looks reasonable to me.

singh-yashwant · April 10, 2023, 3:44pm

Great! I’ll submit it for review with some tests.

singh-yashwant · April 12, 2023, 7:35am

Submitted a patch for review: D148071. I’m not confident that I have added the right reviewers. @nikic can you add relevant reviewers if I missed them?

fhahn · April 23, 2024, 7:10pm

Unfortunately, this has exposed a regression in some of our workloads: because unrolling now happens later, we miss some CSE for the partially unrolled loops.

I put up a patch to add a lightweight CSE as part of the partial unrolling ([LoopUnroll] Add CSE to remove redundant loads after unrolling, llvm/llvm-project Pull Request #83860). I’d appreciate it if people could take a look.
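As a rough illustration of the kind of redundancy involved (a hand-written, hypothetical example, not taken from the affected workloads or from the patch): when a loop that reads a[i] and a[i + 1] is unrolled by 2, the second copy reloads a value the first copy already loaded. The sketch below assumes `a` and `out` do not alias.

```cpp
#include <cassert>
#include <cstddef>

// Rolled form: each iteration loads a[i] and a[i + 1].
void rolled(const int *a, int *out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    out[i] = a[i] + a[i + 1];
}

// Unrolled by 2 with the duplicate load removed by hand. A naive unroll
// would load a[i + 1] twice per iteration; here it is loaded once and
// reused, which is the effect CSE has on the unrolled body.
void unrolledWithCSE(const int *a, int *out, std::size_t n) {
  std::size_t i = 0;
  for (; i + 1 < n; i += 2) {
    int x0 = a[i];
    int x1 = a[i + 1]; // shared between both unrolled copies
    int x2 = a[i + 2];
    out[i] = x0 + x1;
    out[i + 1] = x1 + x2;
  }
  for (; i < n; ++i) // remainder iteration when n is odd
    out[i] = a[i] + a[i + 1];
}
```

Both functions expect `a` to hold at least n + 1 elements and produce identical results; the second one simply performs three loads per two output elements instead of four.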
