Hi, I have run into a loop optimization problem (code linked at the end). I have a function ("gpu_kernel") that calls another function ("advance") containing a loop marked with "#pragma unroll". The loop exit condition is a function parameter, but it is a compile-time constant at both call sites (10 and 12, respectively), so I expected the compiler to unroll the loop completely. Instead we get 2 partially unrolled loops.
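For readers who don't want to open the link, here is a rough sketch of the pattern I mean. It is not the actual reproducer: the function names match the description above, but the loop body, the parameter names and the use of plain C++ instead of device code are assumptions made only for illustration.

```cpp
// Hypothetical sketch of the pattern described above; the real code is in the
// Godbolt link. Only the names "advance" and "gpu_kernel" come from the post.

// The loop bound `n` is a function parameter, so inside `advance` alone the
// trip count is not a compile-time constant.
static inline int advance(const int *data, int n) {
    int sum = 0;
#pragma unroll
    for (int i = 0; i < n; ++i) {
        sum += data[i];
    }
    return sum;
}

// Both call sites pass literal bounds (10 and 12), so the trip count only
// becomes a known constant once `advance` has been inlined here.
int gpu_kernel(const int *a, const int *b) {
    return advance(a, 10) + advance(b, 12);
}
```

The point of the sketch is only that the constant bound is visible at the call sites and not inside "advance" itself.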
I found that the FullLoopUnroll pass runs twice on this loop: once before the function is inlined and once after. Before inlining, FullLoopUnroll can't determine the trip count (LoopUnrollPass.cpp:1231), so it ends up partially unrolling the loop and also sets the "llvm.loop.unroll.disable" metadata, which disables any further attempts by the loop unroller.
However, this only happens when "#pragma unroll" is set. When the pragma is removed, the first FullLoopUnroll attempt bails out (computeUnrollCount in LoopUnrollPass.cpp returns false). After inlining, the loop exit condition, which was a function parameter, is seen as a constant, and the unroller fully unrolls the loop.
I don't see the full picture, so I don't understand why FullLoopUnroll runs before the inliner. It costs compile time and, at least in this case, produces a sub-optimal result. What changes could be made to fix this issue?
Reproducer: https://godbolt.org/z/16M1Ps1sK