Hi,
I am trying to unroll loops that don't have a constant loop counter. Any help with this would be highly appreciated.
What I want to do is to turn
loop (n)
{
  <loop body>
}
into
loop (n/4)
{
  <loop body>
  <loop body>
  <loop body>
  <loop body>
}
loop (n%4)
{
  <loop body>
}
I set both allowPartial and Runtime to 1 (llvm::createLoopUnrollPass(Threshold, count, 1, 1)).
I also set values for all the Partial* members of the UnrollingPreferences structure, but the loop still doesn't unroll.
The unrolling process hits this code in LoopUnrollRuntime.cpp:
  // Only unroll loops with a computable trip count and the trip count needs
  // to be an int value (allowing a pointer type is a TODO item)
  const SCEV *BECountSC = SE->getBackedgeTakenCount(L);
  if (isa<SCEVCouldNotCompute>(BECountSC) ||
      !BECountSC->getType()->isIntegerTy())
    return false;
BECountSC is 0xcccccccc, and the function returns false here.
Based on the comments, it looks like I still need a constant loop counter. Is there a way to unroll a loop with a non-constant loop counter, as in the example above?
Thanks,
Frances
From: "Frances Tzeng via llvm-dev" <llvm-dev@lists.llvm.org>
To: llvm-dev@lists.llvm.org
Sent: Monday, October 12, 2015 6:13:25 PM
Subject: [llvm-dev] question about llvm partial unrolling/runtime unrolling
Computable is not the same as constant. With runtime loop unrolling enabled, you can certainly unroll a loop with a runtime trip count. If you run with -debug-only=loop-unroll, what does it say regarding your loop?
-Hal
Hi Hal,
I ran
opt.exe -S -debug -loop-unroll -unroll-runtime=true -unroll-count=4 csShader.ll
and it printed:
Args: opt.exe -S -debug -loop-unroll -unroll-runtime=true -unroll-count=4 csShader.ll
Loop Unroll: F[build_cs_5_0] Loop %loop_entry
Loop Size = 82
partially unrolling with count: 1
From: "Frances Tzeng" <francestzeng@gmail.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: llvm-dev@lists.llvm.org
Sent: Friday, October 16, 2015 11:19:25 AM
Subject: Re: [llvm-dev] question about llvm partial unrolling/runtime unrolling
Why does it compute a count of 1? If you use a debugger (and/or insert some additional print statements) around here in lib/Transforms/Scalar/LoopUnrollPass.cpp, you should be able to figure it out:
} else if (Unrolling == Runtime) {
  ...
  // Reduce unroll count to be the largest power-of-two factor of
  // the original count which satisfies the threshold limit.
  while (Count != 0 && UnrolledSize > PartialThreshold) {
    Count >>= 1;
    UnrolledSize = (LoopSize-2) * Count + 2;
  }
  if (Count > UP.MaxCount)
    Count = UP.MaxCount;
  DEBUG(dbgs() << " partially unrolling with count: " << Count << "\n");
}
-Hal
Hi Hal,
Thanks for the response. I think I found the reason (it isn't the debug message above).
My loop counter runs from -n to n, and unrolling fails the isa<SCEVCouldNotCompute>(TripCountSC) check and returns early.
Thanks,
Frances
Hi Frances,
Have you tried running your IR through the standard -O3 optimization pipeline? We can certainly handle such loops in general:
$ cat /tmp/l.c
void foo(float *a, int n) {
  for (int i = -n; i <= n; ++i)
    a[i] += 1;
}
$ clang -O3 -S -emit-llvm -o - /tmp/l.c -fno-vectorize -fno-unroll-loops > /tmp/l.ll
$ opt -analyze -scalar-evolution < /tmp/l.ll | grep 'backedge-taken count'
Loop %for.body: backedge-taken count is ((-1 * (sext i32 (-1 * %n) to i64))<nsw> + ((sext i32 (-1 * %n) to i64) smax (sext i32 %n to i64)))
Loop %for.body: max backedge-taken count is 4294967295
And, as you can see, we've computed an expression for the trip count. Can you figure out how your case differs from this?
-Hal