How to unroll loop with non-constant boundary

Dear LLVM,

Consider two loops with one interation -
First with constant lower bound, second with usual non-constant lower bound:

int main(int argc, char ** argv)
{
int numOfIterations= 1;
int stride=1;
int lowerBound = 1000; - 1st | int lowerBound = argc; - 2nd
int upperBound = lowerBound + (numOfIterations - 1)*stride;

int i = lowerBound;

int s = 0;
while(i <= upperBound)
{
s += i;
i++;
}
return s;
}
After the application of -O3 optimization:

The first loop was unrolled:

define i32 @main(i32 %argc, i8** nocapture %argv) nounwind uwtable readnone {
entry:
ret i32 1000
}

But the second was not:

define i32 @main(i32 %argc, i8** nocapture %argv) nounwind uwtable readnone {
entry:
br label %“3”

“3”: ; preds = %entry, %“3”
%0 = phi i32 [ 0, %entry ], [ %2, %“3” ]
%1 = phi i32 [ %argc, %entry ], [ %3, %“3” ]
%2 = add i32 %0, %1
%3 = add i32 %1, 1
%4 = icmp sgt i32 %3, %argc
br i1 %4, label %“5”, label %“3”

“5”: ; preds = %“3”
ret i32 %2
}

The questions:
Is it possible to unroll loop with non-constant boundary using standard LLVM 3.1 facilities, and, if it is, how I can do that?

Best regards,
Nicolas

Hi Nicolas,

LLVM misses this optimization because ScalarEvolution's ComputeExitLimitFromICmp doesn't handle signed <= (SLE) and thus can't compute the number of times the loop is executed. I wonder if there's a reason for this, it seems like something simple to add.

Attached is a completely untested patch that adds the missing logic.

SCEV-compare-equal.patch (2.05 KB)

Off the top of my head, the attached is missing a check for whether
the increment is nsw, and the upper bound computation is wrong.

-Eli

No doubt, it was just a test if this would work at all.

Looking deeper into SCEV, this should really be canonicalized in SimplifyICmpOperands. Currently it checks whether the value cannot possibly be the maximum or minimum signed value and only does the transform if that's the case.

I wonder if that condition could be weakened a bit.

- Ben

Hi Benjamin,

LLVM misses this optimization because ScalarEvolution's ComputeExitLimitFromICmp doesn't handle signed<= (SLE) and thus can't compute the number of times the loop is executed. I wonder if there's a reason for this, it seems like something simple to add.

instsimplify could also be enhanced to clean it up in this particular case, but
it would be better to make scev smarter.

Ciao, Duncan.

2012/2/27 Benjamin Kramer <benny.kra@googlemail.com>

Dear LLVM,

Consider two loops with one interation -
First with constant lower bound, second with usual non-constant lower bound:

int main(int argc, char ** argv)
{
int numOfIterations= 1;
int stride=1;
int lowerBound = 1000; - 1st | int lowerBound = argc; - 2nd
int upperBound = lowerBound + (numOfIterations - 1)*stride;

int i = lowerBound;

int s = 0;
while(i <= upperBound)
{
s += i;
i++;
}
return s;
}
After the application of -O3 optimization:

The first loop was unrolled:

define i32 @main(i32 %argc, i8** nocapture %argv) nounwind uwtable readnone {
entry:
ret i32 1000
}

But the second was not:

define i32 @main(i32 %argc, i8** nocapture %argv) nounwind uwtable readnone {
entry:
br label %“3”

“3”: ; preds = %entry, %“3”
%0 = phi i32 [ 0, %entry ], [ %2, %“3” ]
%1 = phi i32 [ %argc, %entry ], [ %3, %“3” ]
%2 = add i32 %0, %1
%3 = add i32 %1, 1
%4 = icmp sgt i32 %3, %argc
br i1 %4, label %“5”, label %“3”

“5”: ; preds = %“3”
ret i32 %2
}

The questions:
Is it possible to unroll loop with non-constant boundary using standard LLVM 3.1 facilities, and, if it is, how I can do that?

Hi Nicolas,

LLVM misses this optimization because ScalarEvolution’s ComputeExitLimitFromICmp doesn’t handle signed <= (SLE) and thus can’t compute the number of times the loop is executed. I wonder if there’s a reason for this, it seems like something simple to add.

Attached is a completely untested patch that adds the missing logic.

Off the top of my head, the attached is missing a check for whether
the increment is nsw, and the upper bound computation is wrong.

No doubt, it was just a test if this would work at all.

Looking deeper into SCEV, this should really be canonicalized in SimplifyICmpOperands. Currently it checks whether the value cannot possibly be the maximum or minimum signed value and only does the transform if that’s the case.

I wonder if that condition could be weakened a bit.

  • Ben

Hi, Benjamin!
Thank you for your answer, with your patch it works better.
But I still have a problem - loops are unrolled only with stride equal one.
For example,
{
int i = lowerBound;
int s = 0;
while(i <= upperBound)
{
s += i;
i+=32;
}
} is not urolled!
At first, I think that reason is at condition
if (isSigned) {
APInt Max = APInt::getSignedMaxValue(BitWidth);
if ((Max - getSignedRange(getMinusSCEV(Step, One)).getSignedMax())
.slt(getSignedRange(RHS).getSignedMax()))
return getCouldNotCompute();
}
in “ScalarEvolution::ExitLimit ScalarEvolution::HowManyLessThans(const SCEV *LHS, const SCEV *RHS,
const Loop *L, bool isSigned)”,
which, In my understanding, is performed for all LHS and RHS with same bit size and not equal one Step, so function returns getCouldNotCompute(), please, correct me, if it is not.
But next, I change type of i to int64_t, that condition become false, function continued, but loop still was not unrolled.
So, the questions:

  1. How to overcome upper condition for LHS and RHS with same bit size and not equal one stride?
  2. What more prevents the calculation of number of interations and how to deal with it?

Best regards,
Nicolas

Hi Benjamin,

LLVM misses this optimization because ScalarEvolution's ComputeExitLimitFromICmp doesn't handle signed<= (SLE) and thus can't compute the number of times the loop is executed. I wonder if there's a reason for this, it seems like something simple to add.

instsimplify could also be enhanced to clean it up in this particular case, but
it would be better to make scev smarter.

I filed http://llvm.org/bugs/show_bug.cgi?id=12110 to track this.

- Ben

Hi guys,
I attached the modified patch that handles cases with low==end and stride!=1.
Please find it for review.

-Stepan

pr12110.patch (5.23 KB)

Stepan Dyatkovskiy wrote:

Hi guys,
I attached the modified patch that handles cases with low==end and stride!=1.

I don't see how this could be correct. Your patch treats 'X s<= Y' as 'X s< Y+1', which is incorrect when Y is INT_MAX. Wouldn't that turn an infinite loop into a zero-trip loop?

To be clear, this is the flaw in Benjamin's patch which you appear to have extended that makes it unsuitable for committing to the compiler. It would cause miscompiles.

That said, I haven't had the chance to look into fixing this, but wouldn't testing AddRec for nsw/nuw be an easy fix that would salvage the rest of the logic?

Nick

I treat "X s<= Y + Step" as "X s<= Y + 1". If X == Y (even if Y == INT_MAX) only two cases are possible:
1. Step == 0. Means infinite loop. Could not be unrolled.
2. Step != 0. Loop with only iteration. May be unrolled.

I forgot to fix comments "X s<= Y+1" with the "X s<= Y+Step". Sorry. Fixed.

-Stepan.

pr12110.patch (5.24 KB)