Canonicalize induction variables

I just subscribed this group. This is my first time to post a question (not sure if this is a right place for discussion) after I have a brief look at LLVM OPT (dev trunk). I would expect loop simplification and induction variable canonicalization pass (IndVarSimplify pass) should be able to convert the following loops into a simple canonical form, i.e., there is a canonical induction variable which starts at zero and steps by one, getCanonicalInductionVariable() returns the first PHI node in the loop header block.

int test1 (int x, int k, int s) {
int sum = 0;
for (int i = 0; i < k; i+=s) {
sum += x[i];
}
return sum;
}

int test2(int x, int k, int s) {
int sum = 0;
for (int i = k; i > 0; i–) {
sum += x[i];
}
return sum;
}

Anyone can help explain why the current LLVM cannot canonicalize induction variables for the above loops (by design or a limitation to be fixed in the future)? Thanks.

Does the fact that loop vectorizer supports arbitrary constant step size suggest that this is by design?
https://reviews.llvm.org/rL227557

Apparently an older version of LLVM, canonicalized induction variables as described above.
http://llvm.org/releases/2.6/docs/Passes.html#indvars

Not sure whether these are the actual reasons, but to explain the
difficulties with those loops.

I just subscribed this group. This is my first time to post a question (not
sure if this is a right place for discussion) after I have a brief look at
LLVM OPT (dev trunk). I would expect loop simplification and induction
variable canonicalization pass (IndVarSimplify pass) should be able to
convert the following loops into a simple canonical form, i.e., there is a
canonical induction variable which starts at zero and steps by one,
getCanonicalInductionVariable() returns the first PHI node in the loop
header block.

int test1 (int x, int k, int s) {
  int sum = 0;
  for (int i = 0; i < k; i+=s) {
    sum += x[i];
  }
  return sum;
}

s could be zero making this an endless loop (C has some rules saying
that it can assume that certain loops do terminate, but I think it
does not apply to LLVM IR)

int test2(int x, int k, int s) {
  int sum = 0;
  for (int i = k; i > 0; i--) {
    sum += x[i];
  }
  return sum;
}

with k = INT_MIN where is no upper limit in that range. Neither

for (int j = 0; j < -INT_MIN; j++)

nor

for (int j = 0; j <= INT_MAX; j++)

do work here.

Anyone can help explain why the current LLVM cannot canonicalize induction
variables for the above loops (by design or a limitation to be fixed in the
future)? Thanks.

The first could be tackled with loop versioning of the s==0 case. The
second might be converted to

for (int j = -1; j < -(k+1); j++)

although this isn't the canonical form.

Michael

But even for a very simple loop:

int test1 (int *x, int *y, int *z, int k) {
int sum = 0;
for (int i = 10; i < k; i++) {
z[i] = x[i] / y[i];
}
return sum;
}

The initial value of induction variable is not zero after compiling with -O3 -mcpu=power8 x.cpp -S -c -emit-llvm -fno-unroll-loops (see bottom of the email for IR)

Also I can write somewhat more complicated loop where step size is a constant > 1, and the conditions are so that IV will not overflow:

int test2 (int *x, int *y, int *z, int k) {
int sum = 0;
for (int i = 10; i < k && i < k-5; i+=5) {
z[i] = x[i] / y[i];
}
return sum;
}

again this is not canonicalized in the above sense (see IR at the end of the email). Maybe this condition is too complicated?

IR for test1

for.body: ; preds = %for.body.preheader, %for.body
%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 10, %for.body.preheader ]
%arrayidx = getelementptr inbounds i32, i32* %x, i64 %indvars.iv
%0 = load i32, i32* %arrayidx, align 4, !tbaa !1
%arrayidx2 = getelementptr inbounds i32, i32* %y, i64 %indvars.iv
%1 = load i32, i32* %arrayidx2, align 4, !tbaa !1
%div = sdiv i32 %0, %1
%arrayidx4 = getelementptr inbounds i32, i32* %z, i64 %indvars.iv
store i32 %div, i32* %arrayidx4, align 4, !tbaa !1
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond = icmp eq i64 %indvars.iv.next, %wide.trip.count
br i1 %exitcond, label %for.cond.cleanup, label %for.body

IR for test2

for.body: ; preds = %for.body.preheader, %for.body
%indvars.iv = phi i64 [ 10, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
%arrayidx = getelementptr inbounds i32, i32* %x, i64 %indvars.iv
%2 = load i32, i32* %arrayidx, align 4, !tbaa !1
%arrayidx3 = getelementptr inbounds i32, i32* %y, i64 %indvars.iv
%3 = load i32, i32* %arrayidx3, align 4, !tbaa !1
%div = sdiv i32 %2, %3
%arrayidx5 = getelementptr inbounds i32, i32* %z, i64 %indvars.iv
store i32 %div, i32* %arrayidx5, align 4, !tbaa !1
%indvars.iv.next = add nuw i64 %indvars.iv, 5
%cmp = icmp slt i64 %indvars.iv.next, %1
%cmp1 = icmp slt i64 %indvars.iv.next, %0
%or.cond = and i1 %cmp, %cmp1
br i1 %or.cond, label %for.body, label %for.cond.cleanup.loopexit

Hi,

I have a "back to square one question -- why do you care about
having a canonical induction variable?

  - If it is for easier analysis, a somewhat better approach is to lean
    on SCEV to "understand" induction variables as much as possible.

  - If it is for better codegen, then this should be solved in
    LoopStrengthReduce.

That is not to say you're wrong in demanding this of LLVM, but I'd
like to understand your motivation a little better.

-- Sanjoy

Hi again,

I looked into the code of IndVarSimplify but found no code that
changes the start value of the induction variable. It only modifies
the exit test.

However, the file's comment gives an example:

// 1. The exit condition for the loop is canonicalized to compare the
// induction value against the exit value. This turns loops like:
// 'for (i = 7; i*i < 1000; ++i)' into 'for (i = 0; i != 25; ++i)'

I ran the example and it turns the loop into
for (i = 7; i != 32; ++i)

The part that makes induction variables start at zero may have been removed.

Michael

[+CC llvm-dev]

Hi,
Regarding the question about "why do you care about having a canonical
induction variable", we expect it can simplify loop transformations without
redundant analysis. Some loop (nest) transformations can be skipped earlier
if they don't have a canonical induction variable.

Can you be more specific here (about the transforms and analyses
that you expect to be able to simplify)?

I'm asking because while it won't be super difficult to make indvars
turn as many
induction variables into canonical induction variables, I'd like to understand
the problem you're trying to solve first.

-- Sanjoy

Also, just as a historical note. indvars was at some point able to do
this, but this "feature" has been removed as it introduced performance
regressions on hand optimized code that have been very difficult to
avoid in the strenght reduce phase.

Best,
Tobias