scev questions

Hi,

I'm trying to get the following loop to vectorize (simple reduction):

unsigned int sum2(unsigned int *a, int len){
  unsigned int s = 0;
  for (int i = 0; i < len; i += 4)
    s += *a++;
  return s;
}

The loop fails to vectorize because SCEV could not compute the loop exit
count. It appears SCEV cannot handle the non-unit increment of the loop
counter. Is this a known limitation of SCEV or is there something that can
be improved (and if so where should I start looking?)

Thanks,
Paul

Here's the IR for the loop and the SCEV dump:

for.body: ; preds =
%for.body.lr.ph, %for.body
  %i.07 = phi i32 [ 0, %for.body.lr.ph ], [ %add1, %for.body ]
  %s.06 = phi i32 [ 0, %for.body.lr.ph ], [ %add, %for.body ]
  %a.addr.05 = phi i32* [ %a, %for.body.lr.ph ], [ %incdec.ptr, %for.body ]
  %incdec.ptr = getelementptr inbounds i32* %a.addr.05, i64 1
  %0 = load i32* %a.addr.05, align 4, !tbaa !0
  %add = add i32 %0, %s.06
  %add1 = add nsw i32 %i.07, 4
  %cmp = icmp slt i32 %add1, %len
  br i1 %cmp, label %for.body, label %for.cond.for.end_crit_edge

Classifying expressions for: @_Z4sum2Pji
%i.07 = phi i32 [ 0, %for.body.lr.ph ], [ %add1, %for.body ]
--> {0,+,4}<nuw><nsw><%for.body> Exits: <<Unknown>>
%s.06 = phi i32 [ 0, %for.body.lr.ph ], [ %add, %for.body ]
--> %s.06 Exits: <<Unknown>>
%a.addr.05 = phi i32* [ %a, %for.body.lr.ph ], [ %incdec.ptr, %for.body ]
--> {%a,+,4}<nw><%for.body> Exits: <<Unknown>>
%incdec.ptr = getelementptr inbounds i32* %a.addr.05, i64 1
--> {(4 + %a),+,4}<nw><%for.body> Exits: <<Unknown>>
%0 = load i32* %a.addr.05, align 4, !tbaa !0
--> %0 Exits: <<Unknown>>
%add = add i32 %0, %s.06
--> (%0 + %s.06) Exits: <<Unknown>>
%add1 = add nsw i32 %i.07, 4
--> {4,+,4}<nuw><nsw><%for.body> Exits: <<Unknown>>
%add.lcssa = phi i32 [ %add, %for.body ]
--> %add.lcssa
%s.0.lcssa = phi i32 [ %add.lcssa, %for.cond.for.end_crit_edge ], [ 0,
%entry ]
--> %s.0.lcssa
Determining loop execution counts for: @_Z4sum2Pji
Loop %for.body: Unpredictable backedge-taken count.
Loop %for.body: Unpredictable max backedge-taken count.

Hi,

I'm trying to get the following loop to vectorize (simple reduction):

unsigned int sum2(unsigned int *a, int len){
  unsigned int s = 0;
  for (int i = 0; i < len; i += 4)
    s += *a++;
  return s;
}

The loop fails to vectorize because SCEV could not compute the loop exit
count. It appears SCEV cannot handle the non-unit increment of the loop
counter. Is this a known limitation of SCEV or is there something that can
be improved (and if so where should I start looking?)

It's a known limitation, see ScalarEvolution.cpp:5568.

The fundamental problem is that len in your example could be (unsigned) -1,
-2 or -3, in which case your loop is infinite.

Nick

Unless I’m missing something, if len is -1 (or otherwise less than 0) the loop has 0 trip count. Did you mean INT_MAX-1, INT_MAX-2 etc? In this case, I believe, the behavior is undefined, because adds are marked with “nsw”.

Eugene

Doh, 's' was unsigned, 'i' was signed. My mistake.

Regardless, this part of SCEV isn't looking for NSW bits yet and
consequently is optimizing as-if the function were written with 'i' and
'len' as unsigned.

But I'm not sure that we should use nsw to fix this, in spite of the fact
that this is exactly what nsw was designed for. We know the answer even for
the unsigned case, and we should just use that. Add a new SCEV node which
lets us return that it's infinite in this case, and trip count X in the
other; along the lines of "SCEVPotentiallyInfinite(m > int_max-3, sdiv(m,
4))".

I've proposed this previously:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120917/151250.html

Nick

I think SCEV should use nsw when possible (that’s exactly what the flag is for, as you’ve said). For unsigned cases, “potentially infinite” sounds reasonable, but transforms will have to be updated to handle it. E.g. for vectorization infinite loop is probably OK to vectorize. For closed form computation a check will have to be inserted.

Another thought is that we can prove this particular loop is finite even if i was unsigned. It has *a++ in the body. If a is infinitely incremented the dereference will, at some point, cause undefined behavior.

Eugene