SCEV and GEP NSW flag

Andy, et al.,

If I start with C code like this:

void foo(long k, int * restrict a, int * restrict b, int * restrict c) {
  if (k > 0) {
    for (int i = 0; i < 2047; i++) {
      a[i] = a[i + k] + b[i] * c[i];
    }
  }
}

Clang -O3 will produce code like this:

entry:
  %cmp = icmp sgt i64 %k, 0
  br i1 %cmp, label %for.body.preheader, label %if.end

for.body.preheader:
  br label %for.body

for.body:
  %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %for.body.preheader ]
  %add = add nsw i64 %indvars.iv, %k
  %arrayidx = getelementptr inbounds i32* %a, i64 %add
  %0 = load i32* %arrayidx, align 4, !tbaa !1
...
  %arrayidx7 = getelementptr inbounds i32* %a, i64 %indvars.iv
  store i32 %add5, i32* %arrayidx7, align 4, !tbaa !1
...

My goal here is to detect that, within the loop, (&a[i] - &a[i + k]) is negative, as the explicit loop guard guarantees. On the path toward that goal, I've run into the following:

The SCEV for &a[i] looks like this: {%a,+,4}<nsw><%for.body>

but the SCEV for &a[i + k] looks like this: {((4 * %k) + %a),+,4}<%for.body>

and the problem is that this expression, even though it comes from a inbounds GEP, with nsw on all contributing instructions, does not have the <nsw> flag. I think that the problem is the (4 * %k) 'Index' expression, which lacks the no-wrap flags, and when createNodeForGEP uses it to construct the overall expression for the address, the result also lacks the nsw flag.

The reason that the (4 * %k) 'Index' expression lacks the flags is explained in createSCEV:

    // Don't apply this instruction's NSW or NUW flags to the new
    // expression. The instruction may be guarded by control flow that the
    // no-wrap behavior depends on. Non-control-equivalent instructions can be
    // mapped to the same SCEV expression, and it would be incorrect to transfer
    // NSW/NUW semantics to those operations.

and I appreciate this concern, but I suspect it is supposed to be possible to recover the flags on the resulting AddRec, and how is that supposed to happen? One issue may be that the control flow mentioned in the comment could be in the loop, although in this one-BB loop with no calls, I think that can be ruled out.

Thanks again,
Hal

Andy, et al.,

If I start with C code like this:

void foo(long k, int * restrict a, int * restrict b, int * restrict c) {
if (k > 0) {
   for (int i = 0; i < 2047; i++) {
     a[i] = a[i + k] + b[i] * c[i];
   }
}
}

Clang -O3 will produce code like this:

entry:
%cmp = icmp sgt i64 %k, 0
br i1 %cmp, label %for.body.preheader, label %if.end

for.body.preheader:
br label %for.body

for.body:
%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %for.body.preheader ]
%add = add nsw i64 %indvars.iv, %k
%arrayidx = getelementptr inbounds i32* %a, i64 %add
%0 = load i32* %arrayidx, align 4, !tbaa !1
...
%arrayidx7 = getelementptr inbounds i32* %a, i64 %indvars.iv
store i32 %add5, i32* %arrayidx7, align 4, !tbaa !1
...

My goal here is to detect that, within the loop, (&a[i] - &a[i + k]) is negative, as the explicit loop guard guarantees. On the path toward that goal, I've run into the following:

The SCEV for &a[i] looks like this: {%a,+,4}<nsw><%for.body>

but the SCEV for &a[i + k] looks like this: {((4 * %k) + %a),+,4}<%for.body>

and the problem is that this expression, even though it comes from a inbounds GEP, with nsw on all contributing instructions, does not have the <nsw> flag. I think that the problem is the (4 * %k) 'Index' expression, which lacks the no-wrap flags, and when createNodeForGEP uses it to construct the overall expression for the address, the result also lacks the nsw flag.

The reason that the (4 * %k) 'Index' expression lacks the flags is explained in createSCEV:

   // Don't apply this instruction's NSW or NUW flags to the new
   // expression. The instruction may be guarded by control flow that the
   // no-wrap behavior depends on. Non-control-equivalent instructions can be
   // mapped to the same SCEV expression, and it would be incorrect to transfer
   // NSW/NUW semantics to those operations.

and I appreciate this concern, but I suspect it is supposed to be possible to recover the flags on the resulting AddRec, and how is that supposed to happen? One issue may be that the control flow mentioned in the comment could be in the loop, although in this one-BB loop with no calls, I think that can be ruled out.

I'm sorry, I don't understand why the scale and addition on an inbounds GEP can be considered NSW. Can we assume malloc won't span the high/low address boundary? Then were left with folks doing mmap... Well, based on my understanding, createNodeForGEP looks wrong--I really hope I'm the one that's wrong.

As an aside, you could still prove no-self-wrap for this recurrence. If the base of the GEP is an NW recurrence (note that NSW or NUW implies NW), the GEP itself is inbounds, and the GEP simplifies to a recurrences, then I think you can infer that the recurrence for the GEP is also NW. I don’t think that adds useful information to this query though :frowning:

Along those lines, I’m a little baffled as to why you want an NSW recurrence. You already know the difference is 4*k and k > 0. Isn’t that enough? Is the problem that 4*k may overflow?

Maybe this sort of alias analysis needs to reason about array indices, not addresses.

-Andy

> Andy, et al.,
>
> If I start with C code like this:
>
> void foo(long k, int * restrict a, int * restrict b, int * restrict
> c) {
> if (k > 0) {
> for (int i = 0; i < 2047; i++) {
> a[i] = a[i + k] + b[i] * c[i];
> }
> }
> }
>
> Clang -O3 will produce code like this:
>
> entry:
> %cmp = icmp sgt i64 %k, 0
> br i1 %cmp, label %for.body.preheader, label %if.end
>
> for.body.preheader:
> br label %for.body
>
> for.body:
> %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0,
> %for.body.preheader ]
> %add = add nsw i64 %indvars.iv, %k
> %arrayidx = getelementptr inbounds i32* %a, i64 %add
> %0 = load i32* %arrayidx, align 4, !tbaa !1
> ...
> %arrayidx7 = getelementptr inbounds i32* %a, i64 %indvars.iv
> store i32 %add5, i32* %arrayidx7, align 4, !tbaa !1
> ...
>
> My goal here is to detect that, within the loop, (&a[i] - &a[i +
> k]) is negative, as the explicit loop guard guarantees. On the
> path toward that goal, I've run into the following:
>
> The SCEV for &a[i] looks like this: {%a,+,4}<nsw><%for.body>
>
> but the SCEV for &a[i + k] looks like this: {((4 * %k) +
> %a),+,4}<%for.body>
>
> and the problem is that this expression, even though it comes from
> a inbounds GEP, with nsw on all contributing instructions, does
> not have the <nsw> flag. I think that the problem is the (4 * %k)
> 'Index' expression, which lacks the no-wrap flags, and when
> createNodeForGEP uses it to construct the overall expression for
> the address, the result also lacks the nsw flag.
>
> The reason that the (4 * %k) 'Index' expression lacks the flags is
> explained in createSCEV:
>
> // Don't apply this instruction's NSW or NUW flags to the new
> // expression. The instruction may be guarded by control flow
> that the
> // no-wrap behavior depends on. Non-control-equivalent
> instructions can be
> // mapped to the same SCEV expression, and it would be incorrect
> to transfer
> // NSW/NUW semantics to those operations.
>
> and I appreciate this concern, but I suspect it is supposed to be
> possible to recover the flags on the resulting AddRec, and how is
> that supposed to happen? One issue may be that the control flow
> mentioned in the comment could be in the loop, although in this
> one-BB loop with no calls, I think that can be ruled out.

I'm sorry, I don't understand why the scale and addition on an
inbounds GEP can be considered NSW. Can we assume malloc won't span
the high/low address boundary?

I think that we're okay on this front. The language reference defines inbounds GEPs in terms of "infinitely precise signed arithmetic", so I think that wrapping and being inbounds are incompatible.

Then were left with folks doing
mmap... Well, based on my understanding, createNodeForGEP looks
wrong--I really hope I'm the one that's wrong.

I think that the code is proabably okay: it is not forcing the result to have the nsw flag (if it did, I probably would not have written my e-mail), but applying it only to the 'address computation' itself. On the other hand, maybe that violates SCEV's general philosophy of being control-flow independent (maybe the logic that prohibits us from applying an add's nsw flag to an SCEVAddExpr should prohibit this as well).

As an aside, you could still prove no-self-wrap for this recurrence.
If the base of the GEP is an NW recurrence (note that NSW or NUW
implies NW), the GEP itself is inbounds, and the GEP simplifies to a
recurrences, then I think you can infer that the recurrence for the
GEP is also NW.

Okay, so you mean that I could force set the NW flag on the resulting GEP in that case?

I don’t think that adds useful information to this
query though :frowning:

Along those lines, I’m a little baffled as to why you want an NSW
recurrence. You already know the difference is 4*k and k > 0. Isn’t
that enough? Is the problem that 4*k may overflow?

Ah, I forgot to explain, one problem is that I can't divide {((4 * %k) + %a),+,4}<%for.body> by 4 to get ride of the type size. I was trying to do this in order to make it easier for isLoopEntryGuardedByCond to match the condition, but even dividing by 4 does not work if the recurrence is not <nsw> (as far as I can tell).

Maybe this sort of alias analysis needs to reason about array
indices, not addresses.

Being able to divide by the type sizes would make this easier :wink:

Thanks again,
Hal

Asking for the SCEV of the GEP index instead of the base would seem to make life easier for SCEV and give you everything you need for dependence analysis.

-Andy

What if we stop trying to force nsw flags into SCEV and stop relying on s/zext elimination. I think we ultimately need to handle these cases if it means inserting loop guards.

Now our canonical SCEVs look like:

st a1 = gep(a,i) = A + 4 * ext{0,+,1}
ld a2 = gep(a,i+k) = A + 4 * ext{k,+,1}

Now say we have a routine for dependence distance from st to ld.

dep-distance(a1, a2)

This would gather any assumptions necessary to compute the distance. Recursing through the pair of SCEVs would perform the following steps:

Handle AddExpr:
We have identical LHS
Assume NSW: A + 4 * ext{0,+,1}
Assume NSW: A + 4 * ext{k,+,1}

Handle MulExpr:
We have identical Scale: 4
Assume NSW: 4 * ext{0,+,1}
Assume NSW: 4 * ext{k,+,1}

Ignore s/zext, we’re assuming infinite precision.

Handle AddRec:
We have identical Step
Assume NSW: {0,+,1}
Assume NSW: {k,+,1}

We now have a loop-invariant distance: k
Multiplied by a scale: 4

Walk through the assumptions, attempting to prove them on every path reaching the loop-back edge.

The existence of inbounds GEPs in the loop bodies easily proves the first four.

Now we’re left with this constraint based on the remaining assumptions from the recurrences:

min(0,k) < max(0,k) + 2047

(Relying on 2s complement over the recurrence’s bitwidth.)

k > 0 dominates the loop so we have:

0 < k + 2047

This constraint seems obvious from GEP inbounds-ness, but I’m not sure how to prove it in general when the bitwidth of the recurrence is narrower than the GEP. If we can’t prove it, then we need to add the guard to the loop.

For cases with more general loop bounds and nonunit strides, we also need to improve BECount computation.

-Andy