Cast to SCEVAddRecExpr

Hi,

I’m trying to cast one of the SCEV node to “SCEVAddRecExpr”.

Every time cast return NULL, and I’m unable to do this.

SCEV Node:
((4 * (sext i32 {2,+,2}<%for.body4> to i64)) + %var)

Casting:
const SCEVAddRecExpr *AR = dyn_cast(SCEVNode);

‘var’ is of type float pointer (float*).

Without ‘sext’ it works, but I’m wondering why it not working in above case.

I’m not sure, is such casting allowed ?

Regards,
Ashutosh

Nema, Ashutosh wrote:

Hi,
I’m trying to cast one of the SCEV node to “SCEVAddRecExpr”.
Every time cast return NULL, and I’m unable to do this.
SCEV Node:
((4 * (sext i32 {2,+,2}<%for.body4> to i64))<nsw> + %var)<nsw>
Casting:
const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(SCEVNode);
‘var’ is of type float pointer (float*).
Without ‘sext’ it works, but I’m wondering why it not working in above case.
I’m not sure, is such casting allowed ?

It looks like your node is a SCEVAddExpr whose LHS is "(4 * (sext i32 {2,+,2}<%for.body4> to i64))<nsw>" and RHS is "%var", instead of a SCEVAddRecExpr. Perhaps if the sext weren't there SCEV might simplify the whole thing to a single SCEVAddRecExpr.

Can you cast your node to a SCEVAddExpr, then dyn_cast<SCEVAddRecExpr>(AddExpr->getOperand(0))?

Note that SCEV does not handle floats as anything but completely opaque values.

Nick

Hi Nick,

Thanks for looking into it.

I have tried that as well but it didn't worked.

"AddExpr->getOperand(0))" node is:
" (4 * (sext i32 {2,+,2}<%for.body4> to i64))<nsw>"

When I cast this to "SCEVAddRecExpr" it returns NULL.

Regards,
Ashutosh

Nema, Ashutosh wrote:

Hi Nick,

Thanks for looking into it.

I have tried that as well but it didn't worked.

"AddExpr->getOperand(0))" node is:
" (4 * (sext i32 {2,+,2}<%for.body4> to i64))<nsw>"

When I cast this to "SCEVAddRecExpr" it returns NULL.

Oh. Yeah, that's because it's a multiply with a 4 on the left. On the right is an sext, and if you grab its operand, then *that* is a SCEVAddRecExpr.

Yes, I can get "SCEVAddRecExpr" from operands of "(sext i32 {2,+,2}<%for.body4> to i64)".

So whenever SCEV cast to "SCEVAddRecExpr" fails, we have drill down for such patterns ?

Is that the right way ?

Regards,
Ashutosh

Nema, Ashutosh wrote:

Yes, I can get "SCEVAddRecExpr" from operands of "(sext i32 {2,+,2}<%for.body4> to i64)".

So whenever SCEV cast to "SCEVAddRecExpr" fails, we have drill down for such patterns ?

I don't know, what are you planning to do with it? Even if you do drill down and find the addrec inside, is that useful to you? SCEV will already try to hoist out the addrec to the top level. If it isn't there, then it probably can't be rewritten so that it could be. If it can be, that's a missed optimization opportunity and should be taught to SCEV.

What's your goal? Look at every addrec? If that's it, I would say that yes you need to drill down, but that you should only query SCEV about PHINode instructions.

Hi Nick,

Consider below case:

for (j=1; j < itr; j++) {
     - - - -
     for (i=1; i < itr; i++) {
     {
       temp= var[1 << i];
        - - - - -
     }
}

In the above example, we are unable to get "SCEVAddRecExpr" for "var[1 << i]"

Its "SCEVAddRecExpr" is computable in *Outer Loop*

I expected SCEV will hoist AddRec to top level, but it's not doing that.

Instead if I change "var[1 << i]" to "var[i * 2]" then I'm getting "SCEVAddRecExpr".

What is your opinion on this, to me this looks like a missed opportunity in SCEV.

Regards,
Ashutosh

Sorry typo in test case, Please ignore previous mail.

Consider below case:

for (j=1; j < itr; j++) {
     - - - -
     for (i=1; i < itr; i++) {
     {
       temp= var[i << 1];
        - - - - -
     }
}

In the above example, we are unable to get "SCEVAddRecExpr" for "var[i << 1]"

Its "SCEVAddRecExpr" is computable in *Outer Loop*

I expected SCEV will hoist AddRec to top level, but it's not doing that.

Instead if I change "var[i << 1]" to "var[i * 2]" then I'm getting "SCEVAddRecExpr".

What is your opinion on this, to me this looks like a missed opportunity in SCEV.

Regards,
Ashutosh

I don't think var[1<<i] is not an add recurrence -- it is a mul
recurrence {var,*,2}, and as of right now there is no direct way to
express such a thing in SCEV.

Alternately, we could have a SCEVLeftShiftExpr and express 1<<i as a
SCEVLeftShiftExpr of an SCEVAddRecExpr.

-- Sanjoy

Sorry typo in test case, Please ignore previous mail.

Consider below case:

for (j=1; j < itr; j++) {
     - - - -
     for (i=1; i < itr; i++) {
     {
       temp= var[i << 1];
        - - - - -
     }
}

In the above example, we are unable to get "SCEVAddRecExpr" for "var[i << 1]"

To be pedantic, "var[i<<1]" is not an add recurrence, but "&var[i <<
1]" is an add recurrence. I'll assume that's that you meant.

If I run the following example through opt -analyze -scalar-evolution:

define void @x(i1* %c, i64* %ptr) {
entry:
  br label %loop

loop:
  %i = phi i64 [ 0, %entry ], [ %i.inc, %loop ]
  %i.inc = add i64 %i, 1
  %d = shl i64 %i.inc, 1
  %gep = getelementptr i64, i64* %ptr, i64 %d
  %cc = load volatile i1, i1* %c
  br i1 %cc, label %exit, label %loop

exit:
  ret void
}

I get the SCEV --> {(16 + %ptr),+,16}<%loop> for %gep. So SCEV /can/
interpret i<<1 as i*2 under some specific circumstances.

Going back to your original question:

Instead if I change "var[i << 1]" to "var[i * 2]" then I'm getting "SCEVAddRecExpr".

I think that is because in C, multiplication is nsw but left shift is
not and so "i << 1" can legitimately sign-overflow but i * 2 cannot
sign-overflow. I'd venture a guess that this lets LLVM transform a
sign-extend of an add-rec to an add-rec sign-extends.

Caveat: I have not looked at the C specification to determine that
left shifts are allowed to sign-overflow. I concluded that left
shifts are not nsw by looking at the llvm IR emitted by clang (the
left shift generated from "i << 1" is not marked nsw).

-- Sanjoy

Thanks Sanjoy.

To be pedantic, "var[i<<1]" is not an add recurrence, but "&var[i <<
1]" is an add recurrence. I'll assume that's that you meant.

Yes, I meant the same.

I think that is because in C, multiplication is nsw but left shift is
not and so "i << 1" can legitimately sign-overflow but i * 2 cannot
sign-overflow. I'd venture a guess that this lets LLVM transform a
sign-extend of an add-rec to an add-rec sign-extends.

I agree here, we want LLVM to transform sign-extend of add-rec to
add-rec of sign extend in possible cases. Currently I don’t see LLVM is doing this.
i.e.:
(sext i32 addrec{2,+,2}<%for.body4> to i64)
Will convert it to:
addrec{(sext i32 '2' to i64), + , (sext i32 '2' to i64)} <%for.body4>

If this looks OK, I'll make changes and come up with a patch.

With sign-extend (SCEVSignExtendExpr) similarly we have to take care and
handle other casts as well (i.e. SCEVTruncateExpr, SCEVZeroExtendExpr).

Regards,
Ashutosh

That transform is not valid because these two expressions are not
equivalent if the add recurrence can sign-overflow. For instance, (i8
for simplicity) if you consider {sext i8 2 to i16,+, sext i8 2 to i16}
and (sext i8 {2,+,2} to i16) then in the 63rd iteration the first
add-recs value is (i16 2) * 63 + (i16 2) = 128 while the second
add-rec's value in the same iteration is -128. However, if LLVM can
prove the second expression cannot sign-overflow (i.e. the loop will
run for < 63 iterations) then transforming (sext i8 {2,+,2} to i16) to
{sext i8 2 to i16,+, sext i8 2 to i16} is valid. Add recurrences that
have been proven to not overflow are marked as <nsw>, and are printed
like {S,+,X}<nsw>.

When compiling from C, left shifts operations are not marked nsw
(because in C they are allowed to sign overflow) but the equivalent
multiplies are marked nsw (because sign-overflow of a multiply is
undefined behavior). This lets LLVM prove {2,+,2} is <nsw> when it
comes from a multiply and lets it commute a sign extend into the add
recurrence. This explains the difference between var[i << 1] and
var[i * 2].

-- Sanjoy

That transform is not valid because these two expressions are not
equivalent if the add recurrence can sign-overflow. For instance, (i8
for simplicity) if you consider {sext i8 2 to i16,+, sext i8 2 to i16}
and (sext i8 {2,+,2} to i16) then in the 63rd iteration the first
add-recs value is (i16 2) * 63 + (i16 2) = 128 while the second
add-rec's value in the same iteration is -128.

Thanks for this detailed explanation Sanjoy.

However, if LLVM can
prove the second expression cannot sign-overflow (i.e. the loop will
run for < 63 iterations) then transforming (sext i8 {2,+,2} to i16) to
{sext i8 2 to i16,+, sext i8 2 to i16} is valid.

Yes, but it's difficult to prove in all types of loop.

Add recurrences that
have been proven to not overflow are marked as <nsw>, and are printed
like {S,+,X}<nsw>.

Where compiler is sure about *will not overflow* (i.e. var[i * 2] ) will be marked as 'nsw'.

Cases where we have power of two will be transformed to shift reduction by Instruction Combiner.
i.e. var[i * 2] will be transformed to var[i << 1] but 'nsw' property was not carried forward.

I have seen your mail thread on this.

If somehow we can preserve 'nsw' property then the
transformation I suggested looks valid ?

i.e.:
(sext i32 Addrec{2,+,2}<nuw><%for.body4> to i64)
Transformed To:
Addrec{(sext i32 '2' to i64), + , (sext i32 '2' to i64)} <nsw><%for.body4>

This will open up optimization opportunities.

Regards,
Ashutosh

Cases where we have power of two will be transformed to shift reduction by Instruction Combiner.
i.e. var[i * 2] will be transformed to var[i << 1] but 'nsw' property was not carried forward.

That looks like an instcombine bug -- for anything except i2 (I assume
you're not indexing into an array using an i2) "mul nsw X, 2" should
be equivalent to "lshr nsw X, 1", irrespective on what is decided on
that thread.

(sext i32 Addrec{2,+,2}<nuw><%for.body4> to i64)

Do you mean (sext i32 Addrec{2,+,2}<nsw><%for.body4> to i64)? If so, then yes.

Transformed To:
Addrec{(sext i32 '2' to i64), + , (sext i32 '2' to i64)} <nsw><%for.body4>

-- Sanjoy