ScalarEvolution::createNodeForPHI

Hello to everybody,

I'm working on some improvements on trip count computation with ScalarEvolution
analysis.
Considering the following test

;----------------------------------------------------------------------------;
define void @foo(i32 %a, i32 %b, i32 %s) #0 {
entry:
  %cmp = icmp sgt i32 %s, 0
  %cmp15 = icmp sgt i32 %a, %b
  %or.cond = and i1 %cmp, %cmp15
  br i1 %or.cond, label %for.body, label %if.end

for.body:
  %i.06 = phi i32 [ %sub, %for.body ], [ %a, %entry ]
  tail call void @f(i32 %i.06) #2
  %sub = sub nsw i32 %i.06, %s
  %cmp1 = icmp sgt i32 %sub, %b
  br i1 %cmp1, label %for.body, label %if.end

if.end:
  ret void
}
;----------------------------------------------------------------------------;

I've noticed that the SCEV for %i.06 and %sub are the following:

%i.06 = phi i32 [ %sub, %for.body ], [ %a, %entry ]
--> {%a,+,(-1 * %s)}<%for.body>

%sub = sub nsw i32 %i.06, %s
--> {((-1 * %s) + %a),+,(-1 * %s)}<%for.body>

but the NSW flag that is present in the SUB instruction is not propagated in the
AddRec.

Looking in the source code I analyzed the construction of the SCEV for a PHINode
and during the analysis of the loop-invariant part of the increment the flags
NSW/NUW are set according to the Operator BEValueV
(ScalarEvoluton.cpp:3099-3113), but only Add and GEP operators are checked.

//-------------------------------------------------------------------------//
if (const AddOperator *OBO = dyn_cast<AddOperator>(BEValueV)) {
  if (OBO->hasNoUnsignedWrap())
    Flags = setFlags(Flags, SCEV::FlagNUW);
  if (OBO->hasNoSignedWrap())
    Flags = setFlags(Flags, SCEV::FlagNSW);
} else if (const GEPOperator *GEP =
             dyn_cast<GEPOperator>(BEValueV)) {
  // If the increment is an inbounds GEP, then we know the address
  // space cannot be wrapped around. We cannot make any guarantee
  // about signed or unsigned overflow because pointers are
  // unsigned but we may have a negative index from the base
  // pointer.
  if (GEP->isInBounds())
    Flags = setFlags(Flags, SCEV::FlagNW);
}
//-------------------------------------------------------------------------//

Is there any reason to not check also Sub operator in a similar way to the Add
operator?

//-------------------------------------------------------------------------//
if (const SubOperator *OBO = dyn_cast<SubOperator>(BEValueV)) {
  if (OBO->hasNoUnsignedWrap())
    Flags = setFlags(Flags, SCEV::FlagNUW);
  if (OBO->hasNoSignedWrap())
    Flags = setFlags(Flags, SCEV::FlagNSW);
}
//-------------------------------------------------------------------------//

Thanks in advance.

Best regards,

-Michele Scandale

I’m not sure how to make sense of an NUW flag coming from a sub.

NSW is just a misnomer for signed overflow. SCEV canonicalizes sub a,b to add a, (-b). Unfortunately, signed overflow is not the same thing for sub and add...

sub nsw %a, %b != add nsw %a, (-1 * %b)

sub nsw i32, -1, INT_MIN is true.

add nsw i32, -1, (-1 * INT_MIN) is false.

NSW flags just aren’t an effective way to tell SCEV about overflow. Ideally we could reason more generally about the loop’s undefined behavior. For example, I’d like to query the range of values that a variable can hold before the loop hits undefined behavior. I don’t know how to implement something like this yet. Ideas are welcome.

Reading this code again, even the addition case looks too aggressive to me.
See:
llvm.org/PR17452 - SCEV createNodeForPHI is too optimistic about NSW.

-Andy

> Hello to everybody,
>
> I'm working on some improvements on trip count computation with
> ScalarEvolution
> analysis.
> Considering the following test
>
> ;----------------------------------------------------------------------------;
> define void @foo(i32 %a, i32 %b, i32 %s) #0 {
> entry:
> %cmp = icmp sgt i32 %s, 0
> %cmp15 = icmp sgt i32 %a, %b
> %or.cond = and i1 %cmp, %cmp15
> br i1 %or.cond, label %for.body, label %if.end
>
> for.body:
> %i.06 = phi i32 [ %sub, %for.body ], [ %a, %entry ]
> tail call void @f(i32 %i.06) #2
> %sub = sub nsw i32 %i.06, %s
> %cmp1 = icmp sgt i32 %sub, %b
> br i1 %cmp1, label %for.body, label %if.end
>
> if.end:
> ret void
> }
> ;----------------------------------------------------------------------------;
>
> I've noticed that the SCEV for %i.06 and %sub are the following:
>
> %i.06 = phi i32 [ %sub, %for.body ], [ %a, %entry ]
> --> {%a,+,(-1 * %s)}<%for.body>
>
> %sub = sub nsw i32 %i.06, %s
> --> {((-1 * %s) + %a),+,(-1 * %s)}<%for.body>
>
> but the NSW flag that is present in the SUB instruction is not
> propagated in the
> AddRec.
>
> Looking in the source code I analyzed the construction of the SCEV
> for a PHINode
> and during the analysis of the loop-invariant part of the increment
> the flags
> NSW/NUW are set according to the Operator BEValueV
> (ScalarEvoluton.cpp:3099-3113), but only Add and GEP operators are
> checked.
>
> //-------------------------------------------------------------------------//
> if (const AddOperator *OBO = dyn_cast<AddOperator>(BEValueV)) {
> if (OBO->hasNoUnsignedWrap())
> Flags = setFlags(Flags, SCEV::FlagNUW);
> if (OBO->hasNoSignedWrap())
> Flags = setFlags(Flags, SCEV::FlagNSW);
> } else if (const GEPOperator *GEP =
> dyn_cast<GEPOperator>(BEValueV)) {
> // If the increment is an inbounds GEP, then we know the address
> // space cannot be wrapped around. We cannot make any guarantee
> // about signed or unsigned overflow because pointers are
> // unsigned but we may have a negative index from the base
> // pointer.
> if (GEP->isInBounds())
> Flags = setFlags(Flags, SCEV::FlagNW);
> }
> //-------------------------------------------------------------------------//
>
> Is there any reason to not check also Sub operator in a similar way
> to the Add
> operator?
>
> //-------------------------------------------------------------------------//
> if (const SubOperator *OBO = dyn_cast<SubOperator>(BEValueV)) {
> if (OBO->hasNoUnsignedWrap())
> Flags = setFlags(Flags, SCEV::FlagNUW);
> if (OBO->hasNoSignedWrap())
> Flags = setFlags(Flags, SCEV::FlagNSW);
> }
> //-------------------------------------------------------------------------//

I’m not sure how to make sense of an NUW flag coming from a sub.

I thought that sub nuw a, b just meant that we were allowed to assume that a >= b. No?

-Hal

That’s a reasonable interpretation. The problem is that we canonicalize to add a, -b. This is a problem in Reassociate and SCEV.

-Andy

If the behavior is different between 'add nsw' and 'sub nsw' then the SCEV
canonicalization of 'sub' leads to incorrect results.

In your example both result should be the same but the 'add' produces a poison
value. Is this what we really want?

The case of an expression with mixed operator ('add' with and without nsw) flag
would require an explicit representation of the two operator making more complex
reassociation, simplification and reordering of the expression operands.

Your proposed workaround of checking that the BEValue is an operation between
the PHINode Value and a loop invariant would be fine but it's still a workaround.

-Michele

I’m not sure how to make sense of an NUW flag coming from a sub.

NSW is just a misnomer for signed overflow. SCEV canonicalizes sub a,b to add a, (-b). Unfortunately, signed overflow is not the same thing for sub and add...

sub nsw %a, %b != add nsw %a, (-1 * %b)

sub nsw i32, -1, INT_MIN is true.

add nsw i32, -1, (-1 * INT_MIN) is false.

NSW flags just aren’t an effective way to tell SCEV about overflow. Ideally we could reason more generally about the loop’s undefined behavior. For example, I’d like to query the range of values that a variable can hold before the loop hits undefined behavior. I don’t know how to implement something like this yet. Ideas are welcome.

Reading this code again, even the addition case looks too aggressive to me.
See:
llvm.org/PR17452 - SCEV createNodeForPHI is too optimistic about NSW.

If the behavior is different between 'add nsw' and 'sub nsw' then the SCEV
canonicalization of 'sub' leads to incorrect results.

Creating a SCEV expression for a sub instruction drops the flags. It’s safe, but the flags are useless.

In your example both result should be the same but the 'add' produces a poison
value. Is this what we really want?

By true/false in the example above I meant defined/undefined. Since we can't introduce undefined behavior when none previously existed, we drop the nsw flags during canonicalization. If we could prove that the second operand is > INT_MIN, then we could preserve the flags. But your example has variable stride.

The case of an expression with mixed operator ('add' with and without nsw) flag
would require an explicit representation of the two operator making more complex
reassociation, simplification and reordering of the expression operands.

Yes, in general an attempt to preserve NSW flags would defeat the simple utility of SCEV. That’s why we drop the flags all over the place.

Your proposed workaround of checking that the BEValue is an operation between
the PHINode Value and a loop invariant would be fine but it's still a workaround.

This fixes a correctness problem I noticed in the same bit of code (PR17452). My suggestion was to be conservative in preserving NSW. It’s not about missing optimizations.

I don’t have a concrete suggestion for harder problem of missing optimization. I can say that we need something better than embedding flags within SCEV expressions. PR16358 is a recent example of the general problem (there have been other unsolved cases that I can’t find now). Arnold made the point here that SCEV needs to infer the legal range of one expression from another.

My very abstract suggestion for future improvements are:
- At the SCEV level, logic to deduce that some recurrences cannot wrap given a set of loop facts derived from the IR and represented independent from the SCEV expressions themselves.
- At the IR level, rely on some sort of trap-on-overflow intrinsic independent from the arithmetic operations that can be carried through canonicalization, reassociation etc. This could be done today with existing overflow intrinsics, but modeling the trap would require a branch.

I suggest filing a bug for your case and continuing discussion there. It may be related to PR12377: Loop trip count not calculated.

-Andy

I’m not sure how to make sense of an NUW flag coming from a sub.

NSW is just a misnomer for signed overflow. SCEV canonicalizes sub a,b to add a, (-b). Unfortunately, signed overflow is not the same thing for sub and add...

sub nsw %a, %b != add nsw %a, (-1 * %b)

sub nsw i32, -1, INT_MIN is true.

add nsw i32, -1, (-1 * INT_MIN) is false.

NSW flags just aren’t an effective way to tell SCEV about overflow. Ideally we could reason more generally about the loop’s undefined behavior. For example, I’d like to query the range of values that a variable can hold before the loop hits undefined behavior. I don’t know how to implement something like this yet. Ideas are welcome.

Reading this code again, even the addition case looks too aggressive to me.
See:
llvm.org/PR17452 - SCEV createNodeForPHI is too optimistic about NSW.

If the behavior is different between 'add nsw' and 'sub nsw' then the SCEV
canonicalization of 'sub' leads to incorrect results.

In your example both result should be the same but the 'add' produces a poison
value. Is this what we really want?

Reflecting on this I think that the example is wrong: the point is that -1 *
INT_MIN is not representable over 32bit. The overflow rule is the same for 'add'
and 'sub', however it's not true that you can replace a 'sub %a, %b' with 'add
%a, (-1 * %b)'. If the SCEVExpander can avoid these case it should be fine, but
it may be useful to have also a direct support for 'sub'.

The case of an expression with mixed operator ('add' with and without nsw) flag
would require an explicit representation of the two operator making more complex
reassociation, simplification and reordering of the expression operands.

Your proposed workaround of checking that the BEValue is an operation between
the PHINode Value and a loop invariant would be fine but it's still a workaround.

If it's legal to reassociate a mix of 'add' and 'add nsw' then the poison value
propagation would make the NSW inference on the whole expression correct. On the
contrary to ensure semantic correctness we should keep distinct operator and
allow operator promotion in those cases where we are able to prove that the
wraparound cannot happen. I think that this is currently not possible because
SCEV are not uniquely mapped to a given IR Value...

What do you think about?

Thanks.

-Michele

I’m not sure how to make sense of an NUW flag coming from a sub.

NSW is just a misnomer for signed overflow. SCEV canonicalizes sub a,b to add a, (-b). Unfortunately, signed overflow is not the same thing for sub and add…

sub nsw %a, %b != add nsw %a, (-1 * %b)

sub nsw i32, -1, INT_MIN is true.

add nsw i32, -1, (-1 * INT_MIN) is false.

NSW flags just aren’t an effective way to tell SCEV about overflow. Ideally we could reason more generally about the loop’s undefined behavior. For example, I’d like to query the range of values that a variable can hold before the loop hits undefined behavior. I don’t know how to implement something like this yet. Ideas are welcome.

Reading this code again, even the addition case looks too aggressive to me.
See:
llvm.org/PR17452 - SCEV createNodeForPHI is too optimistic about NSW.

If the behavior is different between ‘add nsw’ and ‘sub nsw’ then the SCEV
canonicalization of ‘sub’ leads to incorrect results.

In your example both result should be the same but the ‘add’ produces a poison
value. Is this what we really want?

Reflecting on this I think that the example is wrong: the point is that -1 *
INT_MIN is not representable over 32bit. The overflow rule is the same for ‘add’
and ‘sub’, however it’s not true that you can replace a ‘sub %a, %b’ with ‘add
%a, (-1 * %b)’. If the SCEVExpander can avoid these case it should be fine, but
it may be useful to have also a direct support for 'sub’.

The difference between sub/add overflow is that the sub internally negates its operand using 2’s-complement in the same precision as the result. Overflow status reflects the final result relative to the operands, independent of whether the negation internally overflowed.

SCEV is also well-defined using 2’s-complement. Everything is representable. So it’s fine to canonicalize sub to negate->add if we drop NSW flags.

I know it’s crazy that this defeats the optimizer. If I knew how to preserve correctness without worrying about this I wouldn’t have mentioned it.

The case of an expression with mixed operator (‘add’ with and without nsw) flag
would require an explicit representation of the two operator making more complex
reassociation, simplification and reordering of the expression operands.

Your proposed workaround of checking that the BEValue is an operation between
the PHINode Value and a loop invariant would be fine but it’s still a workaround.

If it’s legal to reassociate a mix of ‘add’ and ‘add nsw’ then the poison value
propagation would make the NSW inference on the whole expression correct. On the
contrary to ensure semantic correctness we should keep distinct operator and
allow operator promotion in those cases where we are able to prove that the
wraparound cannot happen. I think that this is currently not possible because
SCEV are not uniquely mapped to a given IR Value…

What do you think about?

Now we’re switching back to the separate potential bug of mixing nsw and non-nsw. I don’t think this is important case to optimize, so I suggested a conservative approach.

In theory you could have:
add i, 2**31-2
add nsw i, 1

And iterate for a while before you see any poison values. I don’t think this happens in practice which is why I didn’t checkin an immediate fix.

-Andy

I’m not sure how to make sense of an NUW flag coming from a sub.

NSW is just a misnomer for signed overflow. SCEV canonicalizes sub a,b to add a, (-b). Unfortunately, signed overflow is not the same thing for sub and add...

sub nsw %a, %b != add nsw %a, (-1 * %b)

sub nsw i32, -1, INT_MIN is true.

add nsw i32, -1, (-1 * INT_MIN) is false.

NSW flags just aren’t an effective way to tell SCEV about overflow. Ideally we could reason more generally about the loop’s undefined behavior. For example, I’d like to query the range of values that a variable can hold before the loop hits undefined behavior. I don’t know how to implement something like this yet. Ideas are welcome.

Reading this code again, even the addition case looks too aggressive to me.
See:
llvm.org/PR17452 - SCEV createNodeForPHI is too optimistic about NSW.

If the behavior is different between 'add nsw' and 'sub nsw' then the SCEV
canonicalization of 'sub' leads to incorrect results.

Creating a SCEV expression for a sub instruction drops the flags. It’s safe, but the flags are useless.

In your example both result should be the same but the 'add' produces a poison
value. Is this what we really want?

By true/false in the example above I meant defined/undefined. Since we can't introduce undefined behavior when none previously existed, we drop the nsw flags during canonicalization. If we could prove that the second operand is > INT_MIN, then we could preserve the flags. But your example has variable stride.

The case of an expression with mixed operator ('add' with and without nsw) flag
would require an explicit representation of the two operator making more complex
reassociation, simplification and reordering of the expression operands.

Yes, in general an attempt to preserve NSW flags would defeat the simple utility of SCEV. That’s why we drop the flags all over the place.

Your proposed workaround of checking that the BEValue is an operation between
the PHINode Value and a loop invariant would be fine but it's still a workaround.

This fixes a correctness problem I noticed in the same bit of code (PR17452). My suggestion was to be conservative in preserving NSW. It’s not about missing optimizations.

I don’t have a concrete suggestion for harder problem of missing optimization. I can say that we need something better than embedding flags within SCEV expressions. PR16358 is a recent example of the general problem (there have been other unsolved cases that I can’t find now). Arnold made the point here that SCEV needs to infer the legal range of one expression from another.

My very abstract suggestion for future improvements are:
- At the SCEV level, logic to deduce that some recurrences cannot wrap given a set of loop facts derived from the IR and represented independent from the SCEV expressions themselves.

I agree with this. I noticed that value ranges are computed independently from
the "context" of the instruction related to the SCEV: for loop analysis we may
exploit information from facts dominating the loop execution. Maybe it would be
easier to map to a given IR Value a set of SCEV depending on the DomTree node we
are considering (given a variable %x, an if-then-else on condition "%x slt 0",
in one branch the range for %x would be [1, INT_MAX] in the other [INT_MIN, 0]).

- At the IR level, rely on some sort of trap-on-overflow intrinsic independent from the arithmetic operations that can be carried through canonicalization, reassociation etc. This could be done today with existing overflow intrinsics, but modeling the trap would require a branch.

Why this? Do we need to handle also trapping arithmetic?

I suggest filing a bug for your case and continuing discussion there. It may be related to PR12377: Loop trip count not calculated.

I've done a little rewrite of functions to compute trip count for LT and GT
comparisons. However that case is not handled because of the missing NSW flag in
the SCEV. So probably it would be better to proposed a patch with my
modification before.

-Michele

The trap can be viewed as an invariant. For C, the traps can disappear under the undefined behavior rule and they could be readonly to the optimizer. I imagine other LLVM clients would want to materialize the traps. That doesn’t matter to me now. It’s how the optimizer derives information from the IR and preserves it that matters.

SCEV could be improved without doing this. It does potentially provide a better way for SCEV to discover invariants, solves the problem of earlier passes, like Reassociate and GVN, dropping flags, and allows code to be rewritten based on SCEV without losing the invariants.

-Andy

My very abstract suggestion for future improvements are:
- At the SCEV level, logic to deduce that some recurrences cannot
wrap given a set of loop facts derived from the IR and represented
independent from the SCEV expressions themselves.

I agree with this. I noticed that value ranges are computed
independently from
the "context" of the instruction related to the SCEV: for loop
analysis we may
exploit information from facts dominating the loop execution. Maybe
it would be
easier to map to a given IR Value a set of SCEV depending on the
DomTree node we
are considering (given a variable %x, an if-then-else on condition
"%x slt 0",
in one branch the range for %x would be [1, INT_MAX] in the other
[INT_MIN, 0]).

Random question: is this relevant: https://code.google.com/p/wrapped-intervals/ ?

-Hal

My very abstract suggestion for future improvements are:
- At the SCEV level, logic to deduce that some recurrences cannot
wrap given a set of loop facts derived from the IR and represented
independent from the SCEV expressions themselves.

I agree with this. I noticed that value ranges are computed
independently from
the "context" of the instruction related to the SCEV: for loop
analysis we may
exploit information from facts dominating the loop execution. Maybe
it would be
easier to map to a given IR Value a set of SCEV depending on the
DomTree node we
are considering (given a variable %x, an if-then-else on condition
"%x slt 0",
in one branch the range for %x would be [1, INT_MAX] in the other
[INT_MIN, 0]).

Random question: is this relevant: https://code.google.com/p/wrapped-intervals/ ?

-Hal

This is interesting, I was thinking to something more lightweight focused only
on loop control variables, in order to exploits constraints that with the
current infrastructure are not used. BTW, I don't like too much the idea to move
the code into SSI for the analysis.

-Michele