ScalarEvolution "add nsw" question

Hello,

I was wondering under which circumstances ScalarEvolution will propagate
the no wrap flags from binary operators. In particular I looked at
non-loop carried code, e.g., as in the following function:

  int add(int x, int y) { return x + y; }

for which clang uses an "add nsw" instruction but ScalarEvolution does
not propagate this information. The -analyze output looks like this:

  Classifying expressions for: @add
    %add = add nsw i32 %x, %y
    --> (%x + %y) U: full-set S: full-set

The piece of code that causes this (for me) unexpected result is located
in the ScalarEvolution::getNoWrapFlagsFromUB(...) function and looks
like this:

  if (InnermostContainingLoop == nullptr ||
      InnermostContainingLoop->getHeader() != BinOp->getParent())
    return SCEV::FlagAnyWrap;

If the "InnermostContainingLoop" is a nullptr, thus if the binary
operator is not contained in a loop, all overflow annotations are simply
discarded.

I do not know if I am missing something or if simply nobody cared so
far. In any case, I would appreciate any documentation on the current
implementation as well as pointer to were it could be improved.

Cheers,
  Johannes

P.S.
  I have another problem with the way we currently discard the "nsw"
  information for accesses ("gep inbounds" apparently only implies nuw).
  I will probably start another thread for that one [after I read the
  GEP handling in more detail].

[+CC Bjarke who wrote getNoWrapFlagsFromUB + related bits]

Hi Johannes,

One fundamental reason why we cannot reason about NoWrap flags in SCEV
for arithmetic outside of loops is to avoid issues like this:

if (cond) {
   val[x +nsw y] = 42;
} else {
   val[x + y] = 42;
}

Now *both* the "x +nsw y" and "x + y" will get mapped to the same SCEV
expression (this is by design [0]), and therefore we cannot mark *the*
SCEV expression for "x + y" with NSW, otherwise we may incorrectly
simplify the "val[x + y] = 42" branch.

(Btw, I'm sure you're aware of this, but "x +nsw y" by itself does not
guarantee no-wrap, it only does so if "x +nsw y" is used in a
side-effecting way.)

getNoWrapFlagsFromUB() avoids the problem above by adding NSW
(resp. NUW) to operations on add recurrences (i.e. an add-rec + 2,
say) that are guaranteed to execute at least once per loop iteration.
Thus we are guaranteed that the *arithmetic operation* (not //that//
specific instruction, but the arithmetic itself) does not overflow on
any iteration and thus can be safely marked as non-wrapping, since if
it didn't the program is guaranteed to be UB.

[0]: IOW, we don't "key" on the no-wrap tag, we just key on the
arithmetic bits of the operation and mutate SCEV expression in-place
when we discover NUW / NSW.

-- Sanjoy

Hey Sanjoy,

Thanks for the quick repsonse.

[+CC Bjarke who wrote getNoWrapFlagsFromUB + related bits]

Also thanks.

One fundamental reason why we cannot reason about NoWrap flags in SCEV
for arithmetic outside of loops is to avoid issues like this:

if (cond) {
  val[x +nsw y] = 42;
} else {
  val[x + y] = 42;
}

Now *both* the "x +nsw y" and "x + y" will get mapped to the same SCEV
expression (this is by design [0]), and therefore we cannot mark *the*
SCEV expression for "x + y" with NSW, otherwise we may incorrectly
simplify the "val[x + y] = 42" branch.

OK. I might have to look at the IR then since I am interested in the
particular instructions that make up the computation not the abstract
"arithmetic".

(Btw, I'm sure you're aware of this, but "x +nsw y" by itself does not
guarantee no-wrap, it only does so if "x +nsw y" is used in a
side-effecting way.)

I was not (in particular) but that makes sense. For my work the poison
value will (almost) always cause a side-effect to have undefined
behaviour so that should not be a problem.

getNoWrapFlagsFromUB() avoids the problem above by adding NSW
(resp. NUW) to operations on add recurrences (i.e. an add-rec + 2,
say) that are guaranteed to execute at least once per loop iteration.
Thus we are guaranteed that the *arithmetic operation* (not //that//
specific instruction, but the arithmetic itself) does not overflow on
any iteration and thus can be safely marked as non-wrapping, since if
it didn't the program is guaranteed to be UB.

Mh, I have to think about this a bit...

Is there any plan to use e.g., post-dominance information to
propagate wrapping flags? If x +nsw y post-dominates the entry block
[and is used in some side-effect instruction] the SCEV could be marked
as non-wrapping, couldn't it? How do you determine if the operation will
poison a side-effect instruction anyway?

[0]: IOW, we don't "key" on the no-wrap tag, we just key on the
arithmetic bits of the operation and mutate SCEV expression in-place
when we discover NUW / NSW.

-- Sanjoy

Cheers,
  Johannes

Johannes Doerfert wrote:
> Is there any plan to use e.g., post-dominance information to
> propagate wrapping flags?

None that I'm aware of.

> If x +nsw y post-dominates the entry block
> [and is used in some side-effect instruction] the SCEV could be marked
> as non-wrapping, couldn't it?

Yes, but we have to be careful to account for exceptions, infinitely
looping functions, exit(0) etc. I.e. in

void foo() {
   use(x + y);
   exit(0);
   use(x +nsw y);
}

the nsw in the later x+y cannot be transferred to the former x+y
(assume use is a side effecting use).

> How do you determine if the operation will
> poison a side-effect instruction anyway?

SCEV uses llvm::isKnownNotFullPoison for this.

-- Sanjoy

Hi Johannes,

Sanjoy has given you great information already.

Hi Bjarke,

Thanks for your response. I agree that Sanjoy's email already answered
my question. To be honest, I am not 100% sure about the loop code and
the consequences but that doesn't matter to much at the moment. What
matters is that I have made progress on explaining the mismatch between
the ScalarEvolution results and what I would have expected, or better,
liked. At the moment I think that the context in-sensitivity of
ScalarEvolution is what is preventing me from the results I would like
to see. Therefor, I have to look at the IR during Polly's context
sensitive analysis/modeling step. As our "context" includes information
about possibly "non-returning function calls", infinite loops, etc. I
think that should allow to get precise but also sound results.

Thank you both again,
  Johannes

Replace context with flow in the last mail... sry.