Hello,
I was wondering under which circumstances ScalarEvolution will propagate
the no wrap flags from binary operators. In particular I looked at
non-loop carried code, e.g., as in the following function:
int add(int x, int y) { return x + y; }
for which clang uses an "add nsw" instruction but ScalarEvolution does
not propagate this information. The -analyze output looks like this:
Classifying expressions for: @add
%add = add nsw i32 %x, %y
--> (%x + %y) U: full-set S: full-set
The piece of code that causes this (for me) unexpected result is located
in the ScalarEvolution::getNoWrapFlagsFromUB(...) function and looks
like this:
if (InnermostContainingLoop == nullptr ||
InnermostContainingLoop->getHeader() != BinOp->getParent())
return SCEV::FlagAnyWrap;
If the "InnermostContainingLoop" is a nullptr, thus if the binary
operator is not contained in a loop, all overflow annotations are simply
discarded.
I do not know if I am missing something or if simply nobody cared so
far. In any case, I would appreciate any documentation on the current
implementation as well as pointer to were it could be improved.
Cheers,
Johannes
P.S.
I have another problem with the way we currently discard the "nsw"
information for accesses ("gep inbounds" apparently only implies nuw).
I will probably start another thread for that one [after I read the
GEP handling in more detail].
[+CC Bjarke who wrote getNoWrapFlagsFromUB + related bits]
Hi Johannes,
One fundamental reason why we cannot reason about NoWrap flags in SCEV
for arithmetic outside of loops is to avoid issues like this:
if (cond) {
val[x +nsw y] = 42;
} else {
val[x + y] = 42;
}
Now *both* the "x +nsw y" and "x + y" will get mapped to the same SCEV
expression (this is by design [0]), and therefore we cannot mark *the*
SCEV expression for "x + y" with NSW, otherwise we may incorrectly
simplify the "val[x + y] = 42" branch.
(Btw, I'm sure you're aware of this, but "x +nsw y" by itself does not
guarantee no-wrap, it only does so if "x +nsw y" is used in a
side-effecting way.)
getNoWrapFlagsFromUB() avoids the problem above by adding NSW
(resp. NUW) to operations on add recurrences (i.e. an add-rec + 2,
say) that are guaranteed to execute at least once per loop iteration.
Thus we are guaranteed that the *arithmetic operation* (not //that//
specific instruction, but the arithmetic itself) does not overflow on
any iteration and thus can be safely marked as non-wrapping, since if
it didn't the program is guaranteed to be UB.
[0]: IOW, we don't "key" on the no-wrap tag, we just key on the
arithmetic bits of the operation and mutate SCEV expression in-place
when we discover NUW / NSW.
-- Sanjoy
Hey Sanjoy,
Thanks for the quick repsonse.
[+CC Bjarke who wrote getNoWrapFlagsFromUB + related bits]
Also thanks.
One fundamental reason why we cannot reason about NoWrap flags in SCEV
for arithmetic outside of loops is to avoid issues like this:
if (cond) {
val[x +nsw y] = 42;
} else {
val[x + y] = 42;
}
Now *both* the "x +nsw y" and "x + y" will get mapped to the same SCEV
expression (this is by design [0]), and therefore we cannot mark *the*
SCEV expression for "x + y" with NSW, otherwise we may incorrectly
simplify the "val[x + y] = 42" branch.
OK. I might have to look at the IR then since I am interested in the
particular instructions that make up the computation not the abstract
"arithmetic".
(Btw, I'm sure you're aware of this, but "x +nsw y" by itself does not
guarantee no-wrap, it only does so if "x +nsw y" is used in a
side-effecting way.)
I was not (in particular) but that makes sense. For my work the poison
value will (almost) always cause a side-effect to have undefined
behaviour so that should not be a problem.
getNoWrapFlagsFromUB() avoids the problem above by adding NSW
(resp. NUW) to operations on add recurrences (i.e. an add-rec + 2,
say) that are guaranteed to execute at least once per loop iteration.
Thus we are guaranteed that the *arithmetic operation* (not //that//
specific instruction, but the arithmetic itself) does not overflow on
any iteration and thus can be safely marked as non-wrapping, since if
it didn't the program is guaranteed to be UB.
Mh, I have to think about this a bit...
Is there any plan to use e.g., post-dominance information to
propagate wrapping flags? If x +nsw y post-dominates the entry block
[and is used in some side-effect instruction] the SCEV could be marked
as non-wrapping, couldn't it? How do you determine if the operation will
poison a side-effect instruction anyway?
[0]: IOW, we don't "key" on the no-wrap tag, we just key on the
arithmetic bits of the operation and mutate SCEV expression in-place
when we discover NUW / NSW.
-- Sanjoy
Cheers,
Johannes
Johannes Doerfert wrote:
> Is there any plan to use e.g., post-dominance information to
> propagate wrapping flags?
None that I'm aware of.
> If x +nsw y post-dominates the entry block
> [and is used in some side-effect instruction] the SCEV could be marked
> as non-wrapping, couldn't it?
Yes, but we have to be careful to account for exceptions, infinitely
looping functions, exit(0) etc. I.e. in
void foo() {
use(x + y);
exit(0);
use(x +nsw y);
}
the nsw in the later x+y cannot be transferred to the former x+y
(assume use is a side effecting use).
> How do you determine if the operation will
> poison a side-effect instruction anyway?
SCEV uses llvm::isKnownNotFullPoison for this.
-- Sanjoy
Hi Johannes,
Sanjoy has given you great information already.
Hi Bjarke,
Thanks for your response. I agree that Sanjoy's email already answered
my question. To be honest, I am not 100% sure about the loop code and
the consequences but that doesn't matter to much at the moment. What
matters is that I have made progress on explaining the mismatch between
the ScalarEvolution results and what I would have expected, or better,
liked. At the moment I think that the context in-sensitivity of
ScalarEvolution is what is preventing me from the results I would like
to see. Therefor, I have to look at the IR during Polly's context
sensitive analysis/modeling step. As our "context" includes information
about possibly "non-returning function calls", infinite loops, etc. I
think that should allow to get precise but also sound results.
Thank you both again,
Johannes
Replace context with flow in the last mail... sry.