nsw is still logically inconsistent

The recent discussion of nsw led me to go back and review the
current definition of nsw in greater depth, and it turns out that
even with all the theoretical effort, it's still logically
inconsistent.

First, a warning: The scenario below is artificial. This is just
a demonstration. Also, ignore the fact that instcombine would zap
everything. Depending on that would be an implicit pass dependency,
which is against the rules.

Ok, consider this LLVM IR code fragment:

  br i1 %overflow_check, label %no_overflow, label %end

no_overflow:
  %t0 = add nsw i32 %a, %b
  %t1 = sext i32 %t0 to i64
  %t2 = ashr i64 %t1, 31
  %t3 = add i64 %t2, 1
  %t5 = icmp ult %t3, 2
  %t6 = udiv i1 1, %t5

Assume label %no_overflow has no other predecessors. And assume
adding %a and %b can sometimes produce overflow, but only when
%overflow_check is false.

This code has no undefined behavior. It's a bit subtle, but in
particular, note that it's not possible for the udiv to divide by
zero, which would be undefined behavior. The sext produces a
value where the high 33 bits are all identical, the ashr produces
a value which is either all ones or all zeros, the add produces
either 0 or 1, and the icmp ilt is always 1.

We first perform a speculation transformation, hoisting all of the
code above the %overflow_check branch:

  %t0 = add nsw i32 %a, %b
  %t1 = sext i32 %t0 to i64
  %t2 = ashr i64 %t1, 31
  %t3 = add i64 %t2, 1
  %t5 = icmp ult %t3, 2
  %t6 = udiv i1 1, %t5
  br i1 %overflow_check, label %no_overflow, label %end

no_overflow:

Was this valid?

If nsw overflow is immediate undefined behavior, this transformation
would break the program, because the overflow is no longer guarded
by %overflow_check. But a premise of this exercise is that we want
to be able to speculate add nsw instructions. For now, let's assume
that there's a way to define nsw which permits this, with some kind
of deferred undefined behavior semantics.

It's obviously valid to hoist the sext, ashr, add, and icmp that
follow, because those instructions never have side effects.

The udiv is less obvious, but an analysis of the icmp, add, ashr,
and sext above it can show that the denominator of the udiv will
never be zero, so the udiv will never have side effects. Note that
this analysis doesn't actually need to look at the original add nsw
instruction. Climbing up to the sext is enough.

Next, we perform a promotion transformation, converting the add nsw
from i32 to i64:

  %s0 = sext i32 %a to i64
  %s1 = sext i32 %b to i64
  %t0 = add nsw i64 %s0, %s1
  %t2 = ashr i64 %t0, 31
  %t3 = add i64 %t2, 1
  %t5 = icmp ult %t3, 2
  %t6 = udiv i1 1, %t5
  br i1 %overflow_check, label %no_overflow, label %end

no_overflow:

Was this valid?

Any time the new i64 add would produce a different value than the
original sext would have, it would be a case where the 32-bit add
had an overflow. The nsw says that the program would have undefined
behavior in that case if the result is used, so this should be ok.

Unfortunately, the final code is actually broken. If adding %a
and %b as i32 values would overflow, adding them as i64 values
would produce a value for which bit 31 is not equal to bit 32, so
the subsequent ashr would produce a value which is not -1 or 0, the
subsequent add would produce a value which is not 0 or 1, and the
icmp ult would return 0, and the udiv would have undefined
behavior. And it's unconditional.

Consequently, the current definition of nsw is still inconsistent.

Dan

Dan Gohman <gohman@apple.com> writes:

Next, we perform a promotion transformation, converting the add nsw
from i32 to i64:

  %s0 = sext i32 %a to i64
  %s1 = sext i32 %b to i64
  %t0 = add nsw i64 %s0, %s1
  %t2 = ashr i64 %t0, 31
  %t3 = add i64 %t2, 1
  %t5 = icmp ult %t3, 2
  %t6 = udiv i1 1, %t5
  br i1 %overflow_check, label %no_overflow, label %end

no_overflow:

Was this valid?

Any time the new i64 add would produce a different value than the
original sext would have, it would be a case where the 32-bit add
had an overflow. The nsw says that the program would have undefined
behavior in that case if the result is used, so this should be ok.

Unfortunately, the final code is actually broken. If adding %a
and %b as i32 values would overflow, adding them as i64 values
would produce a value for which bit 31 is not equal to bit 32, so
the subsequent ashr would produce a value which is not -1 or 0, the
subsequent add would produce a value which is not 0 or 1, and the
icmp ult would return 0, and the udiv would have undefined
behavior. And it's unconditional.

I'm not following. If the promotion to i64 produces a different value,
then the nsw smeantic was violated, leading to undefined behavior, as
you note. That that point all bets are off. Divide by zero certainly
is a perfectly valid expression of undefined behavior. If we had a
delayed check we would have to put it somewhere before the udiv. We
would probably need some kind of fixup path _a_la_ IA64's check
instruction.

The whole point of this is to be able to hoist nsw operations. But any
form of static speculation will require fixup code in the cases where
the speculation is wrong. There really is a cost to hoisting code with
potential side-efffects across branches. I don't see any way to get
around that.

                               -Dave

We first perform a speculation transformation, hoisting all of the
code above the %overflow_check branch:

  %t0 = add nsw i32 %a, %b
  %t1 = sext i32 %t0 to i64
  %t2 = ashr i64 %t1, 31
  %t3 = add i64 %t2, 1
  %t5 = icmp ult %t3, 2
  %t6 = udiv i1 1, %t5
  br i1 %overflow_check, label %no_overflow, label %end

no_overflow:

Was this valid?

If nsw overflow is immediate undefined behavior, this transformation
would break the program, because the overflow is no longer guarded
by %overflow_check. But a premise of this exercise is that we want
to be able to speculate add nsw instructions. For now, let's assume
that there's a way to define nsw which permits this, with some kind
of deferred undefined behavior semantics.

Could we say that moving the udiv was invalid? It is a instruction
witch can cause undefined behavior and doing the move makes it exposed
to poison values in conditions it was not before.

Dan

Cheers,
Rafael

Note that this matches existing practice: IIRC, for a testcase like
the given testcase, LICM will hoist everything except the div
instruction.

-Eli

The way I have always thought of this is that speculating an operation with side effects (udiv) cannot be safely done based on a speculative value.

So, whenever you speculate a control dependent instruction (add<nsw>), regardless of whether it has side effects, you need to transfer the control dependence to its consumers.

In an IR with explicit control dependence, that can be done literally by transferring the control edge downward in the SSA graph until you reach an operation with side-effects. In an IR without explicit control dependence, like all C compilers I've seen, you have no way to enforce this, so end up relying on each optimization to be aware and not too aggressive.

-Andy

(who used to safely speculate loads above type checks)

We first perform a speculation transformation, hoisting all of the
code above the %overflow_check branch:

  %t0 = add nsw i32 %a, %b
  %t1 = sext i32 %t0 to i64
  %t2 = ashr i64 %t1, 31
  %t3 = add i64 %t2, 1
  %t5 = icmp ult %t3, 2
  %t6 = udiv i1 1, %t5
  br i1 %overflow_check, label %no_overflow, label %end

no_overflow:

Was this valid?

If nsw overflow is immediate undefined behavior, this transformation
would break the program, because the overflow is no longer guarded
by %overflow_check. But a premise of this exercise is that we want
to be able to speculate add nsw instructions. For now, let's assume
that there's a way to define nsw which permits this, with some kind
of deferred undefined behavior semantics.

Could we say that moving the udiv was invalid? It is a instruction
witch can cause undefined behavior and doing the move makes it exposed
to poison values in conditions it was not before.

We could, but it would preclude desirable optimizations.

Note that this matches existing practice: IIRC, for a testcase like
the given testcase, LICM will hoist everything except the div
instruction.

LICM on ToT hoists the div. Even speculatively.

The other thing is that div is just one example. The problem could
also come up with an array load with an index could be safe to
speculate if the index is known to be bounded. isDereferenceablePointer
currently doesn't know how to do this, but it'll probably get
smarter some day.

Dan

The original code was well-behaved for all inputs.
The final code has undefined behavior for some inputs.

Fixup paths don't really seem to fit here. We're never going
to want fixup paths in the final output, for example. And
there is no way to fix up undefined behavior.

Dan

LICM on ToT hoists the div. Even speculatively.

The other thing is that div is just one example. The problem could
also come up with an array load with an index could be safe to
speculate if the index is known to be bounded. isDereferenceablePointer
currently doesn't know how to do this, but it'll probably get
smarter some day.

So, looking at just this example, it looks like semantics we assign to
poison values have to do one of

*) Avoid the udiv being moved, since a valid removal of nsw would create
a program that is now exposed to undefined behavior.

*) Say that use of poison values only produces other poison values, not
undefined behavior.

The second option would give us a lot of liberty to move instruction
around, but is a major change to what the IL looks like these days. In
particular:

*) removing a nsw would become invalid, as it would change a poison
value to a 0, which would cause udiv to have undefined behavior again.

*) Codegen would have to be very careful not to produce undefined
behavior when handling instructions that potentially see poison values.

If we really want the ability to expose instructions with side effects
to poison, maybe the best thing to do is to have poison resistant
variants. For example, we would say that moving the udiv is invalid, but
it could be replaced with a "udiv poison i1 1, %t5". The poison variant
has the same definition as udiv, put produces poison when t5 is 0 (or
poison) and therefore can be moved. The same goes for speculating a load.

Dan

Cheers,
Rafael

Dan Gohman <gohman@apple.com> writes:

I'm not following. If the promotion to i64 produces a different value,
then the nsw smeantic was violated, leading to undefined behavior, as
you note. That that point all bets are off. Divide by zero certainly
is a perfectly valid expression of undefined behavior. If we had a
delayed check we would have to put it somewhere before the udiv. We
would probably need some kind of fixup path _a_la_ IA64's check
instruction.

The original code was well-behaved for all inputs.
The final code has undefined behavior for some inputs.

Fixup paths don't really seem to fit here. We're never going
to want fixup paths in the final output, for example. And
there is no way to fix up undefined behavior.

My point is that hoisting the div above its control dependence is
illegal unless you have some way of mitigating the poison path. Or we
define new "poison" IR operations as Rafael suggested. I'm sort of
intrigued by that idea. It's similar to a check operation but we could
define whatever semantics we want (produce a safe value if poison,
etc.).

Again, it's undefined behavior so we get to define it.

                             -Dave

LICM on ToT hoists the div. Even speculatively.

The other thing is that div is just one example. The problem could
also come up with an array load with an index could be safe to
speculate if the index is known to be bounded. isDereferenceablePointer
currently doesn't know how to do this, but it'll probably get
smarter some day.

So, looking at just this example, it looks like semantics we assign to
poison values have to do one of

*) Avoid the udiv being moved, since a valid removal of nsw would create
a program that is now exposed to undefined behavior.

This could suppress legitimate optimizations. udiv is an expensive
operation on most targets, and being able to hoist it out of an
if statement inside a loop, as LICM does now, could be really
desirable in some cases.

*) Say that use of poison values only produces other poison values, not
undefined behavior.

The second option would give us a lot of liberty to move instruction
around, but is a major change to what the IL looks like these days. In
particular:

*) removing a nsw would become invalid, as it would change a poison
value to a 0, which would cause udiv to have undefined behavior again.

*) Codegen would have to be very careful not to produce undefined
behavior when handling instructions that potentially see poison values.

If we really want the ability to expose instructions with side effects
to poison, maybe the best thing to do is to have poison resistant
variants. For example, we would say that moving the udiv is invalid, but
it could be replaced with a "udiv poison i1 1, %t5". The poison variant
has the same definition as udiv, put produces poison when t5 is 0 (or
poison) and therefore can be moved. The same goes for speculating a load.

udiv is just one example. In the case of an indexed load, this
would require codegen to do bounds checking, which I think is beyond
its charter here.

Dan

Could you spell out what this would look like? My sense is that
check operations are a hardware-oriented approach, and that
thinking about this as a hardware architecture problem means
ignoring the range of transformations that compilers want to do.

Dan