The nsw story

int a = INT_MAX, b = 1;
long c = (long)(a + b);

What is the value of c, on an LP64 target?

If a and b are promoted to 64-bit, c is 0x0000000080000000.

In a world where signed add overflow returns undef, c could be any of
0x0000000000000000
...
0xffffffffffffffff
however it can't ever be 0x0000000080000000, because there's no 32-bit value
the undef could take which sign-extends into that 64-bit value.

But since the add nsw returns undef, then isn't c "sext i32 undef to
i64", and thus also undef?

No; sext i32 undef to i64 returns an extraordinary thing. It's a
64-bit value where the low 32 bits are undef, and the high
32 bits are sign extension from whatever bit 31 happened to be.

More precisely, the high 33 bits are known to be the same.

operands and opcodes. It's quite magical.

It's actually not that magical. It's also essential that we have this behavior, otherwise normal bitfield code has "undefined" behavior and is misoptimized. This is why the result of "undef & 1" is known to have high bits zero.

-Chris

Dan Gohman <gohman@apple.com> writes:

For example, suppose we want to convert the && to &, and the ?: to a
select, in this code:

if (a && (b ? (c + d) : e)) {

because we have a CPU architecture with poor branch prediction, or
because we want more ILP, or because of some other reason. Here's what the
LLVM IR for that might look like:

   %t0 = add nsw i32 %c, %d
   %t1 = select i1 %b, i32 %t0, i32 %e
   %t2 = icmp ne i32 %t1, 0
   %t3 = and i1 %a, %t2
   br i1 %t3, ...

The extra branching is gone, yay. But now we've put an add nsw out there
to be executed unconditionally. If we make the select an observation
point, we'd have introduced undefined behavior on a path that didn't
previously have it.

Unless the undefined behavior only triggered if the select actually
produced a poisoned result. Then it should have the same behavior as
the branch, no?

A foodtaster instruction doesn't really solve this problem, because
we'd have to put it between the add and the select, and it would be
just as problematic.

Or you put it immediately after the select.

One could argue that aggressive speculation is a low-level optimization
better suited to CodeGen than the optimizer, as LLVM divides them, and
that perhaps the cost for providing this level of flexibility in the
optimizer is too high, but that's a different argument.

No, I think we want the flexibility. But I believe there are sane ways
to do this.

                             -Dave

Eli Friedman <eli.friedman@gmail.com> writes:

But since the add nsw returns undef, then isn't c "sext i32 undef to
i64"

Yes.

and thus also undef?

No: it's some value between INT32_MIN and INT32_MAX. Think of undef
as a read from an arbitrary register: you can't predict what it will
contain, but you know it contains some representable value.

How does undef not cover the above case? "some value between INT32_MIN
and INT32_MAX" is certainly "some representable value."

                              -Dave

Dan Gohman <gohman@apple.com> writes:

No; sext i32 undef to i64 returns an extraordinary thing. It's a
64-bit value where the low 32 bits are undef, and the high
32 bits are sign extension from whatever bit 31 happened to be.

And thus they are undef because the low 32 bits were/are undef.

And the low 32 bits in this new value also fluctuate, and the
high 31 bits in this value fluctuate with them in concert with
bit 31 of the new value.

That's just ridiculous. Undef is undef. We don't know anything special
about it.

FWIW, you can also get non-bitwise fluctuation, such as (undef*3), which
is a value which fluctuates around multiples of 3. Or (undef*3)/5+1,
which fluctuates around, well, any value which could be computed by
that expression. You can build arbitrarily complex expression trees
around undef, and everything fluctuates according to the constraints of
operands and opcodes. It's quite magical.

This is silly. We can't count on undef having any special properties at
all. If we do, IMHO it's a bug.

                                     -Dave

Chris Lattner <clattner@apple.com> writes:

No; sext i32 undef to i64 returns an extraordinary thing. It's a
64-bit value where the low 32 bits are undef, and the high
32 bits are sign extension from whatever bit 31 happened to be.

More precisely, the high 33 bits are known to be the same.

I think that's a very risky assumption. The name "undef" does not
convey that at all.

operands and opcodes. It's quite magical.

It's actually not that magical. It's also essential that we have this
behavior, otherwise normal bitfield code has "undefined" behavior and
is misoptimized. This is why the result of "undef & 1" is known to
have high bits zero.

Well sure, because we know (x & 1) has the high bits set to zero for any
value of x. In the sext case, we know nothing about the high bits
because we don't know what bit 32 was/is.

Yes, we know the high bits are the same but it's surprising to me that
this would be represented as "undef." "undef" to me means I can assume
any value at all, not restricted to "all the bits are the same."

                                -Dave

That was my thinking. The select is an observation point for its first operand,
but then merely propagates poison from the second or third, just like any
computational instruction would.
The icmp is an observation point for both inputs.

Pogo

We don't represent that as undef in the IR; constant folding will fold
"sext i32 undef to i64" to zero, not undef.

-Eli

The optimizations we’re talking about are forms of safe
speculation. Doing that at IR level is fine.

Hardware support for NaT bits + check instructions have been popping
up on this message thread. Hardware speculation exists solely to
support unsafe speculation, in which certain operations need to be
reexecuted under certain conditions. It is not something we would ever
want to represent in IR. It isn’t ammenable to CFG+SSA form.

Runtime checking for undefined behavior would be implemented as
overflow checks, etc. by the front end. I don’t think it’s related to
unsafe speculation. In other words, I don’t see the purpose of a
“check/foodtaster” instruction.

-Andy

Eli Friedman <eli.friedman@gmail.com> writes:

Yes, we know the high bits are the same but it's surprising to me that
this would be represented as "undef."

We don't represent that as undef in the IR; constant folding will fold
"sext i32 undef to i64" to zero, not undef.

Ok, so why the fuss over sext?

                               -Dave

Andrew Trick <atrick@apple.com> writes:

The optimizations we're talking about are forms of safe
speculation. Doing that at IR level is fine.

Hardware support for NaT bits + check instructions have been popping
up on this message thread. Hardware speculation exists solely to
support unsafe speculation, in which certain operations need to be
reexecuted under certain conditions. It is not something we would ever
want to represent in IR. It isn't ammenable to CFG+SSA form.

Runtime checking for undefined behavior would be implemented as
overflow checks, etc. by the front end. I don't think it's related to
unsafe speculation. In other words, I don't see the purpose of a
"check/foodtaster" instruction.

The purpose of all of these discussions is to alleviate the
computational explosion of static analysis that Dan described in the
presence of nsw bits. The point is to reduce the search space for
static checking. If we don't care about that, then all of this
discussion is moot. :slight_smile:

                                  -Dave

Right. I can see how a value-undefined-if-poison marker could yield more precise static analysis. I didn't mean to discourage that approach, although I'm not aware of any imminent problem that it solves.

-Andy

Put the observation point on the add, the select, the icmp, or even the
and, or between any of them, and it'll still be before the branch. That
means that the code will have unconditional undefined behavior when the
add overflows, which the original code did not have.

Dan

Sorry, you lost me there. The behavior of the observation point isn’t undefinedunless you took the overflow path in the first place. I don’t see how that makes
it unconditionally undefined.

The discussion started based on a premise of propagating poison values past
all these kinds of places; now you don’t want to let me move them past any
of these places? Clearly I don’t follow how you think my suggestion diverges
unacceptably from your original sketch. Which might mean it’s time to bail…

Pogo

It's not magical from an LLVM perspective. But it is one
of the areas where the popular mental model of LLVM IR as an
assembly language breaks down. Undef is weirder than PHI nodes.

Dan

Dan Gohman <gohman@apple.com> writes:

Or you put it immediately after the select.

That was my thinking. The select is an observation point for its first operand,
but then merely propagates poison from the second or third, just like any
computational instruction would.
The icmp is an observation point for both inputs.

Put the observation point on the add, the select, the icmp, or even the
and, or between any of them, and it'll still be before the branch. That
means that the code will have unconditional undefined behavior when the
add overflows, which the original code did not have.

So put it after the branch. Absolutely nothing prevents that. That's
what IA64 compilers have to do today.

                                -Dave