The nsw story

The old and popular tradition of "int" being the default all-purpose type
in C hit a snag with the arrival of 64-bit general-purpose architectures,
when, for reasons of memory usage and compatibility, "int" was fixed
at 32 bits. This meant it was no longer the same size as a pointer, the
size of an array index, or the widest native integer type. (Sequence of
events simplified for brevity.)

The C standard tries to help with a bunch of typedefs: int_fast32_t,
int_least32_t, intptr_t, int32_t, size_t, ptrdiff_t, and more. With all
these great typedefs, one could imagine that there's no reason anyone
should ever use plain "int" again. Unfortunately, juggling all these
typedefs can be impractical, for several reasons.

C doesn't have function overloading, so whenever one talks to the standard
library (e.g. to call "abs"), one must know what the underlying types are
(e.g. which of "abs", "absl", or "absll" is needed?). printf also has
a problem, and although the C standard actually tries to help there,
"%" PRIdLEAST32 and "%" PRIdFAST32 have yet to seriously challenge the
mindshare of "%d". And there are other issues.

"int" remains widely popular, even though it isn't quite the close fit
to hardware that it used to be. Consider this simple testcase, which
is representative of a broader issue, containing an innocent use of int:

  void my_copy(char *dst, const char *src, int n) {
    for (int i = 0; i != n; ++i)
      dst[i] = src[i];
  }

On LP64 platforms, this code has implicit sign-extensions to convert
the 32-bit int "i" to 64 bits in the array index expressions. Sign
extensions are relatively fast on most machines, but there's also
the cost of delaying the computations of the addresses for the memory
references, which is significant. And small loops turn small problems
into big problems. The result is around a 12% slowdown on the machine
I happen to be typing this on.

(I'm ignoring loop vectorization and memcpy pattern matching here,
to keep this example simple without losing too much generality.)

However, C compilers have a way to fix this problem, by repurposing
an ancient hack. Long ago, the C standard was designed to accommodate
machines that didn't use two's complement representations for signed
integers. Different representations lead to different behaviors on
overflow, and the C committee decided to make signed integer arithmetic
portable by declaring that overflow in a signed integer type gets
undefined behavior. Since then, two's complement has largely taken over,
but the C standard still has that rule.

Today, compilers use this rule to promote 32-bit "int" variables to 64
bits to eliminate these troublesome sign extensions. This is considered
valid because any time this would change the behavior of the program,
it would be an overflow in the narrower type, which the rule says is
undefined behavior.

Now consider LLVM. In LLVM, "int" is lowered to "i32" (on LP64 targets),
and signedness is a property of operators, rather than types. For many
years, LLVM was unable to promote "int" variables, because it lacked
the necessary information. Then, I was assigned to fix this. I added
a flag named "nsw", which stands for No Signed Wrap, to be attached to
add instructions and others. It's not a very satisfying name, but it's
sufficiently unique and easy to find in LangRef. The purpose of the nsw
flag is to indicate instructions which are known to have no overflow.

The nsw flag is simple on the surface, but it has complexities. One
is that nsw add is not an associative operation (unlike mathematical
integers, two's complement wrapping signed integers, and unsigned
integers, which are all associative). With nsw, ((a + b) + c) is not
equivalent to (a + (b + c)), because (b + c) might overflow, even if
((a + b) + c) has no overflow. This is inconvenient, but it's manageable.

A much bigger complication is that an add with undefined behavior on
overflow is an instruction that potentially has side effects. Side effects
mean the compiler can't freely move such instructions around. This is
jarring, because add instructions are the kinds of things that otherwise
can be easily moved around -- they are cheap and safe. An example of this
is short-circuit elimination -- turning && into &. If the right operand
of the && contains an add, it will need to be speculatively executed,
and that isn't safe if it might have side effects.

The first solution for this was to define nsw add to return undef on
overflow, instead of invoking immediate undefined behavior. This way,
overflow can happen and it doesn't cause any harm unless the value
is actually used. With the code motion problem fixed, I proceeded to
implement nsw using this approach, and a sign-extension elimination
optimization on top of it.

However, I later realized that undef isn't actually expressive enough
for this purpose. Undef can produce any value, but it will still be some
specific value of its type (with fluctuation, but that's not important
here). When an integer add is promoted to a wider type, an expression in
the narrower type which would overflow is converted to an expression in
the wider type which returns a value which can't be expressed in the
narrower type. There is no possible value in the narrower type that
undef could take on which would produce the same behavior.

When I asked Chris what to do about this problem, he didn't understand
why I was worrying about it. Admittedly, it is a fairly obscure problem.
But at the time, I was of the opinion that we shouldn't sweep known
semantic problems under the rug. We don't have formal semantics, and we
don't do correctness proofs, but it still seemed that we should respond
when we do notice inconsistencies. I wanted to get to the bottom of it.

I set out to find some way to say that an nsw add which overflows doesn't
invoke immediate undefined behavior, but invokes undefined behavior only
when its result is used. It can't be just any use though, because I didn't
want to rule out the possibility of hoisting chains of instructions,
like a+b+c+d+e. LLVM doesn't usually speculate this aggressively today,
but there are plausible scenarios where it might want to in the future.

This led me to think about making nsw overflow return some kind
of poisoned value, like undef, only more powerful. The poison would
propagate down the def-use graph until it reaches an instruction that
is capable of triggering the undefined behavior.

I called the poison value "trap", because of some superficial
similarity to C's "trap value" concept, although I now believe this to
be mistaken. It really is quite different. Also, it's unrelated to the
llvm.trap intrinsic. It should probably be renamed.

The presence of this poison value means that undefined behavior has been
invoked on a potentially speculative code path. The consequences of the
undefined behavior are deferred until that code path is actually used
on some non-speculative path.

This concept seemed to fit all the requirements. It allows for
arbitrary speculation, it's sufficient for sign-extension elimination,
and it's useful for a handful of other neat tricks as well. I wrote up
a description of this concept, and it's been in LangRef ever since. It
sticks out though, because it got pretty big and complex, especially
in view of its relative obscurity. Realistically speaking, it's
probably not fully watertight yet. But I'm not aware of any fundamental
flaws.

Perhaps the best way to understand the complexity is to imagine writing
a tool to detect this kind of undefined behavior. Detecting signed
overflow in C source is pretty easy: just insert checking code at each
signed operation. But as we've just seen, LLVM IR has different rules
from C. The undefined behavior of nsw doesn't matter until it's
actually observed.

So initially, one might imagine detecting trouble by adding a bit
to every Value to track whether it is a poison value. That's pretty
heavy-weight, but it's doable. It gets worse though, because poison
values can also be transmitted through memory. So one would also
have to add a flag to bytes of memory which are poisoned. But it gets
even worse, because poison values can be used as branch conditions,
so one would have to do advanced CFG analysis to prove whether branch
decisions based on poisoned values actually matter. And it gets worse
yet, because the control structure of the program may be dynamic, so
what one would really need to do is something like non-deterministic
execution. This can easily take exponential amounts of time and memory.

A natural reaction to this problem is to think that LLVM IR is so nice
and pretty that naturally there must be a nice and pretty solution. Here
are some alternatives that have been considered:

- Go back to using undef for overflow. There were no known real-world
   bugs with this. It's just inconsistent.

- Define add nsw as a fully side-effecting operation, and accept the
   limits on code motion that this implies. However, as LLVM starts doing
   profile-guided optimizations, and starts thinking about more diverse
   architectures, code speculation will likely become more important.

- Define add nsw as a fully side-effecting operation, and teach
   optimization passes to strip nsw when moving code past control
   boundaries. This is seen as suboptimal because it prevents subsequent
   passes from making use of the nsw information. And, it's extra work
   for optimization passes.

- Instead of trying to define dependence in LangRef, just say that if
   changing the value returned from an overflowing add nsw would
   affect the observable behavior of the program, then the behavior of
   the program is undefined. This would reduce the amount of text in
   LangRef, but it would be a weaker definition, and it would require
   sign-extension optimizers and others to do significantly more work
   to establish that their transformations are safe.

- Give up on nsw and have compilers emit warnings when they are unable
   to perform some optimization due to their inability to exclude the
   possibility of overflow. Obviously the warnings would not be on by
   default, or even -Wall or probably even -Wextra, so -Werror users need
   not revolt. Many people are often faced with code that they cannot
   modify for any number of reasons, and would not be able to make use
   of such warnings. It's an interesting tradeoff, but it's unpopular.

Unfortunately, none of these are completely nice and pretty. Because of
this, and because most people don't care, nsw with all its conceptual
complexity has survived.

Dan

A few thoughts from someone on the sidelines; hope they’re helpful.

The first solution for this was to define nsw add to return undef on
overflow, instead of invoking immediate undefined behavior. This way,
overflow can happen and it doesn’t cause any harm unless the value
is actually used. With the code motion problem fixed, I proceeded to
implement nsw using this approach, and a sign-extension elimination
optimization on top of it.

However, I later realized that undef isn’t actually expressive enough
for this purpose. Undef can produce any value, but it will still be some
specific value of its type (with fluctuation, but that’s not important
here). When an integer add is promoted to a wider type, an expression in
the narrower type which would overflow is converted to an expression in
the wider type which returns a value which can’t be expressed in the
narrower type. There is no possible value in the narrower type that
undef could take on which would produce the same behavior.

Part of the confusion seems to come from overloading “undefined.”
The VAX architecture spec nicely dealt with the distinction between
unspecified results and unspecified behavior by consistently using
distinct terms: unpredictable results, and undefined behavior.
An operation producing an unpredictable result could produce any
bit-pattern that fit in the output operand; but it would not trap
or do anything else unfriendly.
An operation causing undefined behavior means anything could happen,
from no-effect to machine meltdown.

I always thought these were usefully distinct concepts, and the
terminology convention really clarified that.

I set out to find some way to say that an nsw add which overflows doesn’t
invoke immediate undefined behavior, but invokes undefined behavior only
when its result is used. It can’t be just any use though, because I didn’t
want to rule out the possibility of hoisting chains of instructions,
like a+b+c+d+e. LLVM doesn’t usually speculate this aggressively today,
but there are plausible scenarios where it might want to in the future.

This led me to think about making nsw overflow return some kind
of poisoned value, like undef, only more powerful. The poison would
propagate down the def-use graph until it reaches an instruction that
is capable of triggering the undefined behavior.

Reminds me of how Itanium does speculation, although the details are
getting fuzzy (it has been a while). Registers have an associated
“NaT” (Not A Thing) flag, which gets set when something bad happens.
(Usually loading from nonexistent or protected memory. Can’t remember
whether integer overflows can set this.)
Some operations are NaT-tolerant; any input NaT is simply propagated
to the output. Arithmetic, sign-extension, bit fiddling, and the like
are NaT-tolerant.
Some operations are NaT-intolerant; any input NaT causes a trap.
Storing to memory, pointer derefs, and I think compares are all
NaT-intolerant.
There’s also a “check” instruction that exists solely to check
for NaT flags. That might be specific to memory traps; when you trap
on a load, you want to know what address trapped, and the check
instruction has a memory-address operand to provide that info.

So initially, one might imagine detecting trouble by adding a bit
to every Value to track whether it is a poison value. That’s pretty
heavy-weight, but it’s doable. It gets worse though, because poison
values can also be transmitted through memory. So one would also
have to add a flag to bytes of memory which are poisoned. But it gets
even worse, because poison values can be used as branch conditions,
so one would have to do advanced CFG analysis to prove whether branch
decisions based on poisoned values actually matter. And it gets worse
yet, because the control structure of the program may be dynamic, so
what one would really need to do is something like non-deterministic
execution. This can easily take exponential amounts of time and memory.

I suggest that storing to memory, and conditionals, and anything
else that seems to lead toward the really hairy situations, should
be cases where the nsw/poison/trap is “actually observed.”

  • Values in memory can be observable by other parts of the program, or
    by other threads/processes.
  • After you’ve made a control-flow decision, memory references in the
    successor blocks are part of the observable behavior of the program.

This is taking a broader view of “observable” than what’s defined in the
C or C++ standards, which steadfastly ignore a variety of practical realities.
But I think it’s a reasonable and defensible position, that still allows you
some scope for operating on these things in a sensible way.

Pogo

Part of the confusion seems to come from overloading "undefined."
The VAX architecture spec nicely dealt with the distinction between
unspecified results and unspecified behavior by consistently using
distinct terms: _unpredictable_ results, and _undefined_ behavior.
An operation producing an unpredictable result could produce any
bit-pattern that fit in the output operand; but it would not trap
or do anything else unfriendly.
An operation causing undefined behavior means anything could happen,
from no-effect to machine meltdown.

I always thought these were usefully distinct concepts, and the
terminology convention really clarified that.

LLVM IR does make these distinctions, though the language is a bit
subtle.

"Undefined behavior" and "behavior is undefined" mean anything could
happen. Demons may fly out your nose.

"undef" is an arbitrary fluctuating bit pattern. Also it causes
operations that use it to exhibit constrained fluctuation.

"Result is undefined" means an arbitrary bit pattern may be returned.
Sometimes this means "undef" and sometimes this means a fixed bit
pattern, though this inconsistency may be unintentional.

I suggest that storing to memory, and conditionals, and anything
else that seems to lead toward the really hairy situations, should
be cases where the nsw/poison/trap is "actually observed."
- Values in memory can be observable by other parts of the program, or
by other threads/processes.
- After you've made a control-flow decision, memory references in the
successor blocks are part of the observable behavior of the program.

This is taking a broader view of "observable" than what's defined in the
C or C++ standards, which steadfastly ignore a variety of practical realities.
But I think it's a reasonable and defensible position, that still allows you
some scope for operating on these things in a sensible way.

Prohibiting poison values from propagating through memory would mean
that the reg2mem pass would no longer be a semantics-preserving pass.
Prohibiting it from propagating through control flow would mean that a
select instruction wouldn't be equivalent to a conditional branch and
a phi. These are problems that aren't really important for hardware
ISAs, but are important for LLVM IR.

Dan

+1 for the simple and obvious answer.

-Chris

To be clear, the main sign-extension elimination optimization is not valid under
the simple and obvious answer. Are you proposing to do it anyway?

Dan

Please define "valid". We discussed this a very very long time ago, and I don't recall the details.

-Chris

int a = INT_MAX, b = 1;
long c = (long)(a + b);

What is the value of c, on an LP64 target?

By standard C rules, the add overflow invokes immediate undefined behavior of course.

If a and b are promoted to 64-bit, c is 0x0000000080000000.

In a world where signed add overflow returns undef, c could be any of
  0x0000000000000000
  0x0000000000000001
  0x0000000000000002
  ...
  0x000000007fffffff
or
  0xffffffff80000000
  0xffffffff80000001
  0xffffffff80000002
  ...
  0xffffffffffffffff
however it can't ever be 0x0000000080000000, because there's no 32-bit value
the undef could take which sign-extends into that 64-bit value.

Therefore, if signed add overflow returns undef, promoting such 32-bit variables to
64-bit variables is not a behavior-preserving transformation.

Dan

Dan Gohman <gohman@apple.com> writes:

The presence of this poison value means that undefined behavior has been
invoked on a potentially speculative code path. The consequences of the
undefined behavior are deferred until that code path is actually used
on some non-speculative path.

This is exactly what was done on Itanium and its follow-ons to handle
speculative loads. It's a perfectly reasonable approach IMHO in its
basic formulation.

The complication you introduced is to go beyond the basic formulation
and consider certain kinds of uses "special" in that they don't trigger
undefined behavior. This leads to the analysis problems you described.

- Go back to using undef for overflow. There were no known real-world
   bugs with this. It's just inconsistent.

I have always considered "undef" to mean "any possible value" so I don't
see a real problem with promoted integers containing a value that can't
fit in the original type size. The value in the original type size is
undetermined by the definition of "undef" so who cares what we choose as
the "real" value?

That said, there are a couple of times I ran into situations where LLVM
assumed "undef" somehow restricted the possible set of values it could
represent. As I recall I ran into this in instcombine and dagcombine.
I will have to go back and search for the details. In any event, I
think these things are bugs in LLVM.

So "undef" seems like a reasonable approach to me.

- Define add nsw as a fully side-effecting operation, and accept the
   limits on code motion that this implies. However, as LLVM starts doing
   profile-guided optimizations, and starts thinking about more diverse
   architectures, code speculation will likely become more important.

It will indeed. I don't like this approach.

- Define add nsw as a fully side-effecting operation, and teach
   optimization passes to strip nsw when moving code past control
   boundaries. This is seen as suboptimal because it prevents subsequent
   passes from making use of the nsw information. And, it's extra work
   for optimization passes.

Agreed.

- Instead of trying to define dependence in LangRef, just say that if
   changing the value returned from an overflowing add nsw would
   affect the observable behavior of the program, then the behavior of
   the program is undefined.

Isn't that exactly what the C standard says? Now, LLVM handles more
than C so we need to consider that. Personally, I agree that this is a
less than satisfying solution.

Are you mainly worried about the promoted value being cast back to
the original type?

- Give up on nsw and have compilers emit warnings when they are unable
   to perform some optimization due to their inability to exclude the
   possibility of overflow.

This would be very bad. We groan every time we see "unsigned" used as a
loop index because it seriously degrades dependence analysis and
vectorization for exactly this reason.

                               -Dave

Paul Robinson <pogo.work@gmail.com> writes:

Some operations are NaT-tolerant; any input NaT is simply propagated
to the output. Arithmetic, sign-extension, bit fiddling, and the like
are NaT-tolerant.

Oh yeah, I forgot about that. Forget all the stuff I said about "basic
formulation" and complicating that. :-)

I suggest that storing to memory, and conditionals, and anything
else that seems to lead toward the really hairy situations, should
be cases where the nsw/poison/trap is "actually observed."

Agreed.

This is taking a broader view of "observable" than what's defined in the
C or C++ standards, which steadfastly ignore a variety of practical realities.
But I think it's a reasonable and defensible position, that still allows you
some scope for operating on these things in a sensible way.

I like this approach a lot.

                         -Dave

Dan Gohman <gohman@apple.com> writes:

Prohibiting poison values from propagating through memory would mean
that the reg2mem pass would no longer be a semantics-preserving pass.

Or it means you couldn't demote those values.

Prohibiting it from propagating through control flow would mean that a
select instruction wouldn't be equivalent to a conditional branch and
a phi. These are problems that aren't really important for hardware
ISAs, but are important for LLVM IR.

We could consider select an observation point and consider its use of
poisoned values undefined behavior.

                               -Dave

Dan Gohman <gohman@apple.com> writes:

In a world where signed add overflow returns undef, c could be any of
  0x0000000000000000
  0x0000000000000001
  0x0000000000000002
  ...
  0x000000007fffffff
or
  0xffffffff80000000
  0xffffffff80000001
  0xffffffff80000002
  ...
  0xffffffffffffffff
however it can't ever be 0x0000000080000000, because there's no 32-bit value
the undef could take which sign-extends into that 64-bit value.

Is undef considered to be "one true value" which is just unknown? I
always considered it to be "any possible value." If the latter, there
is no problem with undef "changing values" as a result of various
operations, including sign extension.

Therefore, if signed add overflow returns undef, promoting such 32-bit variables to
64-bit variables is not a behavior-preserving transformation.

Sure it is, if undef is allowed to "float freely" and change its value
on a whim. The behavior is, "this could be anything at any time."

If I'm understanding you correctly.

                                      -Dave

The current LLVM IR definition of undef is essentially "each time
undef is evaluated, it evaluates to an arbitrary value, but it must
evaluate to some specific value". SimplifyDemandedBits is an example
of a transformation which uses undef in a way that would be incorrect
under a model where undef is like the current trap value; take a look
at r133022 for a case where this matters.

-Eli

If reg2mem is constructing spill slots, or address-not-taken locals, I could
see classifying those as okay-to-poison. You would not want statics or globals
or pointer-addressed memory to be poisonable, as those should be treated
as being observable locations.

So, it’s not really “all of memory” that could be poisoned, only these
compiler-generated things.

Pogo

This would mean that it's not valid to do store-to-load propagation
in the optimizer if the pointee might be a global variable. That
would mean the loss of real optimizations.

And it would still mean that non-volatile store instructions suddenly
can have side effects beyond just writing to their addressed memory.
That would be surprising.

I think Dave also suggested making select instructions be
observation points. That would mean that select instructions would
have side effects. Again, that would be surprising.

Dan

Dan Gohman <gohman@apple.com> writes:

And it would still mean that non-volatile store instructions suddenly
can have side effects beyond just writing to their addressed memory.
That would be surprising.

No more surprising than any hardware architecture that implements this
kind of poison tracking. You see the fault or undefined behavior at the
store. Yes, a compiler IR is not an ISA but we do get to define the IR
semantics. It's a contract where we get to specify all of the rules.
:-)

If you're more worried about things like the above rather than the high
static analysis cost these various suggestions are trying to address,
that's simply a particular tradeoff you'd rather make. There's no rule
that says any particular semantic is "surprising."

We make the tradeoffs we want and implement the semantics based on that.

I think Dave also suggested making select instructions be
observation points. That would mean that select instructions would
have side effects. Again, that would be surprising.

How so? It certainly is an observation point. A select is just a branch
by another name. If the program behavior was undefined in the presence
of a branch, why shouldn't it be undefined in the presence of a select?

                          -Dave

(If this thread is becoming tiresome, let me know. This newbie is trying to
understand some of what’s going on; clearly you’ve thought about it way more
than I have, and I can understand if you want to stop thinking about it!)

Dan Gohman <gohman@apple.com> writes:

Prohibiting poison values from propagating through memory would mean
that the reg2mem pass would no longer be a semantics-preserving pass.

Or it means you couldn’t demote those values.

If reg2mem is constructing spill slots, or address-not-taken locals, I could
see classifying those as okay-to-poison. You would not want statics or globals
or pointer-addressed memory to be poisonable, as those should be treated
as being observable locations.

So, it’s not really “all of memory” that could be poisoned, only these
compiler-generated things.

This would mean that it’s not valid to do store-to-load propagation
in the optimizer if the pointee might be a global variable. That
would mean the loss of real optimizations.

And it would still mean that non-volatile store instructions suddenly
can have side effects beyond just writing to their addressed memory.
That would be surprising.

I think Dave also suggested making select instructions be
observation points. That would mean that select instructions would
have side effects. Again, that would be surprising.

Dan

Back in your first message on this thread, you did say:

The poison would
propagate down the def-use graph until it reaches an instruction that
is capable of triggering the undefined behavior.

So, somebody somewhere has to be capable of triggering the undefined
behavior. What instructions did you have in mind? If the store,
select, and compare instructions aren’t appropriate, which ones are?

I don’t think this next suggestion was on your list, so here goes:
Drawing from the Itanium example, we could define a new ‘check’
instruction, which would explicitly trigger the undefined behavior if its
operand is poisoned, or produce a known-safe value otherwise. Yes, the
IR would have to be sprinkled with these things; but only outputs of ‘nsw’
kinds of operations would subsequently need a ‘check’ to sanitize them.

So, if a potentially poison value propagates down the def-use graph
until it reaches a store/select/compare, that instruction must be
preceded by a ‘check’ that will sanitize the value.

(Or maybe we should call it the ‘foodtaster’ instruction, which protects
the important instructions from poison…)

Separating out the induce-undefined-behavior instruction means the
existing semantics of store/select/compare are unaffected, the main
consequences being that you can’t move these guys past the ‘check’
that feeds them. But that constraint would exist automatically anyway,
right? because you couldn’t move an instruction up prior to where its
input values are defined.

Clever/optimal placement of ‘check’ becomes a topic of interest, but
that’s kind of the point of the exercise.

Pogo

But since the add nsw returns undef, then isn't c "sext i32 undef to
i64", and thus also undef?

I read it like this example from LangRef:
  %B = undef
  %C = xor %B, %B
Safe:
  %C = undef

%C is undef, so can be later (in a bitwise or, for example) assumed to
be -1, even though there's no possible choice of value for %B such
that %B xor %B gives -1.

Confused,
~ Scott

int a = INT_MAX, b = 1;
long c = (long)(a + b);

What is the value of c, on an LP64 target?

If a and b are promoted to 64-bit, c is 0x0000000080000000.

In a world where signed add overflow returns undef, c could be any of
0x0000000000000000
...
0xffffffffffffffff
however it can't ever be 0x0000000080000000, because there's no 32-bit value
the undef could take which sign-extends into that 64-bit value.

But since the add nsw returns undef, then isn't c "sext i32 undef to
i64"

Yes.

and thus also undef?

No: it's some value between INT32_MIN and INT32_MAX. Think of undef
as a read from an arbitrary register: you can't predict what it will
contain, but you know it contains some representable value.

-Eli

> (If this thread is becoming tiresome, let me know. This newbie is trying to
> understand some of what's going on; clearly you've thought about it way more
> than I have, and I can understand if you want to stop thinking about it!)

>
> Dan Gohman <gohman@apple.com> writes:
>
> > Prohibiting poison values from propagating through memory would mean
> > that the reg2mem pass would no longer be a semantics-preserving pass.
>
> Or it means you couldn't demote those values.
>
> If reg2mem is constructing spill slots, or address-not-taken locals, I could
> see classifying those as okay-to-poison. You would not want statics or globals
> or pointer-addressed memory to be poisonable, as those should be treated
> as being observable locations.
>
> So, it's not really "all of memory" that could be poisoned, only these
> compiler-generated things.

> This would mean that it's not valid to do store-to-load propagation
> in the optimizer if the pointee might be a global variable. That
> would mean the loss of real optimizations.
>
> And it would still mean that non-volatile store instructions suddenly
> can have side effects beyond just writing to their addressed memory.
> That would be surprising.
>
> I think Dave also suggested making select instructions be
> observation points. That would mean that select instructions would
> have side effects. Again, that would be surprising.
>
> Dan

Back in your first message on this thread, you did say:

>The poison would
>propagate down the def-use graph until it reaches an instruction that
>is capable of triggering the undefined behavior.

So, somebody somewhere has to be capable of triggering the undefined
behavior. What instructions did you have in mind? If the store,
select, and compare instructions aren't appropriate, which ones are?

A call to printf (I/O), a volatile memory operation, etc. These things
can never be speculated. They are essentially the anchors of the system.
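In IR terms, the anchors look something like this (an illustrative sketch, using period-appropriate syntax):

```llvm
  %t = add nsw i32 %a, %b                          ; may produce poison
  call i32 (i8*, ...)* @printf(i8* %fmt, i32 %t)   ; I/O: never speculated
  store volatile i32 %t, i32* %mmio                ; volatile: never speculated
```

Poison reaching either of the last two instructions is where the undefined behavior actually happens; everything upstream stays freely movable.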

I don't think this next suggestion was on your list, so here goes:
Drawing from the Itanium example, we could define a new 'check'
instruction, which would explicitly trigger the undefined behavior if its
operand is poisoned, or produce a known-safe value otherwise. Yes, the
IR would have to be sprinkled with these things; but only outputs of 'nsw'
kinds of operations would subsequently need a 'check' to sanitize them.

So, if a potentially poison value propagates down the def-use graph
until it reaches a store/select/compare, that instruction must be
preceded by a 'check' that will sanitize the value.

(Or maybe we should call it the 'foodtaster' instruction, which protects
the important instructions from poison...)
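As a sketch of what this might look like (the 'check' opcode is entirely hypothetical, not existing IR):

```llvm
  %t = add nsw i32 %a, %b    ; may be poison if the add overflows
  %s = check i32 %t          ; hypothetical: triggers undefined behavior if
                             ; %t is poison, otherwise passes %t through
  store i32 %s, i32* %p      ; the store only ever sees a sanitized value
```

The store, select, and compare instructions keep their ordinary semantics; only 'check' can detonate.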

Separating out the induce-undefined-behavior instruction means the
existing semantics of store/select/compare are unaffected, the main
consequences being that you can't move these guys past the 'check'
that feeds them. But that constraint would exist automatically anyway,
right? You couldn't move an instruction up above the point where its
input values are defined.

Clever/optimal placement of 'check' becomes a topic of interest, but
that's kind of the point of the exercise.

For example, suppose we want to convert the && to &, and the ?: to a
select, in this code:

if (a && (b ? (c + d) : e)) {

because we have a CPU architecture with poor branch prediction, or
because we want more ILP, or because of some other reason. Here's what the
LLVM IR for that might look like:

   %t0 = add nsw i32 %c, %d
   %t1 = select i1 %b, i32 %t0, i32 %e
   %t2 = icmp ne i32 %t1, 0
   %t3 = and i1 %a, %t2
   br i1 %t3, ...

The extra branching is gone, yay. But now we've put an add nsw out there
to be executed unconditionally. If we make the select an observation
point, we'd have introduced undefined behavior on a path that didn't
previously have it.

A foodtaster instruction doesn't really solve this problem, because
we'd have to put it between the add and the select, and it would be
just as problematic.
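To spell that out, here's where the hypothetical 'check' would have to go in the select example above:

```llvm
  %t0 = add nsw i32 %c, %d
  %t0.ok = check i32 %t0     ; hypothetical: executed unconditionally, so it
                             ; triggers undefined behavior even when %b is false
  %t1 = select i1 %b, i32 %t0.ok, i32 %e
```

The check is just as unconditional as the add it guards, so it reintroduces undefined behavior on the path that never needed %t0.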

The current definition of nsw is an attempt to avoid arbitrary
limitations and keep the system as flexible as possible.

One could argue that aggressive speculation is a low-level optimization
better suited to CodeGen than the optimizer, as LLVM divides them, and
that perhaps the cost for providing this level of flexibility in the
optimizer is too high, but that's a different argument.

Dan

> int a = INT_MAX, b = 1;
> long c = (long)(a + b);
>
> What is the value of c, on an LP64 target?
>
> If a and b are promoted to 64-bit, c is 0x0000000080000000.
>
> In a world where signed add overflow returns undef, c could be any of
> 0x0000000000000000
> ...
> 0xffffffffffffffff
> however it can't ever be 0x0000000080000000, because there's no 32-bit value
> the undef could take which sign-extends into that 64-bit value.
>
> But since the add nsw returns undef, then isn't c "sext i32 undef to
> i64", and thus also undef?

No; "sext i32 undef to i64" returns something more unusual: a 64-bit
value whose low 32 bits are undef and whose high 32 bits are the sign
extension of whatever bit 31 happens to be. The low 32 bits still
fluctuate, and the high 32 bits fluctuate along with them, always
matching bit 31 of the new value.

FWIW, you can also get non-bitwise fluctuation, such as (undef*3), which
is a value which fluctuates around multiples of 3. Or (undef*3)/5+1,
which fluctuates around, well, any value which could be computed by
that expression. You can build arbitrarily complex expression trees
around undef, and everything fluctuates according to the constraints of
operands and opcodes. It's quite magical.

> I read it like this example from LangRef:
>      %B = undef
>      %C = xor %B, %B
>    Safe:
>      %C = undef
>
> %C is undef, so can be later (in a bitwise or, for example) assumed to
> be -1, even though there's no possible choice of value for %B such
> that %B xor %B gives -1.

The reason this example works is that undef fluctuates, and further, it
causes things that use it to fluctuate (within their constraints, as I
discussed above). Since the xor reads %B twice, it could get two different
values, so %C can be an unconstrained undef.

Dan