RFC: Proposal to Remove Poison

Hello,

I’d like to offer an alternative solution to the “poison problem”: remove it.

What follows is rather informal. I’d happily write up a nicer document if this RFC stands up to scrutiny.

The idea was born from two observations:

  • undef was introduced to model a load of uninitialized memory, a form of undefined behavior.
  • poison was introduced to model integer overflow, another form of undefined behavior.

I find it awkward that one form of undefined behavior is more or less powerful than another.

We rely on the following properties for undef:

  • undef is permitted to be an arbitrary bit-pattern.
  • Instructions may or may not be value dependent on their undef operands.
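For example, here is a sketch, in the spirit of the LangRef’s undef examples, of the folds these two properties permit (each fold is allowed, not required):

%A = add i32 %X, undef   ; may fold to undef: for any desired result there is
                         ; a bit-pattern of the undef operand producing it
%B = and i32 %X, undef   ; may fold to 0: choose the all-zeros bit-pattern
%C = or i32 %X, undef    ; may fold to -1: choose the all-ones bit-pattern
                         ; (but %B and %C may not fold to undef: the bits of
                         ; %X still constrain the result)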

We rely on the following properties for poison:

  • Instructions which have poison operands are capable of producing results which are not consistent with two’s complement arithmetic on their operands.

Here is a case where we believed distinguishing between poison and undef to be essential:
%add = add nsw i32 %a, %b
%cmp = icmp sgt i32 %add, %a

If %a is INT_MAX and %b is 1, then %add results in overflow.

If overflow results in undef, then %cmp would be equivalent to:
%cmp = icmp sgt i32 undef, INT_MIN

There is no bit-pattern we could substitute for undef which would make %cmp true; this means that we cannot optimize %cmp to ‘icmp sgt i32 %b, 0’.

Poison was created to allow us to produce whatever value we’d like for %cmp if %add resulted in overflow.
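Spelled out as a sketch, poison licenses this chain of reasoning:

%add = add nsw i32 %a, %b      ; nsw: signed overflow produces poison
%cmp = icmp sgt i32 %add, %a   ; no overflow: a + b > a is exactly b > 0;
                               ; overflow: %add is poison, so any result is fine
; hence the rewrite:
%cmp = icmp sgt i32 %b, 0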

What I would like to do is to remove poison and replace it with undef while still optimizing comparison instructions. The quick-and-dirty way of stating this is: an icmp with an undef operand produces undef regardless of its other operand.

I believe this would allow us to exploit and optimize overflow in arithmetic flowing into comparison operations.

Could this be problematic?

Whether we intended to or not, LLVM’s implementation of undef’s semantics already allows us to say that a memory location is not equivalent to any possible bit-pattern.

Consider the following program:

define i1 @f() {
%mem = alloca i1
%load = load i1* %mem
%cmp = icmp eq i1 %load, false
ret i1 %cmp
}

Because it is possible for %load’s undef to be either true or false, %cmp must be equivalent to undef as well. This is completely consistent with the above rules.

The following program is a little more interesting:

define i1 @g() {
%mem = alloca i1
%load = load i1* %mem
%cmp0 = icmp eq i1 %load, 0
%cmp1 = icmp eq i1 %load, 1
%and = xor i1 %cmp0, %cmp1
ret i1 %and
}

The intent of this program is to compare a memory location against all possible values the location might have.

If we ran LLVM’s InstCombine pass, %and would be replaced with the “expected” result of true.

If we instead ran SROA over @g, we would be left with:

%cmp0 = icmp eq i1 undef, false
%cmp1 = icmp eq i1 undef, true
%and = xor i1 %cmp0, %cmp1

Because %cmp0 and %cmp1 may have different undef values, %and is now undef.
This result is sufficient to say that the contents of %mem are neither true nor false!

From: "David Majnemer" <david.majnemer@gmail.com>
To: llvmdev@cs.uiuc.edu
Cc: "Nuno Lopes" <nuno.lopes@ist.utl.pt>, "John Regehr" <regehr@cs.utah.edu>
Sent: Sunday, February 8, 2015 4:23:15 AM
Subject: [LLVMdev] RFC: Proposal to Remove Poison

Hello,

I'd like to offer an alternative solution to the "poison problem":
remove it.

What follows is rather informal. I'd happily write up a nicer
document if this RFC stands up to scrutiny.

The idea was born from two observations:
- undef was introduced to model a load of uninitialized memory, a
form of undefined behavior.
- poison was introduced to model integer overflow, another form of
undefined behavior.

I find it awkward that one form of uninitialized behavior is more or
less powerful than another.

We rely on the following properties for undef:
- undef is permitted to be an arbitrary bit-pattern.
- Instructions may or may not be value dependent on their undef
operands.

We rely on the following properties for poison:
- Instructions which have poison operands are capable of providing
results which are not consistent with a two's complement operand.

Here is a case where we believed distinguishing between poison and
undef is essential:
%add = add nsw i32 %a, %b
%cmp = icmp sgt i32 %add, %a

If %a is INT_MAX and %b is 1, then %add results in overflow.

If overflow results in undef, then %cmp would be equivalent to:
%cmp = icmp sgt i32 undef, INT_MIN

There is no bit-pattern we could substitute for undef which would
make %cmp true, this means that we cannot optimize %cmp to 'icmp sgt
i32 %b, 0'.

Poison was created to allow us to produce whatever value we'd like
for %cmp if %add resulted in overflow.

What I would like to do is to remove poison and replace it with undef
while still optimizing comparison instructions. The quick-and-dirty
way of stating this is: an icmp with an undef operand produces undef
regardless of its other operand.

Currently, we might replace:
  %cmp = icmp sgt i32 undef, INT_MAX -> i1 false

According to this proposal, we could replace:
  %cmp = icmp sgt i32 undef, INT_MAX -> i1 undef

While replacing it with false is still allowed, so is replacing it with true. I'm not sure this makes sense.

> If we instead ran SROA over @g, we would be left with:
>   %cmp0 = icmp eq i1 undef, false
>   %cmp1 = icmp eq i1 undef, true
>   %and = xor i1 %cmp0, %cmp1
>
> Because %cmp0 and %cmp1 may have different undef values, %and is now undef.
> This result is sufficient to say that the contents of %mem are neither
> true nor false!

In the past, this has been explained to me in the following way: an undef value has a definite bit pattern at each use, consistent with all relevant constraints, but may have a different pattern at each use. Thus, the reason the SROA transformation is allowable is that the undef is 'used' twice, once in each comparison, and is otherwise unconstrained, and so may take on a different value (0 or 1 here) at each comparison.

That having been said, this is also somewhat loose, because IR-level operations have no atomicity. An i32 add could be replaced by i16 operations (that would 'use' the undef input multiple times).
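A sketch of the kind of split Hal describes, with illustrative value names and carry handling omitted; the single 'use' of the undef input becomes two uses, each of which may take a different value:

; before: one use of the undef input
%r = add i32 %x, undef

; after a hypothetical i16 legalization (carry omitted):
%x.lo = trunc i32 %x to i16
%x.sh = lshr i32 %x, 16
%x.hi = trunc i32 %x.sh to i16
%r.lo = add i16 %x.lo, undef   ; first use
%r.hi = add i16 %x.hi, undef   ; second use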

-Hal

I think your example is a compelling argument for making icmp more powerful.
The transform becomes less obvious when you replace INT_MAX with an unknown
variable %a:
%cmp = icmp sgt i32 undef, %a

We'd like to make %cmp undef without having to prove that %a is not
INT_MAX. I'm pretty sure instcombine does something like this today, and
this change would provide the basis for it.

When you say, "we'd like to", I'm assuming your justification for this is so that we can model poison using these undef semantics. Is that correct?

-Hal

In your example above, it seems to me that the replacement of all %load uses with "undef" is the reason we are losing the information that the compares operate on the same value. I'm curious about why undef isn't attached to a value.
Wouldn't it be desirable to be able to distinguish "undef"s that have to be the same value from those that don't?

Thanks,

Yes, but I also think 'icmp sgt i32 undef, %a -> i1 undef' is a pretty
reasonable transform by itself. Well-defined programs should never observe
the result of the compare.

David, this sounds promising. One thing we should do is dig up some of Dan Gohman's old emails on this topic and make sure that your idea passes his litmus tests.

A few random things that want to be discussed in the detailed RFC:

- what constitutes separate reads of an undef? for example is "%1 = xor undef, undef" different from "%0 = undef ; %1 = xor %0, %0"?

- how do select and phi and br work in the value dependent and not-value-dependent cases?

- what is the list of "true UBs" that do not fall into your proposed model? for example, I assume divide by zero and OOB store
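To make the second question concrete, a sketch of the kinds of cases that would need defined answers (illustrative IR, not a proposal):

%s = select i1 undef, i32 %a, i32 %b   ; may %s be %a, %b, or anything at all?
br i1 undef, label %t, label %f        ; a free choice of successor, or true UB?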

John

I'm somewhat concerned that this is unsound. For example, imagine I have the following:

int foo(int &a, int n) {
  int c = INT_MAX, r = 0;
  for (int i = 0; i < n; ++i, --c) {
    if (a > c)
      r += a;
    else
      a = bar();
  }

  return r;
}

called like this:

  int q;
  return foo(&q, 5);

However, we now have a situation where:
  1) in the first iteration the value loaded from &a is uninitialized
  2) after inlining the loop will likely be fully unrolled, and so the initial value loaded will be replaced by an undef
  3) it is the fact that, in that first iteration, (a > c, where c == INT_MAX) is always false that protects the undefined value from propagating further
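A sketch of that first unrolled iteration at the IR level (value names are hypothetical); under current semantics the compare folds to false, which is the protection point 3 describes:

%a0 = load i32* %q                  ; uninitialized load: %a0 is undef
%c0 = icmp sgt i32 %a0, 2147483647  ; a > INT_MAX: no bit-pattern makes this true
br i1 %c0, label %then, label %else ; today this may fold to 'br label %else'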

I'm also assuming that it is desirable that 'int c = INT_MAX' should have the same behavior as 'int c = opaque_func_that_returns_INT_MAX()'.

To be clear, I'm not entirely convinced that this is a well-defined C++ program (*), however, I believe it is well defined under our current semantics at the IR level. What is being proposed would change that.

(*) 8.5p12 says only, "If an indeterminate value is produced by an evaluation, the behavior is undefined except in the following cases..." [there are some special cases involving values of character type]. There is also a footnote on 3.9.1p6, discussing bool, that says, "Using a bool value in ways described by this International Standard as `undefined,` such as by examining the value of an uninitialized automatic object, might cause it to behave as if it is neither true nor false." But it is not really clear to me whether (a > c) is an evaluation having an indeterminate value (except via some circular reasoning).

-Hal

Hi David,

I have serious doubts about this approach. This is essentially how undef was originally implemented in LLVM, back in the mists of time. It led to a lot of bugs that Dan (and others) fixed in the 2008-2010 timeframe, culminating in the modern definition of undef, and poison as a complementary concept. Unfortunately, I don’t have pointers to those bugs, but I imagine some codebase archeology could turn them up.

—Owen

We need a library of allowed / disallowed / disputed transformations, including the rationale for each. I'm happy to start working on this though my available bandwidth isn't too high right now. Does a github repo sound like the right thing here?

John

> I have serious doubts about this approach. This is essentially how undef
> was originally implemented in LLVM, back in the mists of time. It led to a
> lot of bugs that Dan (and others) fixed in the 2008-2010 timeframe.

Having two different notions, undef and poison, is problematic as well.

This proposal is motivated by the observation that, in practice, we
transform:
define i1 @f(i32 %x, i32 %y, i1 %C) {
  %add = add nsw i32 %x, %y
  %sel = select i1 %C, i32 %add, i32 undef
  %cmp = icmp sgt i32 %sel, %x
  ret i1 %cmp
}

into:
define i1 @f(i32 %x, i32 %y, i1 %C) {
  %cmp = icmp sgt i32 %y, 0
  ret i1 %cmp
}

If undef isn't as strong as poison, this transformation is not valid.

Why is this the case? Consider that when %C is false, %sel will be undef.
This leaves us with:
  %cmp = icmp sgt i32 undef, %x

%x might be INT_MAX, which means that %cmp would have to be false.
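To see the conflict concretely, a sketch evaluating the call f(INT_MAX, 1, false) on both versions:

; original @f:    %sel = undef (because %C is false)
;                 %cmp = icmp sgt i32 undef, 2147483647 ; false for every bit-pattern
; transformed @f: %cmp = icmp sgt i32 1, 0              ; true
; so the transform turns an always-false result into true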

> From: "Reid Kleckner" <rnk@google.com>
> To: "Hal Finkel" <hfinkel@anl.gov>
> Cc: "David Majnemer" <david.majnemer@gmail.com>, "Nuno Lopes" <
nuno.lopes@ist.utl.pt>, "John Regehr"
> <regehr@cs.utah.edu>, "LLVM Developers Mailing List" <
llvmdev@cs.uiuc.edu>
> Sent: Sunday, February 8, 2015 2:20:49 PM
> Subject: Re: [LLVMdev] RFC: Proposal to Remove Poison
>
>
>
> When you say, "we'd like to", I'm assuming your justification for
> this is so that we can model poison using these undef semantics. Is
> that correct?
>
> Yes, but I also think 'icmp sgt i32 undef, %a -> i1 undef' is a
> pretty reasonable transform by itself. Well-defined programs should
> never observe the result of the compare.

I'm somewhat concerned that this is unsound. For example, imagine I have
the following:

void foo(int &a, int n) {
  int c = INT_MAX, r = 0;
  for (int i = 0; i < n; ++i, --c) {
    if (a > c)
      r += a;
    else
      a = bar();
  }

  return r;
}

called like this:

  int q;
  return foo(&q, 5);

However, we now have a situation where:
  1) in the first iteration the value loaded from &a is uninitialized
  2) after inlining the loop will likely be fully unrolled, and so the
initial value loaded will be replaced by an undef
  3) it is the fact that, in that first iteration, (a > c, where c ==
INT_MAX) is always false that protects the undefined value from propagating
further

I'm also assuming that it is desirable that 'int c = INT_MAX' should have
the same behavior as 'int c = opaque_func_that_returns_INT_MAX()'.

To be clear, I'm not entirely convinced that this is a well-defined C++
program (*), however, I believe it is well defined under our current
semantics at the IR level. What is being proposed would change that.

I don't think that program is reasonable. I also don't think my proposal
changes the facts on the ground.

consider:
bool f(int x, int y, bool c) {
  int add;
  if (c)
    add = x + y;
  return add > x;
}

Are we permitted to optimize this program to:
bool f(int x, int y, bool c) {
  return y > 0;
}

?

We do so today.

> This proposal is motivated by the observation that, in practice, we
> transform:
>   %sel = select i1 %C, i32 %add, i32 undef
> [...]
> If undef isn't as strong as poison, this transformation is not valid.

I'd say concluding that the transform is not valid is the "right"
thing to do here. The way I'd unpack this transform is

%sel = select i1 %C, i32 %add, i32 undef
  =0=> %sel = select i1 %C, i32 %add, i32 %add
  =1=> %sel = %add

=0=> is problematic since it potentially introduces a use of poison
where there wasn't any. In general, replacing undef with poison
should not be allowed since as you say, undef is strictly weaker than
poison.

Matthias proposed a poison-to-undef operator that could be used here
-- it is okay to replace %sel with `poison-to-undef %add`, but it is
not okay to transform `(poison-to-undef (add nsw x 1)) sgt x` to true.
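A sketch of how that operator might be used; the poison-to-undef syntax is invented here for illustration and is not existing IR:

%add = add nsw i32 %x, %y
%sel = poison-to-undef i32 %add   ; poison in, plain undef out
%cmp = icmp sgt i32 %sel, %x      ; may NOT be folded to 'icmp sgt i32 %y, 0':
                                  ; %sel is merely undef, and no bit-pattern
                                  ; justifies 'true' when %x is INT_MAX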

-- Sanjoy

> Are we permitted to optimize this program to:
> bool f(int x, int y, bool c) {
>   return y > 0;
> }
>
> We do so today.

I'd consider that a bug: f(INT_SMAX, 1, false) should be false IMO --
a phi of poison and undef should be undef, and "undef > INT_SMAX"
should be false.
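A sketch of @f lowered to IR (a straightforward lowering, names illustrative), showing the phi of poison and undef:

define i1 @f(i32 %x, i32 %y, i1 %c) {
entry:
  br i1 %c, label %then, label %join
then:
  %add = add nsw i32 %x, %y   ; poison on signed overflow
  br label %join
join:
  %a = phi i32 [ %add, %then ], [ undef, %entry ]   ; phi of poison and undef
  %cmp = icmp sgt i32 %a, %x
  ret i1 %cmp
}
; f(INT_SMAX, 1, false): %a is undef, and no i32 is greater than INT_SMAX,
; so %cmp should be false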

-- Sanjoy

> If we instead ran SROA over @g, we would be left with:
>   %cmp0 = icmp eq i1 undef, false
>   %cmp1 = icmp eq i1 undef, true
>   %and = xor i1 %cmp0, %cmp1

I think it has to be that way, since otherwise the two programs:

%x = load %M
%y = load %M
use(%x, %y)

and

%x = load %M
use(%x, %x)

would not be equivalent and we would not be able to do load commoning
(I'm somewhat speculating here -- I'm not actually sure if the two
programs above are equivalent). And those load instructions are
independently undef. In other words, the "problem" here is that the
two programs above are supposed to be equivalent *and* that a load
from an uninitialized location is undef.

I agree with Hal that the right way to assign semantics to undef is
to say that it is some N-bit value at each *use*; and the N-bit
value undef pretends to be at each use may be a different one. I
think this also justifies the existence of undef -- if uninitialized loads
were supposed to be just some value consistently, then you could get
the same semantics by saying that uninitialized locations contain
"some arbitrary N-bit value". Stating that loads from uninitialized
locations are undef is strictly stronger, as your example shows.
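A sketch of the contrast, assuming the per-use reading (this is exactly the disputed point, so it is an illustration rather than settled semantics):

%x = load i1* %mem   ; uninitialized: %x is undef
%y = xor i1 %x, %x   ; per-use undef: each use may pick a different value, so
                     ; %y need not be 0; a fixed-but-arbitrary value would
                     ; force %y to be 0 -- this is the 'strictly stronger' part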

-- Sanjoy

> I agree with Hal that the right way to assign semantics to undef is
> to say that it is some N-bit value at each *use*; and the N-bit
> value undef pretends to be at each use may be a different one.

I take back what I said -- deciding what value undef will correspond
to at each use is fairly counter-intuitive. For instance, this means
inlining is not meaning-preserving:

declare i32 @use(i32, i32)

define i32 @f(i32 %x) alwaysinline {
entry:
  %r = call i32 @use(i32 %x, i32 %x)
  ret i32 %r
}

define i32 @g() {
entry:
  %r = call i32 @f(i32 undef)
  ret i32 %r
}

which `opt -always-inline` transforms to

declare i32 @use(i32, i32)

; Function Attrs: alwaysinline
define i32 @f(i32 %x) #0 {
entry:
  %r = call i32 @use(i32 %x, i32 %x)
  ret i32 %r
}

define i32 @g() {
entry:
  %r.i = call i32 @use(i32 undef, i32 undef)
  ret i32 %r.i
}

attributes #0 = { alwaysinline }

If we say "undef chooses any value at each use", the first program
always calls @use with both parameters equal (so if @use was printing
the xor of its two parameters, it would always print 0) while the
second program calls @use with two arbitrary values.
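A sketch of how the difference could be observed, with a hypothetical body for @use that returns the xor of its arguments:

define i32 @use(i32 %a, i32 %b) {
entry:
  %x = xor i32 %a, %b   ; 0 whenever both arguments are equal
  ret i32 %x
}
; before inlining: @f passes the same %x twice, so @g always sees 0
; after inlining:  @use(i32 undef, i32 undef) may return anything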

Another example related to, but slightly different from David's
motivating example:

define i32 @s(i32* %p) {
entry:
  store i32 undef, i32* %p
  %r = load i32* %p
  %m = call i32 @use(i32 %r, i32 %r)
  ret i32 %r
}

opt -O3 transforms this to

define i32 @s(i32* nocapture readnone %p) {
entry:
  %m = tail call i32 @use(i32 undef, i32 undef)
  ret i32 undef
}

This example has the same problem as the first one -- the value of i32
undef was supposed to have been "decided" at its only use, the store;
but instead LLVM has increased the program's degrees of freedom (I'm
using the term informally).

As far as I can tell, there are two ways to fix this:

1. teach all of LLVM to treat undef specially. This may be difficult
because unlike every other SSA value, undef's value is decided at
uses, not defs. For instance, a "fixed" version of the inliner pass
could coerce the "i32 undef" to, say, "i32 0" before inlining the body
of the callee.

2. represent undef as an instruction, not as a value. Then the first
example becomes

define i32 @f(i32 %x) alwaysinline {
entry:
  %r = call i32 @use(i32 %x, i32 %x)
  ret i32 %r
}

define i32 @g() {
entry:
  %u = undef
  %r = call i32 @f(i32 %u)
  ret i32 %r
}

which opt transforms to

define i32 @g() {
entry:
  %u = undef
  %r.i = call i32 @use(i32 %u, i32 %u)
  ret i32 %r.i
}

I do not want to derail this thread by taking off on a tangent, but if
there is interest in (2) [or (1), but I think (1) is too hard to do
correctly] I can write up a proposal and start a separate thread on
llvmdev.

-- Sanjoy

> From: "Reid Kleckner" <rnk@google.com>
> To: "Hal Finkel" <hfinkel@anl.gov>
> Cc: "David Majnemer" <david.majnemer@gmail.com>, "Nuno Lopes" <
nuno.lopes@ist.utl.pt>, "John Regehr"
> <regehr@cs.utah.edu>, "LLVM Developers Mailing List" <
llvmdev@cs.uiuc.edu>
> Sent: Sunday, February 8, 2015 2:20:49 PM
> Subject: Re: [LLVMdev] RFC: Proposal to Remove Poison
>
>
>
> When you say, "we'd like to", I'm assuming your justification for
> this is so that we can model poison using these undef semantics. Is
> that correct?
>
> Yes, but I also think 'icmp sgt i32 undef, %a -> i1 undef' is a
> pretty reasonable transform by itself. Well-defined programs should
> never observe the result of the compare.

I'm somewhat concerned that this is unsound. For example, imagine I have
the following:

void foo(int &a, int n) {
  int c = INT_MAX, r = 0;
  for (int i = 0; i < n; ++i, --c) {
    if (a > c)
      r += a;
    else
      a = bar();
  }

  return r;
}

called like this:

  int q;
  return foo(&q, 5);

However, we now have a situation where:
  1) in the first iteration the value loaded from &a is uninitialized
  2) after inlining the loop will likely be fully unrolled, and so the
initial value loaded will be replaced by an undef
  3) it is the fact that, in that first iteration, (a > c, where c ==
INT_MAX) is always false that protects the undefined value from propagating
further

I'm also assuming that it is desirable that 'int c = INT_MAX' should have
the same behavior as 'int c = opaque_func_that_returns_INT_MAX()'.

To be clear, I'm not entirely convinced that this is a well-defined C++
program (*), however, I believe it is well defined under our current
semantics at the IR level. What is being proposed would change that.

OK, this isn't a well-formed C program:

C11 6.7.9p10:
If an object that has automatic storage duration is not initialized
explicitly, its value is indeterminate.

C11's annex on undefined behavior (Annex J) includes:
The value of an object with automatic storage duration is used while it is
indeterminate.

> If we instead ran SROA over @g, we would be left with:
>   %cmp0 = icmp eq i1 undef, false
>   %cmp1 = icmp eq i1 undef, true
>   %and = xor i1 %cmp0, %cmp1

This transformation is valid in the light of current LLVM's semantics. Although it's fine according to the C standard (it's undefined behavior), it's highly non-intuitive. (Now I recall why 'xor %a, %a' could be different than 0.)
Sanjoy proposed in another email to make undef an instruction. That would actually solve this problem. SROA could replace the load with an undef instruction, forcing %and to be true. Undef instructions could be determinized to 0 in codegen (if not removed beforehand).
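A sketch of @g after such an SROA, using the hypothetical undef-instruction syntax from Sanjoy's proposal (2):

define i1 @g() {
  %u = undef                   ; hypothetical instruction: one fixed, unknown value
  %cmp0 = icmp eq i1 %u, false
  %cmp1 = icmp eq i1 %u, true
  %and = xor i1 %cmp0, %cmp1   ; exactly one compare is true, so %and is true
  ret i1 %and
}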

The downside of having just one type of undef is that then all undefs are the same in the following sense:
%mem = alloca
%load = load %mem
%cmp = icmp slt %load, INT_MIN

With current LLVM, %cmp will always be false. With your proposal, %cmp will be undef, and therefore potentially true, which is strange.

Nuno

> Sanjoy proposed in another email to make undef an instruction. That would
> actually solve this problem. SROA could replace the load with an undef
> instruction, forcing %and to be true.

Undef-as-an-instruction has a problem when combined with "loads from
uninitialized memory are undef". If by "loads from uninitialized
memory are undef" we mean the old "undef", then we have the same
problem of "xor %x, %x" not always being zero. And if we mean "the
load is equivalent to a gen_undef instruction", then we cannot
duplicate loads unless we know that the load is from initialized
memory, since

%x = load P
%y = xor %x, %x

is not the same as

%x0 = load P
%x1 = load P
%y = xor %x0, %x1

for an uninitialized *P

-- Sanjoy

> And if we mean "the load is equivalent to a gen_undef instruction", then we
> cannot duplicate loads unless we know that the load is from initialized
> memory.

Aah, right. This stuff is mind-blowing :)
Nuno