poison and select

In the section about poison values, the LLVM language reference manual says:

   "Values other than phi nodes depend on their operands."

This implies that a select instruction's output can be poisoned by its not-selected argument value. If select were poisoned only by its selected argument, we would expect this fact to be mentioned specifically, as it is for phi.

Next I'll show how poisoning by the not-selected value can introduce a problem. Take this C program:

int printf(const char *, ...);
int foo(int x0) {
   int x3 = x0 >> 27;
   int x4 = x3 - 27;
   int x2 = x4 ? x4 : (1 >> x4);
   int x1 = x2 != 0;
   return x1;
}
int arg;
int main (void) {
   int x1 = foo(arg);
   printf("%x\n", x1);
   return 0;
}

This program has no UB and it should print "1'.

Clang at -O2 turns foo() into this:

define i32 @foo(i32 %x0) #0 {
entry:
   %shr = ashr i32 %x0, 27
   %sub = add nsw i32 %shr, -27
   %tobool = icmp ne i32 %sub, 0
   %shr1 = lshr i32 1, %sub
   %cond = select i1 %tobool, i32 %sub, i32 %shr1
   %cmp = icmp ne i32 %cond, 0
   %conv = zext i1 %cmp to i32
   ret i32 %conv
}

Although the translation is very straightforward, there's a problem: %shr1 is a poison value and although it is not selected, it propagates through the rest of the function and poisons the return value.

Obviously things are not going well if the source program contains no UB and the emitted code returns a poison value. Either there has been a miscompilation or else the LLVM documentation should be changed to make it clear that the select instruction only has a value dependence on the value that is selected.

The immediate problem here is that both Souper and Alive are happy for foo() to return 0 instead of 1. Souper actually does this optimization -- legally, of course, according to the reference manual.

Anyway this issue seemed tricky enough to bring up here instead of just filing a PR.

Thanks,

John

In the section about poison values, the LLVM language reference manual
says:

  "Values other than phi nodes depend on their operands."

This implies that a select instruction's output can be poisoned by its
not-selected argument value. If select were poisoned only by its selected
argument, we would expect this fact to be mentioned specifically, as it is
for phi.

Next I'll show how poisoning by the not-selected value can introduce a
problem. Take this C program:

int printf(const char *, ...);
int foo(int x0) {
  int x3 = x0 >> 27;
  int x4 = x3 - 27;
  int x2 = x4 ? x4 : (1 >> x4);
  int x1 = x2 != 0;
  return x1;
}
int arg;
int main (void) {
  int x1 = foo(arg);
  printf("%x\n", x1);
  return 0;
}

This program has no UB and it should print "1'.

Clang at -O2 turns foo() into this:

define i32 @foo(i32 %x0) #0 {
entry:
  %shr = ashr i32 %x0, 27
  %sub = add nsw i32 %shr, -27
  %tobool = icmp ne i32 %sub, 0
  %shr1 = lshr i32 1, %sub
  %cond = select i1 %tobool, i32 %sub, i32 %shr1
  %cmp = icmp ne i32 %cond, 0
  %conv = zext i1 %cmp to i32
  ret i32 %conv
}

Although the translation is very straightforward, there's a problem: %shr1
is a poison value and although it is not selected, it propagates through
the rest of the function and poisons the return value.

Obviously things are not going well if the source program contains no UB
and the emitted code returns a poison value. Either there has been a
miscompilation or else the LLVM documentation should be changed to make it
clear that the select instruction only has a value dependence on the value
that is selected.

While the documentation may not match the facts on the
ground, InstructionSimplify does (in my opinion) the right thing; an undef
operand doesn't mean the entire SelectInst folds away to undef:
http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/InstructionSimplify.cpp?revision=217342&view=markup#l2880

While the documentation may not match the facts on the ground, InstructionSimplify does (in my opinion) the
right thing; an undef operand doesn't mean the entire SelectInst folds away to undef:
http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/InstructionSimplify.cpp?revision=217342&view=m
arkup#l2880

It seems pretty clear that this is the right semantics for select.

If we believe the documentation, then select is only useful when we can prove that it won't introduce a spurious poison value. This will not often be the case.

I'll go file a bug against the LLVM instruction reference to remind someone to tweak the doc. My proposed fix would be:

* Values other than phi nodes and select instructions depend on their operands.

* Phi nodes depend on the operand corresponding to their dynamic predecessor basic block.

* Select instructions depend on their selected operand.

John

http://llvm.org/bugs/show_bug.cgi?id=20895

Original inventor of the poison concept here.

Poison is a flawed concept. I proved it was flawed back in 2011 [0]

[0] http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-December/046368.html

And since then, other holes have been found. I’m not aware of any serious proposals for its replacement that do not have similar problems.

Also note that poison is very different from undef. Undef, as far as I know, remains an entirely coherent concept.

Dan

Poison is a flawed concept. I proved it was flawed back in 2011 [0]

Nice. My colleagues and I will dig through this material and possibly come back with some ideas. We're going to need some sort of semantics for UB in LLVM since otherwise these formal-methods-based tools like Souper and Alive risk not making sense.

Thanks,

John

Hi John,

Hi John,

More background is in this post (at which time “poison” was still named “trap”) (and I apologize for the verbosity) [0]

[0] http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-November/045730.html

The poison concept aimed to

(a) enable the sign-extension elimination optimization

(b) permit arbitrary speculation that doesn’t destroy any optimization opportunities

Undef does (b) but not (a). Immediate UB does (a) but not (b). Additionally, here are some interesting debatable requirements:

  • Select(a, b, c) should be semantically equivalent to (sext(a) & b)|(~sext(a) & c) (with appropriate bitcasting).

  • Select should be semantically equivalent to its intuitive translation as a branch and a phi (the current poison concept violates this)

  • Branch should be semantically equivalent to its intuitive translation as a switch, and as an indirect branch

  • Constant folding, mem2reg, and reg2mem should always be safe and optimization-opportunity-preserving

I personally suggest a compromise on the optimization, and instead a focus on helping programmers write better code. C++11 range-based for loops are easier for humans to read, and in some cases avoid ol’ “int” to step through arrays. Quoting the Zen of Python, “let’s do more of those!”

Today I ran into another aspect of the poison problem...

Basically, SimplifyCFG wants to take

   expr1 && expr2

and flatten it into

   x = expr1
   y = expr2
   x&y

This isn't safe when expr2 might execute UB. The consequence is that no LLVM shift instruction is safe to speculatively execute, nor is any nsw/nuw/exact variant, unless the operands can be proven to be in bounds.

Real example here:

   http://llvm.org/bugs/show_bug.cgi?id=20997

John

I believe Dan struck at the heart of the issue and the consequences have just become apparent (at least to me).

Consider:

define i32 @f(i32 %a) {
%cmp = icmp sgt i32 %a, 0
%sub = sub nsw i32 2147483647, %a
%sel = select i1 %cmp, i32 %sub, i32 0
ret i32 %sel
}

If %a is -1, %sub will wrap and be poison. %cmp will be false.
Is %sel poison or not? We have no definition for what select actually does.

If we assume select boils down to nothing but arithmetic, then %sel may be poisoned by any of its operands.

If we assume select boils down to control flow (branch + PHI), then we are violating the rules by which it operates in many places in LLVM.
Consider the following input:

%add = add nsw i32 %a, 1
%cmp1 = icmp eq i32 %a, 0
%cmp2 = icmp slt i32 %add, 0
%sel = select i1 %cmp1, i1 %cmp2, i1 false

We currently optimize this to:

%add = add nsw i32 %a, 1
%cmp1 = icmp eq i32 %a, 0
%cmp2 = icmp slt i32 %add, 0
%and = and i1 %cmp1, %cmp2

If %a is 2147483647: %add is poison, %cmp1 is false, %cmp2 is poison, %sel is false but %and is poison!

The above transform is not an oddity, InstCombine does this sort of thing a lot.

If select is arithmetic, many transforms are “correct” but we can form select far less often than we thought we could.
If select is control flow, many transforms are “wrong” but we might be able to form more selects.

What is select? How does it work? How should it work?

Hi all,

I was initially going to send this email to a thread on llvm-commits
where David explained the issue with poison and select to me, but then
some quick googling led me to this thread.

I've been thinking about a possible semantics for poison values in a
certain way:

The key semantic difference between undef and poison, as I understand
it, is that when justifying an execution trace the compiler needs to
commit to a legal value for every undef flowing through the program --
a N bit undef can be any N bit value, but it has to be *some* N bit
value. On the other hand, the compiler does not need to commit to a N
bit poison value being *any* legal N bit value -- it may do
optimizations (eg. (t < (t + 1)) ==> true) that cannot be justified by
any assignment of legal values to the poison flowing through the
program. Since there may not be legal assignments to poison values
that justify a certain execution trace, we have to prohibit any action
that leads to "observing" poison values. So we say "observing" poison
values is UB and "cannot happen". Now the question is "what is
observation?".

One way to formalize the above: make any side-effecting operation
classically data-dependent on a poison value undefined behavior. I
think this is what the spec tries to do; but the current approach has
issues as pointed out in this thread, especially in relation to
selects.

Since UB is a dynamic property, I think we should make this notion of
dependence dynamic too: a side effecting operation has UB if any of
its operands (including values it is control dependent on via
branches) is `dynamic-dependent` on a poison value (non-side effecting
operations like add and lshr never have UB).

A value X is *not* `dynamic-dependent` on value Y (not necessarily an
operand of X) if for all values of Y, X computes the same value. So
(X - X) is not `dynamic-dependent` on X, 42 is not dynamic-dependent
on 43. Otherwise X is `dynamic-dependent` on Y. The idea here is
that if an expression computes the same value for all possible values
of one of its inputs (transitively), the value computed by the
expression cannot be used to "observe" the value of the said operand;
even if there is a static dependence chain.

It does not make a whole lot of sense to talk about dynamic dependence
for instructions like "load" which are not pure computation [1], and
these should be classified under "side effecting operations". We will
have to carefully specify the `dynamic-dependent` relation for this
scheme to work (if it does work!), but I'm just trying to get the
general across right now. For reg2mem to be safe, for instance, we
need to special-case stores to and loads from non-escaping allocas
(see summary below).

Now, the interesting thing about this dynamic dependence thing is that
it is a runtime, exact, property of a program. So "select %c, %a, %b"
is dynamic-dependent on "%a" only in executions where "%c" is true.
In the example in previous email, for example,

  %add = add nsw i32 %a, 1
  %cmp1 = icmp eq i32 %a, 0
  %cmp2 = icmp slt i32 %add, 0
  %and = and i1 %cmp1, %cmp2

%and is *not* `dynamic-dependent` on %cmp2 in the runs in which %cmp1
is false.

Another more complex example, involving conversion of br-PHI to a
select:

  %a = maybe poison
  br i1 %a, label %left, label %right
left:
  %l = 42
  br label %merge
right:
  %r = < unknown load > can be speculated
  br label %merge
merge:
  %m = phi %l %r
  store %m to heap

If %c is poison, then the store has UB iff %r != %l (unless it is
meaningful to speculatively talk about %r's value, we can't form a
select anyway). In that case, it is okay to transform the br-phi to a
select. If %c is poison, and %r == %l, then the store isn't UB (since
the phi does not dynamically depend on %c). In that case, the select
we form, "select %c, %l, %r" doesn't observe %c either and so storing
it is not UB. That said, I admit this is counter-intuitive -- whether
a program is undefined or not is a function of state of the system
that the program may not even observe in the undefined run. If
this is a fundamental flaw in this scheme or not I'm not sure.

* Summary / TL;DR *

The compiler may justify executions using assignments to
poison values that don't corresponding to any legal value. For this
reason, escaping any *information* based off a poison value is UB --
such information is unsound, in some sense. This notion of
"information dependence" is more precise than classic data dependence
(e.g. (xor A A) does not give us any information about A). The notion
of escaping is also more precise than "side-effecting operation" --
storing to a non-escaping alloca does not escape the stored value, for
example.

Does this make any sense at all? Have there been past attempts at
formalizing poison along these lines?

  [1]: in this scheme load cannot result in poison because the store
that stored poison is UB and "could not have happened".

Thank you for your time!
-- Sanjoy

Hi,

Does this make any sense at all? Have there been past attempts at
formalizing poison along these lines?

Answering my own question, what I was considering is a slight variant
of the scheme mentioned in an earlier email by Dan:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-November/045730.html

"
- Instead of trying to define dependence in LangRef, just say that if
   changing the value returned from an overflowing add nsw would
   affect the observable behavior of the program, then the behavior of
   the program is undefined. This would reduce the amount of text in
   LangRef, but it would be a weaker definition, and it would require
   sign-extension optimizers and others to do significantly more work
   to establish that their transformations are safe.
"

That said, this scheme does not work as expected -- the bitness of the
result of an overflowing add prevents some optimizations:

%a = add nsw i32 %v, 1
%c = icmp slt i32 %v, %a
store %c, < global variable >

In the case of signed overflow, %v is INT32_MAX. For all values of
%a, the observable behavior of the program is the same (we store false
to < global variable > for all values of %a); and so we cannot exploit
undefined behavior and simplify '%c' to true.

Similar issues arise if you, for example, zero extend a poison value,
and then right shift out all the "poison bits" -- the resulting value
is all zeroes and do not capture overflowing semantics.

Thanks,
-- Sanjoy

I think that there are fundamentally two different solutions to the select-poison issue with their own pros and cons:

Either:

We allow instructions to not care about poison operands because their output will not contain any information tainted by the poison. This would mean that select is allowed to ignore a poisoned operand if it doesn’t select it. It would also mean that a poison condition value guarantees that you get one of the two operands but that the select instruction itself has been poisoned.

AFAICT, this doesn’t lose out on any optimizations.

Or:

We introduce a ‘vile’ attribute to the select instruction. A ‘vile’ select is poison even if the operand which wasn’t selected if poison while ‘normal’ selects behave more like br/phi instructions.

This is a more conservative and, IMO, uglier approach which is sound but would require auditing LLVM to make sure that it only transforms ‘vile’ selects into arithmetic. I think it would be very hard to optimize a normal select into a ‘vile’ select which would mean that we might lose out on some optimizations.

If there is a reason we shouldn’t go with approach #1? I can’t immediately come up with IR that we would stop optimizing.

I think the simpler way of phrasing 1 is: “poison has the same propagation rules as undef”. This means that poison is a property of bits, not of values, and basic arithmetic like ‘and 0’, ‘or 1’, and ‘mul 0’ can mask it away. This makes a CFG diamond equivalent to a select and select equivalent to arithmetic, which seems like general goodness. I wonder if we lose optimization power around these kinds of masking arithmetic instructions, though.