Hi all,

I was initially going to send this email to a thread on llvm-commits

where David explained the issue with poison and select to me, but then

some quick googling led me to this thread.

I've been thinking about a possible semantics for poison values in a

certain way:

The key semantic difference between undef and poison, as I understand

it, is that when justifying an execution trace the compiler needs to

commit to a legal value for every undef flowing through the program --

a N bit undef can be any N bit value, but it has to be *some* N bit

value. On the other hand, the compiler does not need to commit to a N

bit poison value being *any* legal N bit value -- it may do

optimizations (eg. (t < (t + 1)) ==> true) that cannot be justified by

any assignment of legal values to the poison flowing through the

program. Since there may not be legal assignments to poison values

that justify a certain execution trace, we have to prohibit any action

that leads to "observing" poison values. So we say "observing" poison

values is UB and "cannot happen". Now the question is "what is

observation?".

One way to formalize the above: make any side-effecting operation

classically data-dependent on a poison value undefined behavior. I

think this is what the spec tries to do; but the current approach has

issues as pointed out in this thread, especially in relation to

selects.

Since UB is a dynamic property, I think we should make this notion of

dependence dynamic too: a side effecting operation has UB if any of

its operands (including values it is control dependent on via

branches) is `dynamic-dependent` on a poison value (non-side effecting

operations like add and lshr never have UB).

A value X is *not* `dynamic-dependent` on value Y (not necessarily an

operand of X) if for all values of Y, X computes the same value. So

(X - X) is not `dynamic-dependent` on X, 42 is not dynamic-dependent

on 43. Otherwise X is `dynamic-dependent` on Y. The idea here is

that if an expression computes the same value for all possible values

of one of its inputs (transitively), the value computed by the

expression cannot be used to "observe" the value of the said operand;

even if there is a static dependence chain.

It does not make a whole lot of sense to talk about dynamic dependence

for instructions like "load" which are not pure computation [1], and

these should be classified under "side effecting operations". We will

have to carefully specify the `dynamic-dependent` relation for this

scheme to work (if it does work!), but I'm just trying to get the

general across right now. For reg2mem to be safe, for instance, we

need to special-case stores to and loads from non-escaping allocas

(see summary below).

Now, the interesting thing about this dynamic dependence thing is that

it is a runtime, exact, property of a program. So "select %c, %a, %b"

is dynamic-dependent on "%a" only in executions where "%c" is true.

In the example in previous email, for example,

%add = add nsw i32 %a, 1

%cmp1 = icmp eq i32 %a, 0

%cmp2 = icmp slt i32 %add, 0

%and = and i1 %cmp1, %cmp2

%and is *not* `dynamic-dependent` on %cmp2 in the runs in which %cmp1

is false.

Another more complex example, involving conversion of br-PHI to a

select:

%a = maybe poison

br i1 %a, label %left, label %right

left:

%l = 42

br label %merge

right:

%r = < unknown load > can be speculated

br label %merge

merge:

%m = phi %l %r

store %m to heap

If %c is poison, then the store has UB iff %r != %l (unless it is

meaningful to speculatively talk about %r's value, we can't form a

select anyway). In that case, it is okay to transform the br-phi to a

select. If %c is poison, and %r == %l, then the store isn't UB (since

the phi does not dynamically depend on %c). In that case, the select

we form, "select %c, %l, %r" doesn't observe %c either and so storing

it is not UB. That said, I admit this is counter-intuitive -- whether

a program is undefined or not is a function of state of the system

that the program may not even observe in the undefined run. If

this is a fundamental flaw in this scheme or not I'm not sure.

* Summary / TL;DR *

The compiler may justify executions using assignments to

poison values that don't corresponding to any legal value. For this

reason, escaping any *information* based off a poison value is UB --

such information is unsound, in some sense. This notion of

"information dependence" is more precise than classic data dependence

(e.g. (xor A A) does not give us any information about A). The notion

of escaping is also more precise than "side-effecting operation" --

storing to a non-escaping alloca does not escape the stored value, for

example.

Does this make any sense at all? Have there been past attempts at

formalizing poison along these lines?

[1]: in this scheme load cannot result in poison because the store

that stored poison is UB and "could not have happened".

Thank you for your time!

-- Sanjoy