Change undef to poison in a few operations


Lately we have come to realize how much undef complicates our life.
Therefore, in this email we propose to change the behavior of a few
instructions to yield poison instead of undef in error cases. This follows
the suggestion of Eli in D33654 [docs] Make it clear shifts yield poison when shift amount >= bitwidth.

Why is undef so bad?
- I believe it's not possible to make NewGVN correct with undef. See for
example the discussion here:
- A bunch of optimizations are correct with poison but not with undef (see
John's blog posts, my talk at the LLVM conference last year, our recent paper at PLDI, etc.).
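To make the second point concrete, here is a toy model of one well-known case: rewriting `mul %x, 2` as `add %x, %x`. It assumes a 4-bit integer type so every value can be enumerated, models undef as "any value, chosen independently at each use" and poison as a single propagating sentinel. These names are illustrative only, not LLVM APIs.

```python
# Toy model, assuming a 4-bit integer type so all values can be enumerated.
BITS = 4
MASK = (1 << BITS) - 1
UNDEF = frozenset(range(1 << BITS))  # undef: any 4-bit value may be chosen
POISON = "poison"                    # poison: one sentinel that propagates


def mul2_undef(x_values):
    # mul %x, 2: %x is used once, so a single value is chosen for it.
    return {(v * 2) & MASK for v in x_values}


def add_self_undef(x_values):
    # add %x, %x: each use of an undef %x may pick a different value.
    return {(a + b) & MASK for a in x_values for b in x_values}


def mul2_poison(x):
    return POISON if x is POISON else (x * 2) & MASK


def add_self_poison(x):
    return POISON if x is POISON else (x + x) & MASK


mul_results = mul2_undef(UNDEF)      # only even values are possible
add_results = add_self_undef(UNDEF)  # odd values become possible too

# A rewrite is a valid refinement only if it adds no new behaviors.
print(add_results <= mul_results)                        # False
print(mul2_poison(POISON) is add_self_poison(POISON))    # True
```

With undef the rewrite introduces odd results that the original could never produce, so it is not a refinement; with poison both sides are simply poison.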

This proposal is not very radical; it just changes a few error behaviors
to poison and shouldn't have any practical effect for the time being (so
the change is documentation only).
Later we can continue the work until we have completely removed undef. Even
if we don't go that far, it's generally a good idea to reduce the usage of
undef and replace it with poison whenever possible.

So we propose to change the following from undef to poison:
- insertelement/extractelement (element index out of range)
- shufflevector (undef elements in shuffle mask)
- alloca (allocating zero bytes) ?
- load (load value with different type from written value)
- GEP inrange
- fptrunc/fptoui/fptosi/uitofp/sitofp (overflow) ?
- llvm.ctlz.* and friends (is_zero_undef) ?
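As an illustration of the first item, a minimal sketch of the proposed out-of-range behavior for extractelement/insertelement. `POISON` is a hypothetical sentinel, and modeling a poison vector as "every lane poison" is an assumption of this sketch, not LLVM's implementation.

```python
POISON = object()  # hypothetical sentinel for a poison value (toy model)


def extractelement(vec, idx):
    """Return vec[idx]; an out-of-range index yields poison (proposed)."""
    if 0 <= idx < len(vec):
        return vec[idx]
    return POISON


def insertelement(vec, elt, idx):
    """Return a copy of vec with lane idx set to elt."""
    if 0 <= idx < len(vec):
        out = list(vec)
        out[idx] = elt
        return out
    # Assumption: a poison vector is modeled as every lane being poison.
    return [POISON] * len(vec)


v = [10, 20, 30, 40]
print(extractelement(v, 2))            # 30
print(extractelement(v, 7) is POISON)  # True
print(insertelement(v, 99, 1))         # [10, 99, 30, 40]
```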

Some of these are no-brainers; others may need a bit of discussion.

Please let us know if you have thoughts and/or concerns about this.


I don’t understand the long-term practical implications of this. One of these operations is kind of important to me so I’ll ask about it specifically.

The shufflevector with undef elements in the shuffle mask seems to me like quite a useful thing. We certainly make use of it in the PPC back end. It seems reasonable to me that if I want the result to have a few elements of the input vectors in specific element indices, those should have well-defined values. For the elements I don’t care about, the back end should be allowed to put any element there, and the decision about which one should be up to the back end (i.e., whichever shuffle can be handled efficiently). On PPC, we have a number of instructions that can perform a shuffle in a single instruction; others are handled with a constant-pool load and a generic shuffle instruction. We detect masks we can handle and use undef as a “this certainly matches” value for any index.

So my question is whether any of this would change and if so, how it would change.


Pretty much nothing would change for you. Basically for shufflevector, where in the manual you read undef, we would change it to poison. Vector elements that are poison don't taint the whole vector; it's per element (like undef).
In your backend you're still free to pick whatever value you want for poison, so nothing changes.
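A toy sketch of the per-element semantics described above: an undef mask element makes only its own lane poison, and a lowering step may materialize any value there. The `-1` encoding, `POISON` sentinel, and `lower` helper are hypothetical choices for this illustration.

```python
POISON = object()  # hypothetical sentinel for a poison lane (toy model)
UNDEF_ELEM = -1    # this sketch's encoding of an undef shuffle-mask element


def shufflevector(v1, v2, mask):
    """Per-element: an undef mask element makes only that lane poison."""
    both = list(v1) + list(v2)
    return [POISON if m == UNDEF_ELEM else both[m] for m in mask]


def lower(lanes, fill):
    """A back end may pick any value for a poison lane; here it uses fill."""
    return [fill if x is POISON else x for x in lanes]


r = shufflevector([1, 2, 3, 4], [5, 6, 7, 8], [0, UNDEF_ELEM, 5, UNDEF_ELEM])
print(lower(r, 0))  # [1, 0, 6, 0]
```

Lanes 0 and 2 keep their well-defined values regardless of what the back end chooses for lanes 1 and 3.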

The proposal should be nearly NFC for the vast majority of use cases. The goal is to reduce undef usage over time. Eventually we would like to introduce a poison value in the IR (right now it only exists implicitly), and at that point you would need to broaden your check from isUndef to (isUndef || isPoison).
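A minimal sketch of what that broadened check could look like in a mask matcher. Both sentinel encodings and the `MERGE_LOW` pattern are hypothetical and do not correspond to any real PPC instruction or LLVM API.

```python
UNDEF_ELEM = -1   # hypothetical encoding for an undef mask element
POISON_ELEM = -2  # hypothetical encoding for an explicit poison mask element


def is_dont_care(m):
    # The broadened check: (isUndef || isPoison).
    return m == UNDEF_ELEM or m == POISON_ELEM


def matches(mask, pattern):
    """True if every defined mask element agrees with the target pattern."""
    return len(mask) == len(pattern) and all(
        is_dont_care(m) or m == p for m, p in zip(mask, pattern))


# Hypothetical single-instruction pattern: interleave the low halves of two
# 4-lane vectors.
MERGE_LOW = [0, 4, 1, 5]
print(matches([0, UNDEF_ELEM, 1, POISON_ELEM], MERGE_LOW))  # True
print(matches([0, 4, 2, 5], MERGE_LOW))                     # False
```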