It seems like a odd choice given that none of our current targets
are tagged architectures, all of this tagged IR has to somehow be
reduced back down to normal target machine instructions.
I agree with Krzysztof that an analogy with tagged architectures is strained. One way to see this is to look at the interpreter mode of lli, which is a proper LLVM interpreter that doesn't need to tag values as poison or not. It completely ignores poison afaik.
Correct me if I am wrong, but it seems no one has outlined a proof
that C programs can be translated into IR+”poison", optimized according
to the definition of “poison”, and always be guaranteed to still
be the same C program. Right now all we have are beliefs
based on intuition that this can be done, but no actual proof.
It isn't clear that producing such a proof is a good value proposition, since Clang isn't a major source of miscompilation bugs, as far as I know. The problems under discussion (e.g, in the freeze paper) are in the middle end, not the front end.
In other words, everyone believed that “undef” was the way to go until
counter examples started showing up, and now everyone believes
that “poison” is the way to go, which will be true until counter examples
start showing up. We are jumping from one shaky precipice to another,
we still are not yet on solid ground.
I disagree with this characterization. Allow me to propose a different one.
- In the beginning, there was no undef and no poison. This was a workable situation, a perfectly suitable basis for a correct compiler, but certain desirable optimizations could not be performed.
- Undef was added to allow the IR to express don't-care values, with the result that writing some optimizations got a bit harder but also more optimizations could be performed.
- Poison was added to allow the IR to express additional optimizations that cannot be justified by undef, with the result that writing some optimizations got a bit harder still, but again more optimizations could be performed. This is the current situation and there are three problems. (1) People weren't always sure which of poison or undef to use, this confusion persists and needs to get fixed. The common case (I think) is a comment or documentation saying undef where it should be poison. (2) Undef and poison interact in tricky ways. (3) Conflicting assumptions were sometimes made about how UB works, leading to problems such as ones described in the freeze paper.
But let's be clear: UB problems are in corner cases and they don't affect very many developers or users. The sense of the community (as far as I can tell) is that a fundamental overhaul isn't called for. Various proposals have been made, including ours, about how to fix the UB model, but none has yet been adopted.
Peter, you are obviously entitled to your views, but my take is that it probably isn't productive to start multiple UB-related threads in a short time period, nor is it productive to sort of throw stones at all aspects of the model. We can't debate everything at once and we're not going to throw out a huge base of optimizations built around the current model. I'll also repeat Daniel's request that you tone down the open-ended requests for proofs/counterexamples/whatever that would generate a lot of work for anyone trying to discuss these matters with you. Instead of producing abstract arguments, please provide concrete examples, ideally accompanied by code.
John