The undef story

The quoted text above reflects a serious misunderstanding, and I would like to stop it from leading anyone else astray.

The error is in thinking that we should consider the intent of a developer when we decide which optimizations to perform. That isn't how this works. LLVM code has a mathematical meaning: it describes computations. Any transformation that we do is either mathematically correct or it isn't.

A transformation is correct when it refines the meaning of a piece of IR. Refinement is close to "preserves equivalence," but not quite: it also permits undefined behaviors to be removed. For example, "add nsw" is not equivalent to "add," but an "add nsw" can always be turned into an "add." The opposite transformation is only permissible when the add can be proven not to overflow.
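To make the direction of refinement concrete, here is a sketch in LLVM IR (comments only, not a transcript of any actual pass):

```llvm
; Valid refinement: drop the nsw flag.
;   %r = add nsw i32 %a, %b   -->   %r = add i32 %a, %b
; The original produces poison on signed overflow; the result replaces
; that poison with a defined wrapped value. Every behavior that was
; defined before is preserved, so this is a refinement.
;
; Invalid in general: add the nsw flag.
;   %r = add i32 %a, %b       -->   %r = add nsw i32 %a, %b
; This introduces poison where the original had a defined value. It is
; only permissible when the optimizer can prove %a + %b never overflows.
```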

This is like the laws of physics for compiler optimizations; it is not open to debate.

The place to consider developer intent, if one wanted to do that, is in the frontend that generates IR. If we don't want undef or poison to ever occur, then the frontend must generate IR that includes appropriate checks in front of operations that would otherwise be undefined. This is what sanitizers and safe programming languages are for.
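As a sketch of what that looks like, a safety-conscious frontend could lower a signed addition not to a bare "add nsw" but to a checked sequence using LLVM's overflow intrinsics, something along these lines (the function name @checked_add is illustrative):

```llvm
; Hypothetical frontend output: the addition is guarded, so the IR
; never executes an operation whose result would be poison.
define i32 @checked_add(i32 %a, i32 %b) {
entry:
  %res = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %a, i32 %b)
  %sum = extractvalue { i32, i1 } %res, 0
  %ovf = extractvalue { i32, i1 } %res, 1
  br i1 %ovf, label %trap, label %ok

ok:                    ; no overflow: the sum is fully defined
  ret i32 %sum

trap:                  ; overflow: abort instead of producing poison
  call void @llvm.trap()
  unreachable
}

declare { i32, i1 } @llvm.sadd.with.overflow.i32(i32, i32)
declare void @llvm.trap()
```

This is essentially what UBSan's signed-overflow check emits, with a diagnostic call in place of the trap. The intent "this addition must never overflow silently" has been translated into IR, and the optimizer is now obligated to preserve it.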

SUMMARY: The intent, whatever it is, must be translated into IR. The LLVM middle end and backends are then obligated to preserve that meaning. They generally do this extremely well. But they are not, and must not be, obligated to infer the mental state of the developer who wrote the code that is being translated.

John