Support of trapping math

Reopened "Codegen illegally reorders division by zero before function call" (Issue #2535, llvm/llvm-project on GitHub) for the integer divide issue. (I was aware of the underlying issue, but I didn’t know of any way to actually reproduce it.)

I suspect that GCC hasn’t finished its implementation of -ftrapping-math, and since that implementation is unsafe by default, the result is uncertainty about what the option is supposed to do, as we’ve seen in this thread.

I did find a Bugzilla report that has been discussing this for eleven years now without reaching a conclusion.

I’m glad LLVM started with a safe-by-default implementation with predictable behavior.

This convincing example demonstrates that enabling traps without strict exception handling can result in behavior that differs from execution on the C abstract machine. That’s true.

However, the same consideration demonstrates that vectorization is incompatible with strict exception handling. For example, if operations in different lanes raise different exceptions, only one trap handler is executed, and the program behavior would differ from that of the C abstract machine. Without vectorization and with strong operation ordering, the poor performance of strict mode:

is not a deficiency of the current code generator but the price of behavior identical to the abstract machine. For some users such a performance drop could be unacceptable. If they turn on traps just to catch errors at runtime, they would expect little or no performance drop. But, as the example above demonstrates, code compiled without strict exception handling could trigger traps that would be absent in a fully conformant program.

If no solution exists for all use cases, users could be given a choice between fully conformant compilation with poor performance and a performant solution with reduced predictability. Documentation for -ftrapping-math could explicitly state that “if specified without strict exception handling, the compiler freely moves FP instructions, which may expose traps that otherwise would not be observed”.

I believe there is a reason for that other than laziness of developers :)

Vectorization is feasible with constrained fp intrinsics; it just isn’t currently implemented.

The key here is whether something is “externally visible”. According to the C abstract machine, the following two functions are equivalent (assuming the write doesn’t somehow trigger undefined behavior):

void f(int* x) {
  *x = 3;
  _Exit(0);
}
void f(int *x) {
  _Exit(0);
}

Generally, vectorizable loops don’t involve any “externally visible” operations. So vectorization is fine as long as we allow reordering possibly-trapping operations with each other. Which seems like a safe assumption for normal usage.

This is already outlined in LangRef (LLVM Language Reference Manual — LLVM 19.0.0git documentation): “The number and order of floating-point exceptions is NOT guaranteed. For example, a series of FP operations that each may raise exceptions may be vectorized into a single instruction that raises each unique exception a single time.”

The trap handler may change the state of some object of type volatile sig_atomic_t, and the fact that a trap occurred then becomes externally visible.
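To make that concrete, here is a minimal sketch of a SIGFPE handler that records the trap in a volatile sig_atomic_t object. The names fp_trap_seen, install_fp_trap_handler and simulate_fp_trap are hypothetical, and the trap is simulated with raise() so the sketch does not depend on any platform-specific way of unmasking hardware FP exceptions.

```c
#include <signal.h>

/* Hypothetical flag: written by the handler, read by ordinary code. */
static volatile sig_atomic_t fp_trap_seen = 0;

/* Handler: the only thing it does is record that a trap happened. */
static void on_sigfpe(int signo) {
    (void)signo;
    fp_trap_seen = 1;
}

/* Installs the handler; returns 0 on success, -1 on failure. */
int install_fp_trap_handler(void) {
    return signal(SIGFPE, on_sigfpe) == SIG_ERR ? -1 : 0;
}

/* Assumption: raise() stands in for a real FP trap. Returns the flag
   value after delivery (1 if the handler ran), or -1 if raise() failed. */
int simulate_fp_trap(void) {
    fp_trap_seen = 0;
    if (raise(SIGFPE) != 0)
        return -1;
    return fp_trap_seen;
}
```

Any code that later reads fp_trap_seen can tell whether a trap was taken, so whether and when the trap fires is observable from the program's point of view.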

The example I meant is:

float A[2] = { 1.0, 0.0 };
float B[2] = { 0.0, 0.0 };
float C[2];
for (int i = 0; i < 2; i++) {
  C[i] = A[i] / B[i];
}

Assuming the loop is vectorized, two exceptions are raised. Both can initiate a trap. If one trap handler terminates the thread, this code is similar to the example @jyknight provided.