I opened an issue yesterday (powi(x, n) is constant folded as if it were pow(x, (double)n) · Issue #65088 · llvm/llvm-project · GitHub) about the way that llvm.powi() is constant folded, and today I’ve been working on a patch to fix it. Unfortunately, as I’ve been trying to wrap my head around the correct way to do that, I’ve discovered that I’m pulling on a thread that may unravel a lot of related things. So, I’d like to start a general discussion about constant folding of floating-point operations.
@AaronBallman opened an issue in May (Incorrect constant folding behavior of floating-point operations · Issue #62479 · llvm/llvm-project · GitHub) on this same topic, but the discussion there was centered on interpretation of the C and C++ language standards as they relate to evaluating constant expressions. That’s definitely relevant to what I’m bringing up here, but I’d like to shift the focus to rules for constant folding with regard to the LLVM IR language definition.
I was somewhat dismayed by the claim in issue 62479 that the C and C++ language standards don’t require that all calls to math library functions will return the same result for a given input. I can’t find any such requirement. I also can’t find anything in the LLVM IR language reference that explicitly states such a requirement. However, I am certain that users universally expect this to be the case, and in the case of LLVM IR, I think this behavior is implied by the existence of the ‘afn’ fast-math flag. That this flag grants permission to return an approximate result implies that the absence of the flag must require that the result not be approximated. To me that means that for a given input value, all calls to the function will return the same result. You can’t not approximate something that doesn’t return a consistent value.
There is a problem, however: when the LLVM optimizer sees a known math library function called with constant arguments, it will constant fold the call by invoking that function at compile time. This means my program may produce different numeric results depending on whether or not the compiler is able to deduce that the argument is constant. So, for example, my results may change based on a decision the compiler makes about function inlining. This seems to violate the basic rule that the optimizer must respect program semantics.
During the discussion in issue 62479, there was an emphasis on the impact of constant folding for cross-compilation. However, the problem isn’t necessarily limited to cross-compilation. It can happen any time the program being compiled is linked against a different math library than the compiler itself was linked against. For instance, if the compiler is linked against an LLVM math library, and the program being compiled is linked against a GNU math library, there is a potential for this problem to arise. Possibly even different versions of a library from the same vendor could cause it.
I know of at least two other cases where this can be a problem even if the compiler and the program being compiled are both using the same math library. One is the llvm.powi() function as I described in issue 65088. The other has to do with FMA formation. Since I’ve already described the powi() issue in 65088, I’ll describe only the FMA issue here.
Consider the following code:
double f(double x, double y, double z) {
  return x * y + z;
}
If I compile this program using ‘clang -O2 -ffp-contract=on’, clang will produce this IR:
define double @f(double %x, double %y, double %z) {
entry:
  %0 = tail call double @llvm.fmuladd.f64(double %x, double %y, double %z)
  ret double %0
}
The intrinsic in this case gives the compiler permission to fuse the operations “if the code generator determines that (a) the target instruction set has support for a fused operation, and (b) that the fused operation is more efficient than the equivalent, separate pair of mul and add instructions.” However, the LLVM constant folder will evaluate this operation as a fused multiply-add even if I’m compiling for a target that doesn’t support fused operations. So, once again, the result can depend on earlier optimization decisions made by the compiler. What’s worse (to me, at least) is that if I compile with -ffp-contract=fast instead of -ffp-contract=on, the constant folder doesn’t use fused operations, even if the target does support them.
In the cases of the llvm.fmuladd and llvm.powi intrinsics, our current definitions leave some room for this variation in behavior, but I’d like to suggest that this is a problem with those definitions.
My basic proposal is that the LLVM IR language definition should state that all floating-point operations, including math library functions and math intrinsics, produce consistent, reproducible results that are independent of compiler optimization decisions unless fast-math flags permit otherwise.
Opinions?