[FP] Constant folding of floating-point operations

Not that it solves the actual issue at hand, but I think the better starting point is to have LLVM IR use IEEE 754 semantics (insofar as IEEE 754 fully specifies it), with an exception for the Table 9.1 math functions that are rarely implemented to the mandated correct rounding.

I’m not sure what you mean here. Can you elaborate on what IEEE 754 semantics we aren’t currently using?

There are issues like “Basic floating-point operations are underspecified” (llvm/llvm-project issue #60942 on GitHub), but I think that is mostly a documentation issue.

The point I was going for instead was that in terms of operational semantics, we should be able to optimize knowing the properties of IEEE 754 semantics [1]. So if we see roundToIntegralTiesToAway(0.5) (to use the IEEE 754 name of the operation), we are allowed to replace that with the value that IEEE 754 prescribes, namely, 1.0. Instead, it’s only a small subset of the operations that the compiler cannot evaluate at compile-time (probably limited to Table 9.1 of IEEE 754, or the list of functions C2x Annex F.3p20 provides that C implementations are not required to correctly round).
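To make the roundToIntegralTiesToAway(0.5) example concrete, here is a minimal sketch of that operation for doubles, written in Python purely for illustration. The function name is made up for this sketch, and it assumes Python floats are IEEE 754 binary64. The point is that the result is fully determined by the standard, so a compiler could compute it without consulting any libm.

```python
import math

def round_to_integral_ties_away(x: float) -> float:
    """Round x to the nearest integer, halfway cases going away from zero."""
    if math.isnan(x) or math.isinf(x):
        return x  # NaN and infinities pass through unchanged
    if abs(x) >= 2.0 ** 52:
        return x  # every double this large is already an integer
    whole = math.trunc(x)  # integer part, as a Python int
    frac = x - whole       # exact: the fractional part of a double is representable
    if abs(frac) >= 0.5:
        whole += 1 if x > 0 else -1
    # copysign keeps the sign of x, so small negative inputs give -0.0
    return math.copysign(float(whole), x)

print(round_to_integral_ties_away(0.5))   # 1.0
print(round_to_integral_ties_away(-0.5))  # -1.0
```

Note that the naive floor(x + 0.5) formulation gets 0.49999999999999994 wrong because the addition itself rounds, which is why the sketch works on the fractional part instead.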

I guess log(-1.0) is slightly more of a gray area, but assuming you accept my premise above that changing the payload of a NaN is OK, I think we could fold “log(-1.0) → NaN” in the case of the llvm.log intrinsic since log() should always return some NaN for negative numbers and the intrinsic (unlike the function call) doesn’t have side effects. It’s interesting that the current constant folding implementation doesn’t fold this case specifically because it is checking to see if an exception is raised.

Deciding what to do affects more than constant folding; we have logic in ValueTracking (thanks to @arsenm) that narrows down the possible fpclass of calls to relevant intrinsics based on their possible fpclass inputs, so it in effect has to assume things like log(negative number) = NaN. We already have some constraints on how a libm log et al. can behave, so I think we can reasonably accept many kinds of these folds in the optimizer, even without always folding it. But it does require some investigation into how reliable the various libm implementations are, so that we know which inputs we can assume are reliably handled and foldable.
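As a rough illustration of the kind of class narrowing described here, the sketch below maps a set of possible input classes for log to a set of possible result classes. It is not LLVM's actual ValueTracking interface: the FPClass enum and log_result_classes are hypothetical names, and the table assumes the libm follows the usual special cases (NaN for negative inputs, -inf for zeros, +inf for +inf).

```python
from enum import Flag, auto

class FPClass(Flag):
    NAN = auto()
    NEG_INF = auto()
    NEG_FINITE = auto()  # negative normal or subnormal
    NEG_ZERO = auto()
    POS_ZERO = auto()
    POS_FINITE = auto()  # positive normal or subnormal
    POS_INF = auto()

def log_result_classes(src: FPClass) -> FPClass:
    """Possible result classes of log(x), given the possible classes of x."""
    out = FPClass(0)
    if src & (FPClass.NAN | FPClass.NEG_INF | FPClass.NEG_FINITE):
        out |= FPClass.NAN        # assumption: negative inputs yield some NaN
    if src & (FPClass.NEG_ZERO | FPClass.POS_ZERO):
        out |= FPClass.NEG_INF    # log(+/-0) is -inf
    if src & FPClass.POS_FINITE:
        # x < 1 gives a negative result, x == 1 gives +0, x > 1 a positive one
        out |= FPClass.NEG_FINITE | FPClass.POS_ZERO | FPClass.POS_FINITE
    if src & FPClass.POS_INF:
        out |= FPClass.POS_INF
    return out
```

For example, if the input is known to be positive (finite or infinite), the result set contains no NaN, which is exactly the sort of fact that only holds if the libm behaves as assumed.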

x = sin(x) where |x| < ~2^-18 delivers the right answer {but should raise INEXACT}
1.0 = cos(x) where |x| < ~2^-27 delivers the right answer {but should raise INEXACT}

There are a whole bunch of ranges where the result is the argument or the result is a fixed value, should LLVM decide to pursue them.
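One way to check such a range without trusting any particular libm is to bound the function with exact rational arithmetic. The sketch below (helper names are made up, binary64 and round-to-nearest assumed) uses x - x^3/6 < sin(x) < x for x > 0 and 1 - x^2/2 <= cos(x) <= 1: if both bounds round to the same double, the correctly rounded result is forced.

```python
import math
from fractions import Fraction

def sin_rounds_to_argument(x: float) -> bool:
    """True when the correctly rounded double sin(x) is provably x itself."""
    if not math.isfinite(x) or abs(x) > 1.0:
        return False  # outside the small-argument range this sketch targets
    if x == 0.0:
        return True
    ax = Fraction(abs(x))
    lower = ax - ax ** 3 / 6        # exact lower bound on sin(|x|); |x| is the upper bound
    return float(lower) == abs(x)   # float() rounds the exact value to nearest

def cos_rounds_to_one(x: float) -> bool:
    """True when the correctly rounded double cos(x) is provably 1.0."""
    if not math.isfinite(x) or abs(x) > 1.0:
        return False
    lower = 1 - Fraction(x) ** 2 / 2  # exact lower bound on cos(x); 1 is the upper bound
    return float(lower) == 1.0

print(sin_rounds_to_argument(2.0 ** -30))  # True
print(cos_rounds_to_one(2.0 ** -30))       # True
```

This only says what a correctly rounded implementation must return; whether a given libm actually returns it is a separate, empirical question.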

qNaN = log(-1.0) {but should raise INVALID}

1.0 = sinpi( 0.5 = mod(constant,2) ) is permissible {and one of the rationales for this family}

-0.0 = fmin( -0.0, +0.0 ) as defined in section 9.6
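For reference, a minimal sketch of a minimum that orders -0.0 below +0.0, modeled on the minimumNumber operation of section 9.6 (the function name here is just for illustration). The only subtlety is that -0.0 == +0.0 compares equal, so the sign bit has to be inspected explicitly.

```python
import math

def minimum_number(a: float, b: float) -> float:
    """Minimum that prefers a number over NaN and treats -0.0 as less than +0.0."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    if a == 0.0 and b == 0.0:
        # Both are zeros: return whichever carries the negative sign, if any.
        return a if math.copysign(1.0, a) < 0 else b
    return a if a < b else b

print(minimum_number(-0.0, 0.0))  # -0.0
print(minimum_number(0.0, -0.0))  # -0.0
```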

cos(-x) = cos(x) as defined in section 9.2.1 {and any other even function}
sin(-x) = -sin(x) as defined in section 9.2.1 {and any other odd function}

Basically, the compiler is allowed to apply any “optimization” specified in section 9.2.1 of IEEE 754-2019
and a few other places such as section 9.6.

Since LLVM is explicit about not raising exceptions and default rounding modes, you gain considerable latitude.

I feel like it may indeed be reasonable to adopt the proposed policy of ‘IR optimization passes cannot constant-fold a math function call when the resulting value may validly vary between implementations, unless [appropriate flag] is set.’ as a baseline rule.

However, I think there are a bunch of exceptions to the general rule. I suspect llvm.fmuladd is one of those exceptions, because if you don’t want that, then pass -ffp-contract=off to clang, to avoid creating it in the first place.

Of course, no matter what, we’ll still need to be able to evaluate math functions during frontend constant evaluation. And when we do so, we really ought to be doing so in a way that’s independent of the host platform (likely by calling a correctly-rounded implementation). So, even with this IR change, C/C++ code will be able to observe different answers depending on whether an expression is in constant-evaluation context or not. But that’s really a completely separate issue from the policy for IR and optimization passes we’re discussing here.

OK, I think I see what you’re saying, and I think this is a very good direction. I guess as a first step we should agree on a general policy with regard to IEEE 754/IEC 60559 conformance and create a plan to achieve that conformance level.

At a very minimum, I think it needs to be possible to create an IEEE 754/IEC 60559 conforming front end using LLVM. That may be possible now, but I suspect we have some gaps.

Section 11 of IEEE 754 defines conditions for reproducible results that include most of the standard math functions as “reproducible operations”. It looks like we could technically conform to the standard just by setting an attribute saying that our results aren’t reproducible, but achieving reproducible results as defined by the standard is what I’m after here. The reproducible results definition also includes reproducible exception behavior, which LLVM IR explicitly does not support by default, but I think that reproducible values and operations (assuming the default floating point environment) should be our goal.

As @jyknight suggests, we may want optional features like llvm.fmuladd to be exceptions to this.

I guess the next step would be to create a more exhaustive proposal to kick start some kind of project. I’ll try to get something started.

Section 11 has a target of “Users obtain the same floating-point numerical and reproducible status flag results, on all platforms supporting such a language standard”. That’s only achievable with a full correctly-rounded math library (…or, I suppose, a language standard which prescribes the particular implementation details of an incorrectly-rounded algorithm – not likely.).

If we presume that we have a correctly-rounded math library, and that it’s required both at compile-time and runtime on all conformant implementations of this language, a lot of the issues identified here go away. In many ways, that’s an easier target to hit than the desire that the optimizer not change the answer, in the messy real-world of a multitude of math libraries on different platforms, each giving slightly different answers.

Yes, I agree that’s not a realistic base goal. What I’d like to see is that users will get the same numeric results on their target hardware regardless of optimizations performed (unless they’ve opted in to a non-value-safe mode). Again, I want it to come back to “the compiler won’t change your results unless you said it could.”

IEEE 754 10.4 talks about “the literal meaning of the source code.” It says, “A language standard should require that by default, when no optimizations are enabled and no alternate exception handling is enabled, language implementations preserve the literal meaning of the source code. That means that language implementations do not perform value-changing transformations that change the numerical results or the flags raised.” I’m curious why the “when no optimizations are enabled” clause is there. It seems to take the teeth out of this entire section, but apart from that this gets at what I would like to see. (As an aside, this raises issues like whether we should generate FMA at -O0.) This section also doesn’t explicitly address the operations that typically correspond to function calls, which is why I focused on section 11.

I imagine that various front ends and targets may want to have different default behavior, but that’s a separate issue. I’m more concerned about clarifying what the LLVM IR definition permits. Once that is settled the issue can be pursued further in the front end handling.

The committee really did want FP calculations to be optimizable!
The committee really did want FP calculations to deliver the same numeric results across platforms!
The committee really did understand that both could not be mandated simultaneously.
And the only way they had to express access to both was to mandate the latter when no optimizations are enabled.

It seems to me that there is a fairly easy way out of part of the dilemma posed in this thread:

a) prior to any constant arithmetic on FP do a clearallflags(); subroutine call.
b) after any constant arithmetic on FP do an if( isexact() ) query.

If the isexact() query is true (inexact not set) then the compiler is allowed to perform the constant folding {without regards to any flag settings} and the folding has not altered any of the nuanced semantics of IEEE 754-2019.
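A rough sketch of that gating idea for a single operation, division, is below. Python has no portable access to the floating-point status flags, so this stands in for the inexact-flag query with exact rational arithmetic: the fold is allowed only when the rounded quotient equals the mathematically exact one. The name try_fold_div is hypothetical, and the sketch conservatively refuses anything that would raise another exception (non-finite inputs, division by zero, overflow).

```python
import math
from fractions import Fraction
from typing import Optional

def try_fold_div(a: float, b: float) -> Optional[float]:
    """Return a / b if the quotient is exact, else None (leave it to run time)."""
    if not (math.isfinite(a) and math.isfinite(b)) or b == 0.0:
        return None  # would involve invalid or divide-by-zero; do not fold here
    result = a / b
    if not math.isfinite(result):
        return None  # overflowed; also not an exact result
    if Fraction(result) == Fraction(a) / Fraction(b):
        return result  # nothing was rounded away, so inexact would not be raised
    return None

print(try_fold_div(1.0, 4.0))  # 0.25
print(try_fold_div(1.0, 3.0))  # None
```

An implementation that reads the real inexact flag would avoid the rational arithmetic, at the cost of depending on the host environment reporting the flag faithfully.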

That suggests relying on fenv access, which isn’t really supported.

I don’t particularly like how section 10.4 is worded, partially because of the “no value-changing transformations when no optimizations are enabled” wording. The best way I can square it is that it is trying to say that NaN payloads and FP exception details (only that they are set, not when and where) are not part of the operational semantics of IEEE 754, along with a second request that languages have something like fast-math pragmas.

If the isexact() query is true (inexact not set) then the compiler is allowed to perform the constant folding {without regards to any flag settings} and the folding has not altered any of the nuanced semantics of IEEE 754-2019.

This assumes that we can guarantee that both the host libm implementation and the compiler used to compile LLVM are game for preserving FP exceptions reliably. I can think of several reasons that might not be the case, not least of which is someone compiling with -ffast-math (which also includes setting FTZ/DAZ flags on some platforms–I’m not sure inexact will be set on denormal inputs to functions in that case). Note too that in IEEE 754, the requirement for functions in 9.2 is “Operations should signal the inexact exception if the result is inexact. Operations should not signal the inexact exception if the result is exact”–i.e., setting inexact correctly is a “SHOULD” not a “MUST” requirement like the other flags.

OTOH, for about half of the functions (e.g., trigonometric functions), the set of operations that can return an exact value is extremely small (basically inf, NaN, and maybe 0 or +/-1), and we could hardcode checks for them. The other functions (particularly pow and friends) have larger sets of exact values, but manually checking for exactness is doable, even without a complete implementation of these functions.
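To give a sense of what “manually checking for exactness” could look like for pow, here is a small sketch restricted to integer exponents. The name exact_pow and the exponent bound are arbitrary choices for this illustration; it computes the power exactly as a rational and reports a value only when that rational is itself a finite double. Zero bases are left to a separate special-case table so the sign of zero is not mishandled.

```python
import math
from fractions import Fraction
from typing import Optional

def exact_pow(x: float, n: int) -> Optional[float]:
    """Return x**n when it is exactly representable as a double, else None."""
    if not math.isfinite(x) or x == 0.0 or abs(n) > 2048:
        return None  # special values and huge exponents are handled elsewhere
    exact = Fraction(x) ** n
    try:
        rounded = float(exact)  # rounds to nearest; raises on overflow
    except OverflowError:
        return None
    return rounded if Fraction(rounded) == exact else None

print(exact_pow(2.0, 10))  # 1024.0
print(exact_pow(3.0, 40))  # None
```

The second call returns None because 3**40 is an odd integer larger than 2**53, so it cannot be represented exactly.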

I would recommend adding methods to APFloat that have rounding-mode and flags results for these functions anyways. Even if we don’t have implementations yet that return correctly rounded results and correct flags for these operations, at least we can get everybody who might want to call these functions a way to do so that at least has a hope of being correct.

I would also note that this is part of the IEEE 754 push towards correctly rounded function results, which got stronger with the 2019 update.

The C standard, which lags behind in this regard, is probably better aligned with the commonly available math library implementations, and I don’t believe any of the math functions in the C standard are defined as raising INEXACT.

The problem with this, which I thought you were alluding to previously, is that just because these functions have a theoretically exact result, there is no guarantee that a given implementation actually returns the exact result in those cases. I’m not aware of any cases where it doesn’t happen, but it’s at least possible, right?

When you say “isn’t really supported” do you mean “isn’t supported on all platforms” or “mostly works but still has some issues” or both?

I know there are some targets where fenv access just doesn’t make sense for various reasons, but I think we should be pushing for all targets that can support this mode to do so. Someone warned me when I was first adding the constrained intrinsics that putting “experimental” in the name would likely mean that they’d always have “experimental” in the name, but I really think this support is pretty solid right now for at least a few target architectures, though I haven’t tested with anything other than x86.

I’m not sure that proposal makes sense for llvm.fmuladd. Rather I think llvm.fmuladd should be treated as if it did have some fast math flags, since the only reason it was introduced was to allow a slightly more controlled application of the “contract” flag. (Specifically it represents an fmul and an fadd that can only be contracted with each other, not with any other nearby math operations that happen to have the “contract” flag.)

I mean the LLVM implementation is barely supported on x86 and has many issues. Plus I don’t think this solves the actual issue.

I really don’t. I think the inability to handle target intrinsics makes it nonviable for production use. Plus I still routinely find places emitting unconstrained ops in strictfp functions.

The problem with this, which I thought you were alluding to previously, is that just because these functions have a theoretically exact result, there is no guarantee that a given implementation actually returns the exact result in those cases. I’m not aware of any cases where it doesn’t happen, but it’s at least possible, right?

I believe there is a middle ground somewhere between “we can make no assumptions about the return value” and “we can assume the return value is always consistent”, both of which are obviously false statements. The question is where the best middle ground lies.

It seems to me extremely unlikely that any libm implementation (outside of pure fast-math implementations) would fail to translate the explicitly mentioned special cases correctly. Someone suggested upthread that the inexact flag could be used to detect issues. For functions like sin or exp, there is basically a single value where the result can be exact (usually, but not always, x = +/-0). Functions like sinpi or log2 are exact in more places (integer multiples of 1/2 in the first case, and powers of 2 in the second case), and I suspect it’s possible that all implementations make such cases exact, but I would need to see experiments to actually demonstrate it. And then you get functions like pow, where so many inputs have exact results that I have strong doubts that all implementations are going to be able to exactly translate all such cases correctly.
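For log2, the exact cases are easy to enumerate, which makes them a natural candidate for the kind of experiment mentioned above. A minimal sketch (exact_log2 is a made-up helper name) that recognizes positive powers of two and returns the exact result, which could then be compared against what each libm actually returns:

```python
import math
from typing import Optional

def exact_log2(x: float) -> Optional[float]:
    """Return log2(x) when x is a positive power of two, else None."""
    if not math.isfinite(x) or x <= 0.0:
        return None
    # frexp gives x == mantissa * 2**exponent with 0.5 <= mantissa < 1,
    # including for subnormal inputs.
    mantissa, exponent = math.frexp(x)
    if mantissa == 0.5:
        return float(exponent - 1)
    return None

print(exact_log2(8.0))    # 3.0
print(exact_log2(0.125))  # -3.0
print(exact_log2(3.0))    # None
```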

In any case, I think we need to do some evaluation of existing implementations to see how many guarantees hold up in practice.

That’s fair. It bothers me that we constant fold this in a way that the target couldn’t possibly get (when the target doesn’t support FMA). You could say that users shouldn’t be enabling contract if they aren’t specifying a target that supports FMA, but since fp-contract=on is the default in clang for most platforms, it’s not necessarily something the user did intentionally. Perhaps clang (and other front ends) just shouldn’t enable contract if the target feature isn’t enabled.
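A small numeric illustration of why this matters: the fused and unfused evaluations of a*b + c can differ, so folding as if fused produces a value a target without FMA cannot compute. The sketch below assumes binary64 floats and emulates the single-rounding fused result with exact rational arithmetic; the function names are only for this example.

```python
from fractions import Fraction

def unfused_muladd(a: float, b: float, c: float) -> float:
    product = a * b     # rounded once here
    return product + c  # and rounded again here

def fused_muladd(a: float, b: float, c: float) -> float:
    # Emulates an FMA: compute a*b + c exactly, then round a single time.
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)

a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

print(unfused_muladd(a, b, c))  # 0.0
print(fused_muladd(a, b, c))    # -8.673617379884035e-19, i.e. -2**-60
```

Here the exact product is 1 - 2**-60, which rounds to 1.0 when the multiply is rounded on its own, so the unfused form loses the entire result.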

I agree that this doesn’t solve the problem. And perhaps I’ve been too optimistic about the state of constrained FP support.