Summary
I’d like to refine the semantics of the "denormal-fp-math" function attribute to provide stronger guarantees regarding what assumptions the optimizer can and cannot make in the presence of this attribute. The goal of this change would be to allow LLVM IR to describe various semantic modes to more closely model the execution-time behavior of target processors that support flushing denormal/subnormal values to zero.
Background
Floating-point environment
On some target architectures, flushing of denormal inputs or outputs can be enabled or disabled dynamically. For example, on x86-based targets there are bits in the MXCSR register to control whether denormal inputs are treated as zero (DAZ) and whether denormal results are flushed to zero (FTZ). For such architectures, the denormal flushing behavior is a de facto part of the floating-point environment, although there is no explicit mention of such behavior being part of the floating-point environment in standards documents, such as IEEE-754 or the C language standard.
By default, LLVM assumes IEEE-754 semantics for the handling of denormal values, but it is possible to describe some restrictions using the "denormal-fp-math" attribute.
Attribute semantics
The current LLVM Language Reference says the “denormal-fp-math” attribute “indicates the denormal (subnormal) handling that may be assumed for the default floating-point environment.” The attribute is associated with a comma-separated pair of string values, each of which may be "ieee", "preserve-sign", "positive-zero", or "dynamic". The first entry indicates the flushing mode for the result of floating-point operations. The second indicates the handling of denormal inputs to floating-point instructions.
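As a rough illustration of what the three static modes mean for a single binary32 value, here is a minimal C sketch. The names `denormal_mode`, `is_denormal`, and `apply_denormal_mode` are hypothetical and are not part of LLVM; the code models the flushing in software rather than touching any hardware control register. The "dynamic" mode is left out because, by definition, its behavior is not known until run time.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical names mirroring the attribute's string values. */
typedef enum {
    MODE_IEEE,          /* "ieee": denormals are kept as-is */
    MODE_PRESERVE_SIGN, /* "preserve-sign": flush to zero, keep the sign */
    MODE_POSITIVE_ZERO  /* "positive-zero": flush to +0.0 */
} denormal_mode;

uint32_t float_bits(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

float bits_to_float(uint32_t bits) {
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* A binary32 value is denormal when its exponent field is zero
   and its fraction field is nonzero. */
int is_denormal(float x) {
    uint32_t bits = float_bits(x);
    return (bits & 0x7F800000u) == 0 && (bits & 0x007FFFFFu) != 0;
}

/* Software model of flushing one value under the given mode. */
float apply_denormal_mode(float x, denormal_mode mode) {
    if (mode == MODE_IEEE || !is_denormal(x))
        return x;
    if (mode == MODE_PRESERVE_SIGN)
        return bits_to_float(float_bits(x) & 0x80000000u);
    return 0.0f; /* MODE_POSITIVE_ZERO */
}
```

Under this model, a negative denormal such as `-1e-40f` becomes `-0.0f` with "preserve-sign", `+0.0f` with "positive-zero", and is left untouched with "ieee".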
The current definition states that if the output mode is "preserve-sign" or "positive-zero", denormal results may be flushed to zero but are not required to be. The result is that transformations like x * 1.0 -> x are permitted.
The Lang Ref definition states “If the mode is "dynamic", the behavior is derived from the dynamic state of the floating-point environment. Transformations which depend on the behavior of denormal values should not be performed.” However, there seems to be some ambiguity about the meaning of this last statement. In a previous discussion, @arsenm told me “the intention was that you cannot replace non-canonicalizing operations with canonicalizing operations without knowing the mode.” And that seems to be the way the attribute is currently being handled. The identity transformation mentioned above (x * 1.0 -> x) is not blocked by "denormal-fp-math"="dynamic,dynamic".
One place where the "denormal-fp-math" attribute is considered is in the value tracking and fpclass deduction associated with explicit comparisons with zero. If the "denormal-fp-math" attribute is not present or the input mode is set to "ieee", we will assume that an equality comparison with zero guarantees that a value is zero. If the input mode is "dynamic", "preserve-sign", or "positive-zero", we do not make this assumption.
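To see why the comparison cannot be trusted in the non-IEEE input modes, consider the following C sketch. It models a DAZ-style input flush in software; the helper names are hypothetical and the model assumes the host itself evaluates `==` with IEEE semantics.

```c
#include <stdint.h>
#include <string.h>

uint32_t bits_of(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Software model of a sign-preserving input flush: any value whose
   exponent field is zero (a denormal or a zero) becomes a signed zero. */
float flush_input(float x) {
    uint32_t bits = bits_of(x);
    if ((bits & 0x7F800000u) == 0)
        bits &= 0x80000000u;
    float r;
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* Models `x == 0.0f` when denormal inputs are treated as zero. */
int compares_equal_to_zero_daz(float x) {
    return flush_input(x) == 0.0f;
}

/* Models `x == 0.0f` under IEEE semantics. */
int compares_equal_to_zero_ieee(float x) {
    return x == 0.0f;
}
```

In this model a denormal such as `1e-40f` compares equal to zero in the DAZ case even though its bit pattern is nonzero, so a successful comparison does not prove that the value is actually zero.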
Motivation
I would like to strengthen the definition of "denormal-fp-math" for two reasons:
- To provide consistent numeric results when users change the FTZ/DAZ modes while FENV_ACCESS is allowed.
- To allow users to rely on the compiler preserving numeric behavior in accordance with the denormal behavior described using the -fdenormal-fp-math command-line option currently provided by clang or similar options with other front ends.
Proposal
I am proposing strengthening the definition of the "denormal-fp-math" attribute to say that when this attribute is present the optimizer is not permitted to perform any transformation that would change the numeric results of the generated program if it were executed with the denormal mode set as described by the attribute. If the input or output modes are set to "dynamic", the compiler is not permitted to perform any transformation that would change the numeric results under any denormal mode available with the target architecture.
This would primarily affect two types of transformation: (1) removal or introduction of canonicalizing operations, and (2) constant folding involving denormal values.
We would continue to use “ieee,ieee” as the default denormal mode and so existing transformations that make this assumption would be permitted by default.
Canonicalizing operations
When the "denormal-fp-math" attribute is set to a non-IEEE mode, we would not be allowed to eliminate operations such as x = x * 1.0 which potentially flush input values to zero. This pattern is sometimes used in math libraries which are required to behave in a way that is consistent with the dynamic FTZ/DAZ modes. A function implementation may look like this:
float f(float x) {
  if (x != 0.0f) {
    // Handle the non-zero case
  } else {
    // We may get here as a result of a flushed denormal.
    // Return zero with the sign of the input value.
    return x * 1.0f;
  }
}
If the compiler eliminates the x * 1.0f operation, this function will return an incorrect result for denormal inputs when the DAZ flag is set.
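The difference between the two versions can be sketched in portable C by modeling the DAZ input flush in software. The helper names below are hypothetical, and the sketch assumes a sign-preserving flush; it does not set any hardware flag.

```c
#include <stdint.h>
#include <string.h>

uint32_t to_bits(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Software model of a sign-preserving DAZ input flush. */
float daz_input(float x) {
    uint32_t bits = to_bits(x);
    if ((bits & 0x7F800000u) == 0)
        bits &= 0x80000000u;
    float r;
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* Models `return x * 1.0f;` executed with denormal inputs flushed:
   the multiply sees a signed zero and returns a signed zero. */
float with_multiply(float x) {
    return daz_input(x) * 1.0f;
}

/* Models the code after the optimizer rewrites x * 1.0f -> x:
   no instruction reads x, so the denormal is returned unchanged. */
float multiply_removed(float x) {
    return x;
}
```

For a denormal input such as `-1e-40f`, `with_multiply` yields `-0.0f` while `multiply_removed` hands the denormal back to the caller, which is the discrepancy described above.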
Constant folding of denormals
When we perform constant folding involving a denormal input value or a denormal result, the constant folding should honor the denormal mode described by the "denormal-fp-math" attribute. If the attribute is set to "dynamic,dynamic", we should not perform any constant folding involving denormal values. If the attribute is set to "ieee,ieee" (or is absent) we can perform constant folding as we currently do, using the denormal values and denormal results according to the IEEE standard. If the input or output modes are set to "preserve-sign" or "positive-zero", the constant folding should be performed with denormal values flushed in the way described by the attribute.
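The folding rules above could be sketched roughly as follows for a single addition. This is only an illustration in C under stated assumptions, not LLVM's constant folder: `fold_fadd`, `fold_result`, and `denormal_mode` are hypothetical names, and the host is assumed to perform the addition with IEEE semantics.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef enum {
    MODE_IEEE, MODE_PRESERVE_SIGN, MODE_POSITIVE_ZERO, MODE_DYNAMIC
} denormal_mode;

typedef struct {
    bool folded;  /* false means "leave the operation in the program" */
    float value;  /* meaningful only when folded is true */
} fold_result;

bool is_denormal(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (bits & 0x7F800000u) == 0 && (bits & 0x007FFFFFu) != 0;
}

/* Flush a denormal according to a static mode; other values and
   other modes pass through unchanged. */
float flush(float x, denormal_mode mode) {
    if (!is_denormal(x))
        return x;
    if (mode == MODE_PRESERVE_SIGN)
        return x < 0.0f ? -0.0f : 0.0f;
    if (mode == MODE_POSITIVE_ZERO)
        return 0.0f;
    return x;
}

/* Argument order follows the attribute: output mode, then input mode. */
fold_result fold_fadd(float a, float b,
                      denormal_mode out_mode, denormal_mode in_mode) {
    fold_result r = { false, 0.0f };
    /* Unknown input handling: refuse to fold denormal inputs. */
    if (in_mode == MODE_DYNAMIC && (is_denormal(a) || is_denormal(b)))
        return r;
    float sum = flush(a, in_mode) + flush(b, in_mode);
    /* Unknown output handling: refuse to fold a denormal result. */
    if (out_mode == MODE_DYNAMIC && is_denormal(sum))
        return r;
    r.folded = true;
    r.value = flush(sum, out_mode);
    return r;
}
```

With this sketch, folding `1e-40f + 1e-40f` gives a denormal under "ieee,ieee", gives zero under "preserve-sign,preserve-sign", and is declined under "dynamic,dynamic", while an addition with no denormal involved (such as `1.0f + 2.0f`) folds in every mode.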
Note, the LLVM optimizer will currently perform constant folding even when constrained intrinsics are used if APFloat reports that performing the operation would not raise any floating-point exceptions. This can change the numeric results of the program in cases where the DAZ flag is set.
Further discussion
This topic has already been discussed extensively here: Questions about llvm.canonicalize
I have also proposed this as a topic for discussion at the LLVM Floating-Point Working Group meeting this Wednesday at 10 AM Pacific/5 PM UTC. This instance of the meeting has been rescheduled due to a holiday conflict last week, so it isn’t on the LLVM calendar. The meeting link is https://meet.google.com/kxo-bayk-nnd