Intrinsic llvm::isnan

Hi all,

Some time ago a new intrinsic llvm.isnan was introduced, which is intended to represent IEEE-754 operation isNaN as well as a family of C library functions isnan*. Recently during post-commit review concern was raised (see https://reviews.llvm.org/D104854) that this functionality must have had RFC to make sure there is consensus on semantics.

Previously the frontend intrinsic __builtin_isnan was converted into cmp uno during IR generation in clang codegen. There are two main reasons why this solution is not satisfactory.

  1. Strict floating-point semantics.

If FP exceptions are not ignored, cmp uno must be replaced with its constrained counterpart, namely llvm.experimental.constrained.fcmp or llvm.experimental.constrained.fcmps. None of them is compatible with the semantics of isnan. Both IEEE-754 (5.7.2) an C standard (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2596.pdf, F.3p6) demand that this function does not raise floating point exceptions. Both the constrained compare intrinsics raise an exception if either operand is a SNAN (https://llvm.org/docs/LangRef.html#id1131). So there was no target-independent IR construct that could express isnan.

This drawback was significant enough and some attempts to alleviate it were undertaken. In https://reviews.llvm.org/D95948 isnan was implemented using integer operations in strictfp functions. It however is not suitable for targets where a more efficient way exists, like dedicated instruction. Another solution was implemented in https://reviews.llvm.org/D96568, where a hook ‘clang::TargetCodeGenInfo::testFPKind’ was introduced, which injects target specific code into IR. Such a solution makes IR more target-dependent and prevents some IR-level optimizations.

  1. Compilation with -ffast-math

The option ‘-ffast-math’ is often used for performance critical code, as it can produce faster code. In this case the user must ensure that NaNs are not used as operand values. isnan is just proposed for such checks, but it was unusable when isnan was represented by compare instruction, because the latter may be optimized out. One of use cases is data in memory, which is processed by a function compiled with -ffast-math. Some items in the data are NaNs to denote absence of values.

This point requires some remarks about using NaNs when a function is compiled with -ffast-math. GCC manual does not specify how this option works, it only states about -ffinite-math-only (https://gcc.gnu.org/onlinedocs/gcc-11.2.0/gcc/Optimize-Options.html#Optimize-Options):

Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs.

isnan does not do any arithmetic, only check, so this statement apparently does not apply to it. There is a GCC bug report https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84949, where investigation conforms that std::isnan() and std::fpclassify() should works with NaNs as specified even in -ffast-math mode.

Extending NaN restrictions in -ffast-math mode to functions like isnan does not make code faster, but is a source of broken user expectations. If a user writes isnan they usually expect an actual check. Silently removing the check is a stronger action than assuming that float value contains only real numbers.

Intrinsic llvm.isnan solves these problems. It

  • represents the check throughout the IR pipeline and saves it from undesired optimizations,
  • is lowered in selector, which can choose the most suitable implementation for particular target,
  • helps keeping IR target-independent,
  • facilitates program analysis as the operation is presented explicitly and is not hidden behind general nodes.

Note that llvm.isnan is optimized out if its argument is an operation with nnan flag, this behavior agrees with the definition of this flag in LLVM documentation.

Any feedback is welcome.

Thank you for posting the RFC!

I do not believe we should conflate StrictFP support, and
`-ffast-math` handling, these are two separate/separatable concerns.

As for the latter, right now i'm not convinced that we should
second-guess/override explicit user request.
This is inconsistent, and does not match how at least the GCC deals with it.
I think changing the status-quo (before said patch) should be a separate RFC,
and that change should be undone until after that RFC is accepted.

As for the latter, the main point of confusion is,
why is `@llvm.isnan` still used in non-StrictFP code?
The argument that we need `@llvm.isnan` because we *might* transition
in and out of StrictFP section does not seem to hold for me, because
LLVM Language Reference Manual — LLVM 18.0.0git documentation says:

If any FP operation in a function is constrained then they all must be constrained. This is required for correct LLVM IR.

So presumably when codegen'ing a function, we already know that we
will use StrictFP ops, and that should be the knob to use `@llvm.isnan`,
i think.

Roman

Thank you for posting the RFC!

I do not believe we should conflate StrictFP support, and
-ffast-math handling, these are two separate/separatable concerns.

You are right, they are separate, but they originate from the implementation of the same function and can be solved with the same solution.

As for the latter, right now i’m not convinced that we should
second-guess/override explicit user request.
This is inconsistent, and does not match how at least the GCC deals with it.
I think changing the status-quo (before said patch) should be a separate RFC,
and that change should be undone until after that RFC is accepted.

Actually we have two explicit user requests, a call of ‘isnan’ and an option ‘-ffast-math’. IMHO they do not contradict each other as ‘isnan’ is not an arithmetic operation. There is a discussion in https://reviews.llvm.org/D18513#387418, which also expresses the opinion that limitations imposed by ‘-ffast-math’ should be applied only to ‘math’ functions but not to ‘tests’. As for GCC behavior, they agree that this behavior is a bag: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84949. Intel and Microsoft compilers do not replace ‘isnan’ with assumed value.

As for the latter, the main point of confusion is,
why is @llvm.isnan still used in non-StrictFP code?

We have to introduce an intrinsic to represent isnan in strictfp environment. It is natural to use it for the default environment as well. Besides, a target may have a more efficient way to represent isnan than unordered comparison.

The argument that we need @llvm.isnan because we might transition
in and out of StrictFP section does not seem to hold for me, because
https://llvm.org/docs/LangRef.html#constrainedfp says:

If any FP operation in a function is constrained then they all must be constrained. This is required for correct LLVM IR.

There was no such intention. The primary motivation was strict fp exceptions.

I’m confused about the definition of:
https://llvm.org/docs/LangRef.html#llvm-experimental-constrained-fcmp-and-llvm-experimental-constrained-fcmps-intrinsics

These intrinsics require an “exception behavior” argument. That argument can take the value “fpexcept.ignore” which is defined as:
“optimization passes may assume that the exception status flags will not be read and that floating-point exceptions will be masked”

i1 @llvm.experimental.constrained.fcmp.f64(double %x, double %x, metadata !“uno”, metadata !“fpexcept.ignore”)

How is this call in LLVM different than the semantics of “isnan(x)” that is required by IEEE-754 or the C standard?

How is this call in LLVM different than the semantics of “isnan(x)” that is required by IEEE-754 or the C standard?

If either of the arguments of llvm.experimental.constrained.fcmp is signaling NaN, this function should raise an ‘Invalid’ exception. ‘isnan’ never raises exceptions.

You’re saying that the function definition text overrides the argument definition text. Why are we choosing that interpretation rather than the inverse (and documenting it one way or the other)?

When codegen lowers this call, it does not know if this is a regular compare and it may create CMP instruction, as for default environment, or this is ‘isnan’ for which it should generate different code, which does not influence FP state.

On most architectures compare instructions change FP state, in default environment it is just ignored, but actually hardware registers can be modified, For ‘isnan’ instructions must actually leave FP state intact.

If the FP state is not made invisible/irrelevant by “fpexcept.ignore”, then why is that argument on this intrinsic call at all?

Ie, why is that argument not used by the backend to decide to lower to a CMP instruction or special isnan instruction or library call?

An example would be helpful - in what case would these two lower differently given that SNAN always raises a visible exception?

call i1 @llvm.experimental.constrained.fcmp.f64(double %x, double %x, metadata !“uno”, metadata !“fpexcept.ignore”) ; “floating-point exceptions will be masked”

call i1 @llvm.experimental.constrained.fcmp.f64(double %x, double %x, metadata !“uno”, metadata !“fpexcept.strict”) ; “this mode can also be used with code that unmasks FP exceptions”

AFAIU, the fpexcept metadata is just a hint for optimizations, which have to respect the sequence of FP instructions. “ignore” is not a command to ask backend to mask the exception. It just tells optimizations the exceptions are not concerned by user, so that the FP instructions can be scheduled.

I think an non exception intrinsic is needed, because it is a commend to request backend not to emit instructions that may raise exception.

Thanks

Pengfei

Why would a target be allowed to lower the constrained fcmp intrinsic with “ignore” to an operation that might raise an exception?

If that scenario does not exist, then make the definition of fcmp “ignore” stronger by saying that it requires a lowering that does not raise an exception.

If refining the definition of “ignore” in this case is too hard to reconcile with other FP ops, how about adding another exception mode (“none”) to specify that exceptions are guaranteed not to be raised?

Why would a target be allowed to lower the constrained fcmp intrinsic with “ignore” to an operation that might raise an exception?

Because the “fpexcept.ignore” argument means that exceptions are being ignored. That is, it doesn’t matter whether exceptions are raised or not. It’s a promise that exceptions are not unmasked or being explicitly checked in this part of the code.

The “fpexcept.ignore” argument isn’t used for the fully strict mode. It is used for two cases:

  1. To get back to the “default” state (in combination with “fpround.tonearest”) when a non-constrained operation is inlined into a function that has constrained operations.

  2. To enable dynamic rounding (-frounding-math) without requiring strict exception semantics.

Regarding NaN comparison support in the presence of fast-math flags, I think this is an excellent reason to have the intrinsic. I needed this for a (non-C/C++) downstream compiler before the llvm.isnan intrinsic was introduced and found that the only ways to achieve it were either to introduce an intrinsic for the comparison or to remove the ‘nnan’ flag everywhere.

-Andy

The exception behavior “ignore” is supposed to be the same behavior as with our normal FP instructions. We normally assume that we are running with traps off. Thus there’s no requirement that we avoid instructions that may trap.

Which means these are intended to be the same:

%z = fadd float %x, %y

%z = call float llvm.experimental.constrained.fadd.f64(float %x, float %y, metadata !”round.tonearest”, metadata !”fpexcept.ignore”)

If you are compiling to an environment with traps on then you probably need “maytrap” or “strict” instead. Well, unless you just don’t care about continuing after a trap and thus don’t need to know exactly where a trap happened.

That said, whether or not traps are allowed is per-instruction rather than with the constrained metadata. See the “fneg” instruction which is defined to never generate traps. We should be defining isnan() to never trap. It’s surprising as all when this traps:

Z = (isnan(x) ? 0 : x * y);

That’s real code (from memory) that bit me and is part of the reason I got into this strictfp work.

Ok, does this edit to the LangRef make sense for the definition of “ignore”:
“optimization passes may assume that the exception status flags will not be read and that floating-point exceptions will be masked” →
“optimization passes may assume that the exception status flags will not be read and that floating-point exceptions may be masked”

It’s still not clear to me if there’s a benefit from having an intrinsic vs. one more exception mode (“none” or “off”).
That answer might depend on how many more of these intrinsics we’ll need? has these:

isfinite()
isinf()
isnormal()
signbit()

others?

According to IEEE 754, the following operations are “non-computational”, and should not signal if provided with an sNAN argument.

I’ve annotated each with → C function/macro equivalent, taken from the table in the C standard draft.

enum class(source) → fpclassify, signbit, issignaling

boolean isSignMinus(source) → signbit

boolean isNormal(source) → isnormal

boolean isFinite(source) → isfinite

boolean isZero(source) → iszero

boolean isSubnormal(source) → issubnormal

boolean isInfinite(source) → isinf

boolean isNaN(source) → isnan

boolean isSignaling(source) → issignaling

boolean isCanonical(source) → iscanonical

enum radix(source) → FLT_RADIX constant

boolean totalOrder(source, source) → totalorder

boolean totalOrderMag(source, source) → totalordermag

(Decimal numbers:)
boolean sameQuantum(source, source) → samequantum

The following list are “quiet-computational” operations, which also should not signal when provided with an sNAN:

I haven’t been following the technical details, but in terms of the English documentation, it makes no sense to say that someone “may assume that may happen.” Either always happens, in which case optimization passes may safely assume that it happens; or never happens, in which case optimization passes may safely assume that it does not happen; or else sometimes happens and sometimes doesn’t, in which case optimizations passes must not assume anything about .

So you might say: “optimization passes may assume that the exception status flags will not be read. Floating-point exceptions might or might not be masked, depending on [____]” (and then mention the relevant variable, such as “instruction set” or “optimization level” or whatever).

HTH,
Arthur

I think the confusion here has to do with what it means for the optimizer to assume that “floating-point exceptions will be masked”. What I’m about to say may be specific to the x86-64-based processor behavior because that’s the one I know, but I think other architectures will have similar behavior but possibly with different terms?

If a floating point exception occurs in an operation and that exception is masked in the floating point control registers, the status bit for that exception will be set and nothing else will happen. If a floating point exception occurs in an operation and that exception is NOT masked in the floating point control registers, an exception handler will be called.

So, what the current documentation means to say is that the optimizer may assume that the status flags will not be read and that no exception handler will be called if a floating point exception occurs. Basically, it means that floating point exceptions will be ignored in all ways.

It’s still not clear to me if there’s a benefit from having an intrinsic vs. one more exception mode (“none” or “off”).

The key point to recognize here is that the metadata arguments to the constrained intrinsics are intended to describe assumptions that can or cannot be made. They are not intended to bring about the state that they describe. So, for example, if the rounding mode argument is set to “fpround.tonearest” that means the optimizer can assume that the processor is set to use the “tonearest” rounding mode, but it does not enforce this. If we see this flag during ISel, we can select an instruction that takes an explicit rounding mode argument and use the “tonearest” value for that argument, but we are also allowed to select an instruction that uses the rounding mode from the MXCSR register and assume that when this instruction is executed that rounding mode will be “tonearest”.

If you need an intrinsic that is guaranteed not to raise exceptions, that should be a distinct intrinsic from any similar intrinsic that may raise exceptions. See, for example, llvm.experimental.constrained.nearbyint and llvm.experimental.constrained.rint which differ only in exception behavior.

-Andy

It probably doesn’t help that IEEE 754-2019 uses the word “signal” to mean, paraphrased, “set a bit in a status register and continue producing a result with no trap”, and that’s in the default FP environment. So IEEE “signaling” is not Unix “signaling”.

Once one picks up on that distinction the IEEE 754 document reads a lot like Andy’s description of x86-64 behavior.

Thanks for making this clearer. So there’s general consensus that we need an intrinsic to represent isnan()…

  1. How many others like this do we need? (see the list that James provided a couple of mails earlier)
  2. How do we introduce these and limit regressions? D104854 had clang produce isnan() for all modes simultaneously, but that leads to problems like: https://llvm.org/PR51556 . I suggest introducing these in LLVM only or in clang with strictFP modes only as a 1st step, so we’re not causing perf regressions that require reverting large patches.

Thanks for making this clearer. So there’s general consensus that we need an intrinsic to represent isnan()…

  1. How many others like this do we need? (see the list that James provided a couple of mails earlier)

We need all classification functions: isnan, isinf, isfinite, isnormal, issubnormal, iszero, fpclassify. In strict exception mode even iszero requires special treatment, it cannot be implemented as compare with zero, because the argument may be a signaling NaN.

  1. How do we introduce these and limit regressions? D104854 had clang produce isnan() for all modes simultaneously, but that leads to problems like: https://llvm.org/PR51556 . I suggest introducing these in LLVM only or in clang with strictFP modes only as a 1st step, so we’re not causing perf regressions that require reverting large patches.

The patch is already 3 weeks in the tree. PR51556 is the only problem so far and it is a missed optimization, not malfunction. Besides, as it is said in the ticket, “the SLP call vectorization fails as we don’t properly account for calls that have different return/arg types”, so the problem already existed, this intrinsic only elucidated it. I think if in a couple of weeks no new problems will be found, work in this direction can be continued, as if there were something broken, it would be already reported.

Regardless of everything, i would like to see a patch that restores
the -ffast-math handling, and *then* the RFC on what the is-nan check
should do when -ffast-math is present.
It is more than possible that the RFC will succeed,
but i don't think a change like that should happen the way it did.

84949 – -ffast-math bugged with respect to NaNs was mentioned as
motivational, but i don't see any resolution there,
it's not even in "confirmed" state.

The change landed *after* the branch, so i think we are not too
constrained on the timeline for it. If there is no RFC before Aug 30'th,
i will be posting a patch to revert said change myself.

Roman.