Do we need intrinsics for floating-point classification functions?

Hi all,

Some time ago a new intrinsic, llvm.isnan, was introduced, intended to represent the IEEE-754 operation isNaN as well as the family of C library functions isnan*. A concern was then raised (see https://reviews.llvm.org/D104854) that this functionality should be removed. Discussion in the subsequent RFC (https://lists.llvm.org/pipermail/llvm-dev/2021-August/152257.html) came to consensus that such an intrinsic is necessary. Nevertheless, the patches related to the new intrinsic were reverted. I am restarting the discussion in the hope of convincing the community that this intrinsic and the other classification functions are necessary.

There are two main reasons why this intrinsic is necessary:

  1. It allows a correct implementation of isnan when strict floating-point semantics are in effect,
  2. It allows the check to be preserved under -ffast-math compilation.

To facilitate the discussion let’s concentrate on the first problem.

Previously the frontend builtin __builtin_isnan was converted into fcmp uno during IR generation in clang codegen. This solution is not suitable when FP exceptions are not ignored, because compare instructions raise exceptions if their argument is a signaling NaN. Both IEEE-754 (5.7.2) and the C standard (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2596.pdf, F.3p6) require that this function not raise floating-point exceptions. There was no target-independent IR construct that could represent isnan.
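
For illustration, the old lowering was essentially (a minimal sketch, not exact clang output):

define i1 @isnan_via_fcmp(double %x) {
  %res = fcmp uno double %x, %x   ; true iff %x is unordered with itself, i.e. a NaN
  ret i1 %res
}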

This drawback was significant enough that some attempts to alleviate it were undertaken. In https://reviews.llvm.org/D95948 isnan was implemented using integer operations in strictfp functions. However, this is not suitable for targets where a more efficient way exists, such as a dedicated instruction. Another solution was implemented in https://reviews.llvm.org/D96568, where a hook clang::TargetCodeGenInfo::testFPKind was introduced, which injects target-specific code into IR. Such a solution makes IR more target-dependent and prevents some IR-level optimizations.
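
The integer-based approach looks roughly like this for double (a sketch, not necessarily the exact code D95948 emits): clear the sign bit and check whether the remaining bits are above the +infinity pattern.

define i1 @isnan_via_int(double %x) {
  %bits = bitcast double %x to i64
  %abs  = and i64 %bits, 9223372036854775807     ; 0x7FFFFFFFFFFFFFFF, drop the sign bit
  %res  = icmp ugt i64 %abs, 9218868437227405312 ; 0x7FF0000000000000, the +Inf bit pattern
  ret i1 %res
}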

To have a solution suitable for all cases, a new intrinsic function llvm.isnan was introduced (https://reviews.llvm.org/D104854). It protects the check from undesirable optimizations and preserves it until instruction selection, where it can be lowered in whatever way is optimal for a particular target.
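
As a reminder, the intrinsic from D104854 had roughly this shape (an overloaded intrinsic returning i1 that raises no FP exceptions; the exact definition is in the review):

declare i1 @llvm.isnan.f64(double)

  %r = call i1 @llvm.isnan.f64(double %x)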

Other classification functions also need their own intrinsics. In strictfp mode even a check for zero (iszero) cannot be made by comparing a value against zero: if the value is a signaling NaN, an FP exception would be raised. James Y Knight in the previous discussion (https://lists.llvm.org/pipermail/llvm-dev/2021-August/152282.html) listed such "non-computational" functions, which should not signal when given an sNaN argument.
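
To make the iszero case concrete: in a strictfp function the comparison has to go through the constrained compare intrinsic, and even the quiet variant raises the 'invalid' exception when its operand is a signaling NaN, which iszero must not do (a sketch, assuming the usual f64 mangling):

  %is_zero = call i1 @llvm.experimental.constrained.fcmp.f64(
                 double %x, double 0.000000e+00,
                 metadata !"oeq", metadata !"fpexcept.strict")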

It looks like a new intrinsic is the only consistent and target-agnostic way to implement these checks in all environments, including the case when FP exceptions are not ignored.

Any feedback is welcome.

Thank you for restarting this proposal. It’s a bit more work, but let’s make sure we have a good design, so we don’t miss any details.

What optimizations/analysis does this intrinsic enable in IR in strict-FP mode?

If it is only constant folding, we don’t need to add an intrinsic. We fold lots of math library calls with no corresponding LLVM intrinsic under llvm::ConstantFoldCall().
We also do transforms based on operands/uses of library calls in SimplifyLibCalls.

So I’m really asking: why do we translate isnan() in clang into something else in the first place? The backend can lower that call into whatever suits the target without involving LLVM IR.

Thank you for restarting this proposal. It’s a bit more work, but let’s make sure we have a good design, so we don’t miss any details.

What optimizations/analysis does this intrinsic enable in IR in strict-FP mode?

If it is only constant folding, we don’t need to add an intrinsic. We fold lots of math library calls with no corresponding LLVM intrinsic under llvm::ConstantFoldCall().
We also do transforms based on operands/uses of library calls in SimplifyLibCalls.

Optimization is not the motivation for this intrinsic. If FP exceptions are ignored, an unordered comparison can be used to implement isnan. If not, we need to make the test without affecting FP exceptions. Without this intrinsic there is no way to represent such a test.

So I’m really asking: why do we translate isnan() in clang into something else in the first place? The backend can lower that call into whatever suits the target without involving LLVM IR.

It must be a function known to the backend without declarations/definitions, that is, an intrinsic.

Thank you for restarting this proposal. It’s a bit more work, but let’s make sure we have a good design, so we don’t miss any details.

What optimizations/analysis does this intrinsic enable in IR in strict-FP mode?

If it is only constant folding, we don’t need to add an intrinsic. We fold lots of math library calls with no corresponding LLVM intrinsic under llvm::ConstantFoldCall().
We also do transforms based on operands/uses of library calls in SimplifyLibCalls.

Optimization is not the motivation for this intrinsic. If FP exceptions are ignored, an unordered comparison can be used to implement isnan. If not, we need to make the test without affecting FP exceptions. Without this intrinsic there is no way to represent such a test.

Here’s a way to represent such a test without an intrinsic:
declare i1 @isnan(double)

%r = call i1 @isnan(double %x) #1

attributes #1 = { nounwind }

The IR optimizer shouldn’t do anything with that call because it might read/write arbitrary memory - assuming I got the (lack of) function attributes correct; let me know if this should be spelled differently.

We can guarantee that an arbitrary call will survive as-is through the entire IR optimizer pipeline since optimization is not the motivation.

I think there’s even proof that this already works the way you want, because with the clang on macOS I see:
#include <math.h>
int a_libcall(double x) {
return isnan(x);
}

$ clang -O2 isnan.c -S -emit-llvm -o - -ffast-math
define i32 @a_libcall(double %0) local_unnamed_addr #1 {
%2 = tail call i32 @__isnand(double %0) #3
ret i32 %2
}

So I’m really asking: why do we translate isnan() in clang into something else in the first place? The backend can lower that call into whatever suits the target without involving LLVM IR.

It must be a function known to the backend without declarations/definitions, that is, an intrinsic.

Here’s a sqrt() mathlib call in IR that gets simultaneously lowered to both a target-specific instruction AND a libcall on x86:
https://godbolt.org/z/89xETxK6h

We can do something like that for isnan() - if the target has an instruction that implements the IEEE/C definition of isnan, use it. Otherwise, make a call to the mathlib.

No – there is no “isnan” standard library call. C “isnan” is a macro which expands to whatever each libc desires. There is effectively no standardization as to what library function – if any – they actually end up calling. Therefore, the compiler’s __builtin_isnan (and friends) cannot reasonably rely on generating library calls for these.

Probably the best you could say is that we don’t support __builtin_isnan/etc in strict-fp mode. But that’s not a good solution, either.

Hmm…so the Apple clang “@__isnan” is a hack that we can’t use? It seems to be there specifically for the FMF-based reason that was mentioned earlier as motivation for intrinsic isnan. Ie, it only seems to appear when ‘nnan’ would otherwise zap the corresponding ‘fcmp uno’.

I wouldn’t say a “hack” – it’s just an implementation choice that Apple’s libc made. The libc math.h can certainly implement “#define isnan(x)” in a way other than “__builtin_isnan(x)” if it wishes to.

I just think it’s not reasonable for Clang to try to figure out what private implementation functions may be used by the macro-expansion of the “isnan” macro in every libc, in order to have __builtin_isnan expand to a call to that private function. Either we should expand __builtin_isnan directly, or not provide __builtin_isnan. (And the former seems like a better answer to me.)

As you say, we don’t strictly need a new intrinsic – we can emit the code to do the correct integer-based bit-checking in the frontend – as was done before. Yet, that is not ideal. The rationale for having an intrinsic instead of the frontend generating integer bit-manipulation seems good, IMO.

So the question that arises: which new intrinsic(s) do we want to add? I list 3 options below. I think Option 1 should be discarded from consideration, and I’m not sure which of 2 or 3 is best. I’m leaning towards 3 – it seems like it may be simpler – although I’m not certain.

Option 1: we could theoretically address the problem with an llvm.experimental.constrained.superquiet_fcmp, which would be the same as fcmp, except it would not raise an fp exception even for an sNaN. This would be a straightforward substitution for fcmp in Clang’s existing codegen, and it should work.

However, I don’t think this would be a good idea, because it’s a poor match for hardware. The “superquiet_fcmp” operation doesn’t map to CPU functionality in any ISA I’m aware of. Generally, the implementation would have to filter out sNaNs prior to using the hardware fp-compare instruction. And how do you detect an sNaN without raising an fp exception? Fall down to integer ops. But by doing that, you’ve done work almost equivalent to what you needed for the actual classification function that you were trying to implement in the first place. Not really useful.

Option 2: We could add a whole family of classification intrinsics. I think that would be:

llvm.isnan
llvm.issignaling
llvm.isinf
llvm.isfinite
llvm.isnormal
llvm.issubnormal
llvm.iszero

(Note, some of these are missing corresponding __builtin_is* in Clang at the moment – we have no __builtin_issignaling, __builtin_issubnormal, or __builtin_iszero. We probably ought to add them.)

We don’t necessarily need an intrinsic for fpclassify if we have the above intrinsics, since it can be built with llvm.isnan, llvm.isinf, llvm.iszero, and llvm.isnormal.
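
For example, fpclassify could be expanded roughly as follows (a sketch only; it uses the proposed Option 2 names, which do not exist today, and the same FP_* values as the fpclassify example below, which are libc-specific):

define i32 @expand_fpclassify(double %x) {
  %nan  = call i1 @llvm.isnan.f64(double %x)
  %inf  = call i1 @llvm.isinf.f64(double %x)
  %zero = call i1 @llvm.iszero.f64(double %x)
  %norm = call i1 @llvm.isnormal.f64(double %x)
  ; subnormal is what remains when none of the other checks fire
  %c0 = select i1 %norm, i32 4, i32 3    ; FP_NORMAL : FP_SUBNORMAL
  %c1 = select i1 %zero, i32 2, i32 %c0  ; FP_ZERO
  %c2 = select i1 %inf,  i32 1, i32 %c1  ; FP_INFINITE
  %c3 = select i1 %nan,  i32 0, i32 %c2  ; FP_NAN
  ret i32 %c3
}

declare i1 @llvm.isnan.f64(double)
declare i1 @llvm.isinf.f64(double)
declare i1 @llvm.iszero.f64(double)
declare i1 @llvm.isnormal.f64(double)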

Option 3: Add only an fpclassify intrinsic.

That is, something like:
i32 llvm.fpclassify.i32.f32(i32 %if_snan, i32 %if_qnan, i32 %if_infinite, i32 %if_normal, i32 %if_subnormal, i32 %if_zero, float %value)

which classifies the given value, returning the value of the argument corresponding to its categorization. We can say the %if_* args are required to be constant integers, if we like, for simplicity of implementation.

Thus, Clang would codegen __builtin_isnan/etc like:
%isnan = call i1 @llvm.fpclassify.i1.f32(i1 1, i1 1, i1 0, i1 0, i1 0, i1 0, float %value)
And for fpclassify, we might generate something like:

%ret = call i32 @llvm.fpclassify.i32.f32(i32 0, i32 0, i32 1, i32 4, i32 3, i32 2, float %value)

On most architectures, we’d expand this intrinsic into appropriate integer operations (skipping the parts of the classification which are irrelevant for the given constant arguments), since there are no corresponding hardware instructions available for float classification. Or, for non-strictfp functions, we could continue to expand into an fcmp-based set of tests…although, looking at the asm we currently generate, the integer versions may well be faster, except perhaps for isnan.

On SystemZ/s390x, this intrinsic would translate almost directly into the “test data class” instruction – so long as the %if_* arguments are all 0/1. That’s kinda nice. (“Test data class” takes a fp value and a bitmask, and returns true if a bit is set in the position corresponding to the classification of the fp value.)

Separately, we have the signbit operation. I think that’s the only other operation that needs to be addressed related to this RFC. Currently, Clang always generates integer bit-ops for __builtin_signbit in the frontend. This is arguably OK as is. Yet, completing the set of IR fp classification intrinsics seems like it’d be a good idea. So, we could also (along with any of the above options) add:

i1 llvm.signbit.f32(float %value)
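
The integer bit-test looks roughly like this for double (a sketch, not Clang’s exact output): reinterpret the value as an integer and test the sign bit.

define i1 @signbit_via_int(double %x) {
  %bits = bitcast double %x to i64
  %res  = icmp slt i64 %bits, 0   ; sign bit set <=> the i64 view is negative
  ret i1 %res
}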

Thank you for summarizing.

I would prefer Option 2: a separate tool for separate tasks. It seems to me that a specialized function is easier to implement than a universal one. As the implementation of llvm.isnan demonstrated, there may be various issues in implementing such functions for different targets. It is easier to provide optimized versions of the small functions, which are used more frequently than fpclassify. Besides, fpclassify may be used on its own, and a user may choose arbitrary constants, which complicates code generation. It seems more acceptable to tolerate some inefficiency in fpclassify than in the basic classification intrinsics. And yes, we should consider signbit with them.

Another consideration in favor of a dedicated intrinsic rather than an ordinary function: the compiler may do some optimizations when it knows the semantics of the function. In https://reviews.llvm.org/D104854 this intrinsic was optimized out if its argument was produced by an operation carrying the ‘nnan’ flag. We could also think about an optimization that determines which basic blocks are guarded by llvm.isnan and sets the ‘nnan’ flag inside them. It could make code faster in many practical cases, and make the use of the notorious -ffast-math less attractive.
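
As a sketch of that idea (using the proposed llvm.isnan, so purely illustrative): in a block that is only reachable when llvm.isnan reported false, such an optimization could legally attach ‘nnan’ to FP operations whose operands are that value.

define double @guarded(double %x) {
entry:
  %nan = call i1 @llvm.isnan.f64(double %x)
  br i1 %nan, label %handle_nan, label %no_nan

no_nan:                           ; %x is known not to be NaN here
  %sq = fmul nnan double %x, %x   ; 'nnan' could be inferred by such an optimization
  ret double %sq

handle_nan:
  ret double 0.000000e+00
}

declare i1 @llvm.isnan.f64(double)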

Hi all,

If nobody objects, in a couple of days I will put back the llvm.isnan implementation, excluding the changes for fast-math mode.

Thanks,
–Serge

Given just how contentious all this has been so far,
i would strongly suggest that this go through review.

Roman

The review was done in https://reviews.llvm.org/D104854. The design was discussed there, in https://lists.llvm.org/pipermail/llvm-dev/2021-August/152257.html and in this thread. No alternatives were proposed and no disagreement was expressed.

If somebody has concerns or objections, they can express them here or in https://reviews.llvm.org/D104854 if they are about the implementation.

Thanks,
–Serge

The review was done in https://reviews.llvm.org/D104854.

After which a revert was requested by *multiple* people,
said feedback was plainly ignored, and someone else had to do it.
https://llvm.org/docs/DeveloperPolicy.html#patch-reversion-policy

The dismissiveness (one-sidedness?) of the replies, especially in
"[cfe-dev] Should isnan be optimized out in fast-math mode?",
also doesn't quite inspire optimism.
So forgive me if i'm erring on the safe side here.

The design was discussed there, in https://lists.llvm.org/pipermail/llvm-dev/2021-August/152257.html and in this thread. No alternatives were proposed and no disagreement was expressed.

If somebody has concerns or objections, they can express them here or in https://reviews.llvm.org/D104854 if they are about the implementation.

I do have concerns.
While i agree that some solution is needed for StrictFP,
I don't see consensus (i.e., an explicit agreement by a third party
that the proposed design makes sense to them) on the RFCs,
and i do object to plainly recommitting any patches given all of this.

Thanks,
--Serge

Roman

The review was done in https://reviews.llvm.org/D104854.
After which a revert was requested by multiple people,
said feedback was plainly ignored, and someone else had to do it.
https://llvm.org/docs/DeveloperPolicy.html#patch-reversion-policy

This patch was reviewed according to the usual procedure. The intrinsic is also a fairly ordinary one; we usually do not discuss every intrinsic at such length. I don’t know why it caused a negative reaction from some people.

The dismissiveness (one-sidedness?) of the replies, especially in
“[cfe-dev] Should isnan be optimized out in fast-math mode?”,
also doesn’t quite inspire optimism.
So forgive me if i’m erring on the safe side here.

I am sorry if anyone found my replies dismissive. Attempting to convince the community is not one-sidedness. I tried to reply to each new statement until it became clear that the proposal did not have sufficient support.

The design was discussed there, in https://lists.llvm.org/pipermail/llvm-dev/2021-August/152257.html and in this thread. No alternatives were proposed and no disagreement was expressed.

If somebody has concerns or objections, they can express them here or in https://reviews.llvm.org/D104854 if they are about the implementation.
I do have concerns.
While i agree that some solution is needed for StrictFP,
I don’t see consensus (i.e., an explicit agreement by a third party
that the proposed design makes sense to them) on the RFCs,
and i do object to plainly recommitting any patches given all of this.

This RFC is devoted to discussing the design of this intrinsic. So far it contains constructive discussion and no objections. The previous discussion in https://lists.llvm.org/pipermail/llvm-dev/2021-August/152257.html was also supportive.

I have no intention of restoring the patchset silently; that is why I am resuming this discussion. Anyone is welcome to share their concerns, objections and other thoughts here.

Thanks,
–Serge

Yes, you should add intrinsics again.

I’m not sure “excluding the changes for fast-math mode” is the right way to go about it, however. I think Clang ought to generate llvm.isnan and friends unconditionally in the frontend. The decision on how we want fast-math to affect these intrinsics should be reflected in optimizations, and potentially in which fast-math flags Clang puts on the calls, not in whether it emits the intrinsics or something else.

I believe it would be useful to split up the patches in a different way than you submitted the first time. I’d suggest ordering it as:

  1. Introduce the new intrinsics in LLVM IR (all of them, not just isnan).
  2. Update optimizations, as necessary, to have the desired transformations/analyses.
  3. Switch Clang to generate the intrinsics (unconditionally).

Ok, that makes sense.
Thank you!