[PATCH][RFC]: Add fmin/fmax intrinsics

arsenm · August 13, 2014, 11:38pm

Hi,

I’d like to re-propose adding intrinsics for fmin / fmax. These can be used to implement the equivalent libm functions as defined in C99 and OpenCL; R600 and AArch64, at least, have instructions with the same semantics. This is not equivalent to a simple fcmp + select because of the way NaNs are handled.

This has been proposed before, but never delivered (http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-December/057128.html)

To summarize:
1. If exactly one operand is a NaN, returns the other operand
2. If both operands are NaN, returns NaN
3. If the operands are equal, returns a value that will compare equal to both arguments
4. In the normal case, returns the smaller / larger operand
5. Ignore what to do for signaling NaNs, since that’s what the rest of LLVM does currently anyway
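
As a rough sketch, the rules above could be modelled in plain C as follows. The name `model_fmin` is hypothetical and is not taken from the patches; it only illustrates one result that the five points allow, and it makes no attempt to treat signaling NaNs specially.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical reference model of the proposed fmin semantics. */
double model_fmin(double x, double y) {
    if (isnan(x)) return y;  /* rules 1 and 2: the other operand, NaN if both are NaN */
    if (isnan(y)) return x;  /* rule 1 */
    return (y < x) ? y : x;  /* rules 3 and 4: smaller operand, or x when they compare equal */
}

void model_fmin_examples(void) {
    assert(model_fmin(NAN, 1.0) == 1.0);     /* NaN on the left is ignored */
    assert(model_fmin(1.0, NAN) == 1.0);     /* NaN on the right is ignored */
    assert(isnan(model_fmin(NAN, NAN)));     /* both NaN gives NaN */
    assert(model_fmin(-0.0, 0.0) == 0.0);    /* result compares equal to both zeros */
}
```

A model of fmax under the same assumptions would only flip the final comparison.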

- Handling of fmin/fmax (+/- 0.0, +/- 0.0)
Point 3 is worded as such because this doesn’t seem particularly well specified by any standard I’ve looked at. The most explicit mention of this I’ve found is a footnote in C99 that “Ideally, fmax would be sensitive to the sign of zero, for example fmax(-0.0, 0.0) would return +0; however, implementation in software might be impractical.” It doesn’t really state what the expected behavior is. glibc and OS X’s libc disagree on the (+0, -0) and (-0, +0) cases. To resolve this, the semantics of the intrinsic will be that either will be OK as long as the result compares equal.

For the purposes of constant folding, I’ve tried to follow the literal wording that was most explicit about the expected result, which is the OpenCL definition of fmin, and to treat any comparison +/-0.0 < +/-0.0 as false.

This means the constant folded results will be:
    fmin(0.0, 0.0) = 0.0
    fmin(0.0, -0.0) = 0.0
    fmin(-0.0, 0.0) = -0.0
    fmin(-0.0, -0.0) = -0.0

Other options would be to always use +0.0, or to be sensitive to the sign and claim -0.0 is less than 0.0.
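
To make the difference between these options concrete, here is a small C sketch for non-NaN inputs. Both function names are hypothetical. `fold_fmin_literal` assumes the literal reading is "return y only if y < x, otherwise x", which reproduces the four folded results listed above; `fold_fmin_signed_zero` is the sign-sensitive alternative.

```c
#include <assert.h>
#include <math.h>

/* Literal reading: y is returned only when y < x compares true. */
double fold_fmin_literal(double x, double y) { return (y < x) ? y : x; }

/* Hypothetical alternative: treat -0.0 as smaller than +0.0. */
double fold_fmin_signed_zero(double x, double y) {
    if (x == 0.0 && y == 0.0)
        return signbit(x) ? x : y;  /* prefer whichever zero is negative */
    return (y < x) ? y : x;
}

void fold_fmin_examples(void) {
    assert(!signbit(fold_fmin_literal(0.0, -0.0)));     /* fmin(0.0, -0.0) = 0.0 */
    assert(signbit(fold_fmin_literal(-0.0, 0.0)));      /* fmin(-0.0, 0.0) = -0.0 */
    assert(signbit(fold_fmin_signed_zero(0.0, -0.0)));  /* sign-sensitive: -0.0 */
    assert(signbit(fold_fmin_signed_zero(-0.0, 0.0)));  /* sign-sensitive: -0.0 */
}
```

Under the literal reading the result for two zeros is simply the first operand, so it depends on operand order; the sign-sensitive variant does not.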

0001-Add-fmin-fmax-intrinsics.patch (82.3 KB)

0002-Add-basic-fmin-fmax-instcombines.patch (8.19 KB)

0003-Fold-fmin-fmax-with-infinities.patch (4 KB)

0004-Move-fmin-fmax-constant-folding-logic-into-APFloat.patch (3.97 KB)

Stephen_Canon1 · August 14, 2014, 2:55pm

I have no position on whether or not these should be added, but if they are they should match the IEEE 754 semantics, which fully specify all of these details.

(Signaling NaNs could still be left unspecified as they're optional in IEEE-754).

- Steve

Stephen_Canon1 · August 14, 2014, 4:03pm

… actually, now that I’m able to double-check this, I’m quite surprised to find that we didn’t define fmax(+0, -0) in IEEE-754, which says [paraphrased]:

minNum(x,y) is x if x < y, y if y < x, and the number if one is a number and the other is NaN. Otherwise, it is either x or y (this means results might differ among implementations).
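
One way to read that paraphrase is as a predicate over acceptable results rather than a single function. The sketch below takes that view; `same_value` and `is_allowed_minnum_result` are hypothetical helper names, and the code follows only the paraphrase above, not the text of the standard.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* True if a and b are the same value, treating all NaNs as identical
   and distinguishing +0.0 from -0.0. */
bool same_value(double a, double b) {
    if (isnan(a) || isnan(b)) return isnan(a) && isnan(b);
    return a == b && !signbit(a) == !signbit(b);
}

/* True if r is a result the paraphrased minNum(x, y) rule permits. */
bool is_allowed_minnum_result(double x, double y, double r) {
    if (x < y) return same_value(r, x);
    if (y < x) return same_value(r, y);
    if (isnan(x) && !isnan(y)) return same_value(r, y);  /* the number wins */
    if (isnan(y) && !isnan(x)) return same_value(r, x);
    return same_value(r, x) || same_value(r, y);         /* "either x or y" */
}
```

With this reading, both `+0.0` and `-0.0` are accepted for `minNum(-0.0, +0.0)`, which is the freedom the proposal relies on.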

So I think your proposed semantics are perfectly reasonable.

– Steve

resistor · August 16, 2014, 6:52am

FWIW, I am in favor of this proposal.

—Owen

Mueller-Roemer_Johan · August 18, 2014, 8:22am

Wouldn’t it be better to use the target’s implementation (if there is one) instead of generically using one option for constant folding? Otherwise target behavior and constant folded behavior would differ, which should be avoided if possible IMO.

resistor · August 18, 2014, 5:00pm

This is a problem with all floating point folding, not just with these operations. What Matt is proposing is consistent with how we fold other libm intrinsics.

—Owen

carter · August 18, 2014, 7:00pm

Would it be in scope to have intrinsic analogues of fmin/fmax that return NaN if either arg is a NaN?
Julia and GHC Haskell are both likely to change their definitions of min/max on floats/doubles to return NaN if either arg is NaN.
See here for the Julia discussion; I’m in the middle of putting together the analogous proposal for GHC Haskell.

My understanding is that the NaN-evading semantics of fmin/fmax in the IEEE spec are motivated by using NaN to encode “this data is missing” rather than the more common “this is the result of an erroneous computation”. Granted, such an alternative NaN-returning fmin/fmax can be written as a derived LLVM operation too, but it could just as easily benefit from LLVM integration.
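
For illustration, a NaN-returning min of the kind described here might look like the following in C. The name `nan_propagating_min` is hypothetical, and this is only a sketch of the idea, not what Julia or GHC actually implement.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical NaN-propagating min: any NaN operand makes the result NaN. */
double nan_propagating_min(double a, double b) {
    if (isnan(a)) return a;
    if (isnan(b)) return b;
    return (b < a) ? b : a;
}

void nan_propagating_min_examples(void) {
    assert(isnan(nan_propagating_min(NAN, 1.0)));  /* NaN on the left propagates */
    assert(isnan(nan_propagating_min(1.0, NAN)));  /* NaN on the right propagates */
    assert(nan_propagating_min(1.0, 2.0) == 1.0);  /* ordinary case */
}
```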

I hope this suggestion/question is in scope for this thread; if not, I apologize for jumping in.

thanks!
-Carter

resistor · August 18, 2014, 7:32pm

Hi Carter,

I would strongly advise you against this direction. I’m aware of two directions that existing languages go in defining min/max operations:

IEEE 754, C, Fortran, Matlab, OpenCL, and HLSL all define it not to propagate NaNs
C++ (std::min/std::max) and OpenGL define it in the trinary operator manner: (a < b) ? a : b

What you’re proposing does not match any existing languages that I’m aware of, and seems likely to hamper cross-language portability for you in the future.
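
A short C sketch of the second form shows why it is neither NaN-ignoring nor NaN-propagating: the result depends on which operand holds the NaN. The name `ternary_min` is hypothetical; the body is just the `(a < b) ? a : b` expression quoted above.

```c
#include <assert.h>
#include <math.h>

/* The "(a < b) ? a : b" form from the list above. */
double ternary_min(double a, double b) { return (a < b) ? a : b; }

void ternary_min_examples(void) {
    assert(ternary_min(NAN, 1.0) == 1.0);  /* NaN in the first slot is dropped */
    assert(isnan(ternary_min(1.0, NAN)));  /* NaN in the second slot is returned */
    assert(ternary_min(1.0, 2.0) == 1.0);  /* ordinary case */
}
```

Any comparison involving a NaN is false, so this form always falls through to the second operand when a NaN is present.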

More generally, I don’t see a compelling reason for LLVM to add intrinsic support for the version you’re proposing. Your choice can easily be expanded into IR, and does not have the wide hardware support (particularly in GPUs) that the IEEE version does.

—Owen

carter · August 19, 2014, 3:32am

Good point, no compiler backend intrinsic support is needed.

On the IEEE front, the motivation for the NaN properties of fmin and fmax in the standard is the “missing data” interpretation, right? This choice does make sense for languages which don’t have a more direct way of expressing missing data (such as option types!). The if-based compare version does indeed match what many CPUs seem to provide.

On the higher-level language front, if NaNs only represent erroneous computations rather than missing data, what semantic arguments are there for providing the IEEE min or the “if <” min as the language’s min on floats, aside from “other languages do it that way”?

arsenm · September 3, 2014, 4:14pm

Post-vacation ping

People seem to generally be in favor of adding these. Any comments on the specific patches adding them?

-Matt

resistor · September 12, 2014, 4:29am

I have no specific comments other than to re-iterate my support for getting this in.

—Owen

Dan_Gohman · September 12, 2014, 5:27pm

> Hi Carter,
>
> I would strongly advise you against this direction. I’m aware of two
> directions that existing languages go in defining min/max operations:
>
> - IEEE 754, C, Fortran, Matlab, OpenCL, and HLSL all define it not to
> propagate NaNs
> - C++ (std::min/std::max) and OpenGL define it in the trinary operator
> manner: (a < b) ? a : b
>
> What you’re proposing does not match any existing languages that I’m aware
> of, and seems likely to hamper cross-language portability for you in the
> future.

At a quick glance, I found JavaScript [0] and Java [1] both have a min and
max that propagate NaN.

[0] http://people.mozilla.org/~jorendorff/es6-draft.html#sec-math.max
[1]
http://docs.oracle.com/javase/7/docs/api/java/lang/Math.html#max(double,%20double)

> More generally, I don’t see a compelling reason for LLVM to add intrinsic
> support for the version you’re proposing. Your choice can easily be
> expanded into IR, and does not have the wide hardware support (particularly
> in GPUs) that the IEEE version does.

The IEEE version can also be expanded in LLVM IR. And for GPUs, many GPU
input languages leave the behavior on NaN unspecified, so it's not
obviously the best guide.

Consider also this: The IEEE version exists within a spec where it's
assumed that programmers have elaborate access to information about
floating-point exceptions. In practice, programming languages and
environments have not been able to reliably deliver this level of access.
NaN is one of the few ways left to determine whether an exception has
occurred (and even NaN isn't always enough), and so the motivation for NaN
propagation in practice may be greater than what it was in the IEEE spec.

Dan

resistor · September 12, 2014, 9:24pm

That’s not generally true. HLSL (DirectX), CUDA, OpenCL, and Metal all have defined semantics for NaNs which include not propagating them through min/max. GLSL (OpenGL) is the odd one out in this area.

—Owen

resistor · September 12, 2014, 10:04pm

Also, as a practical issue, many GPUs have ISA-level support for the IEEE-conforming version. Some (all?) of the AMD GPUs that Matt cares about support it, and PTX has native operations for it as well. The IR expansion of an IEEE-conforming fmin/fmax is at least three compares + selects, which makes it very difficult to pattern match for these targets.

The inverse form (always propagating NaNs) is not widely natively supported. I think AArch64 might have it? MAXPS in SSE performs a ternary operator form that doesn’t match either definition.

—Owen

Dan_Gohman · September 13, 2014, 12:39am

>>> More generally, I don’t see a compelling reason for LLVM to add intrinsic
>>> support for the version you’re proposing. Your choice can easily be
>>> expanded into IR, and does not have the wide hardware support (particularly
>>> in GPUs) that the IEEE version does.
>>
>> The IEEE version can also be expanded in LLVM IR. And for GPUs, many GPU
>> input languages leave the behavior on NaN unspecified, so it's not
>> obviously the best guide.
>
> That’s not generally true. HLSL (DirectX), CUDA, OpenCL, and Metal all
> have defined semantics for NaNs which include not propagating them through
> min/max. GLSL (OpenGL) is the odd one out in this area.

HLSL leaves it undefined:

I guess Metal and others only have a "fast-math" flag which (among other
things) makes behavior on NaN undefined, but it's my impression that it's a
popular flag.

> Also, as a practical issue, many GPUs have ISA-level support for the
> IEEE-conforming version. Some (all?) of the AMD GPUs that Matt cares about
> support it, and PTX has native operations for it as well. The IR expansion
> of an IEEE-conforming fmin/fmax is at least three compares + selects, which
> makes it very difficult to pattern match for these targets.

It's 2 compares + selects:

    float nan_swallowing_fmin(float a, float b) {
        /* b != b is true only when b is NaN; a < b is false when a is NaN. */
        return b != b ? a : (a < b ? a : b);
    }

which is within the realm of pattern-matching.
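
As a sanity check on the two-compare form, the sketch below runs it over a small table of operands, including NaN and the infinities, and counts results that break the NaN-swallowing rules. The function is copied from the post so that the block is self-contained; `count_fmin_mismatches` is a hypothetical helper, and the operand table is an arbitrary choice, not an exhaustive test.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Copied from the post above so that this block is self-contained. */
float nan_swallowing_fmin(float a, float b) {
    return b != b ? a : (a < b ? a : b);
}

/* Returns how many operand pairs give an unexpected result (0 = all pass). */
int count_fmin_mismatches(void) {
    const float v[] = { NAN, -INFINITY, -1.0f, 0.0f, 1.0f, INFINITY };
    const size_t n = sizeof v / sizeof v[0];
    int bad = 0;
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float a = v[i], b = v[j], r = nan_swallowing_fmin(a, b);
            if (isnan(a) && isnan(b))
                bad += !isnan(r);   /* both NaN: result must be NaN */
            else if (isnan(a))
                bad += !(r == b);   /* only a is NaN: expect b */
            else if (isnan(b))
                bad += !(r == a);   /* only b is NaN: expect a */
            else                    /* no NaN: a lower bound equal to an operand */
                bad += !(r <= a && r <= b && (r == a || r == b));
        }
    }
    return bad;
}
```

The check says nothing about the sign of zero, which the thread leaves open anyway.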

> The inverse form (always propagating NaNs) is not widely natively
> supported.
>
> I think AArch64 *might* have it?

It does. In fact, even armv7 has a NaN-propagating min/max:

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489i/CIHDEEBE.html

resistor · September 13, 2014, 5:13am

Not exactly. The HLSL language leaves it undefined, but HLSL bytecode specifies that it’s not NaN-propagating:

http://msdn.microsoft.com/en-us/library/windows/desktop/hh447185(v=vs.85).aspx

And I happen to know from experience that a lot of graphics shaders depend on it working that way in practice.

—Owen

Dan_Gohman · September 15, 2014, 7:26pm

Given IEEE-754’s sway, and its saying what it does on this point, but given also the popularity of NaN-propagating min and max, how about a compromise? We add intrinsics following the IEEE-754 semantics, but we also follow IEEE-754 (and ARMv8) in renaming them to minnum and maxnum, to clarify which interpretation these intrinsics are using.

resistor · September 15, 2014, 8:17pm

I’d be fine with that proposal. I could even be convinced if we wanted to add a pair of NaN-propagating intrinsics as well, for targets and languages that want those semantics, even if I disagree with them. I do think that, if we are using the minnum/maxnum names, we should explicitly note that they are equivalent to C’s fmin/fmax, but not std::min/std::max or Java(script)’s min/max.

—Owen

arsenm · September 17, 2014, 9:44pm

I can rename these, but the convention followed by all the other LLVM intrinsics is to use the C library names.

resistor · September 17, 2014, 9:53pm

minnum and maxnum match their names in the IEEE 754 standard. This diverges from LLVM’s convention, but the names are not without precedent.

—Owen
