[PATCH][RFC]: Add fmin/fmax intrinsics

Hi,

I’d like to re-propose adding intrinsics for fmin / fmax. These can be used to implement the equivalent libm functions as defined in C99 and OpenCL, for which R600 and AArch64, at least, have instructions with the same semantics. This is not equivalent to a simple fcmp + select because of their handling of NaNs.

This has been proposed before, but never delivered (http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-December/057128.html).

To summarize:
1. If either operand is a NaN, returns the other operand
2. If both operands are NaN, returns NaN
3. If the operands are equal, returns a value that will compare equal to both arguments
4. In the normal case, returns the smaller / larger operand
5. Behavior for signaling NaNs is left unspecified, since that’s what the rest of LLVM currently does anyway
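A minimal C sketch of points 1-4, assuming double operands and not distinguishing signaling NaNs (point 5). The names ref_fmin/ref_fmax are hypothetical and are not part of the patches. When the operands compare equal, this model returns the first one, which is consistent with the constant-folding results listed below.

```c
#include <math.h>

/* Hypothetical reference model of the proposed fmin semantics. */
double ref_fmin(double x, double y) {
    if (isnan(x))
        return y;          /* points 1 and 2: result is NaN only if y is NaN too */
    if (isnan(y))
        return x;          /* point 1: NaN operand yields the other operand */
    return y < x ? y : x;  /* points 3 and 4: smaller operand; ties return x */
}

/* Hypothetical reference model of the proposed fmax semantics. */
double ref_fmax(double x, double y) {
    if (isnan(x))
        return y;
    if (isnan(y))
        return x;
    return y > x ? y : x;  /* larger operand; ties return x */
}
```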

- Handling of fmin/fmax (+/- 0.0, +/- 0.0)
  Point 3 is worded as such because this doesn’t seem particularly well specified by any standard I’ve looked at. The most explicit mention of this I’ve found is a footnote in C99: “Ideally, fmax would be sensitive to the sign of zero, for example fmax(-0.0, 0.0) would return +0; however, implementation in software might be impractical.” It doesn’t really state what the expected behavior is. glibc and OS X’s libc disagree on the (+0, -0) and (-0, +0) cases. To resolve this, the semantics of the intrinsic will allow either result, as long as it compares equal to both operands.

For the purposes of constant folding, I’ve tried to follow the literal wording of the OpenCL definition of fmin, which was the most explicit about the expected result. Since the comparison +/-0.0 < +/-0.0 always fails, the first operand is returned when both operands are zeros.

This means the constant folded results will be:
    fmin(0.0, 0.0) = 0.0
    fmin(0.0, -0.0) = 0.0
    fmin(-0.0, 0.0) = -0.0
    fmin(-0.0, -0.0) = -0.0

Other options would be to always use +0.0, or to be sensitive to the sign and claim -0.0 is less than 0.0.
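For comparison, the last option (treating -0.0 as less than +0.0) could be modeled roughly as below. This is a hypothetical sketch of that alternative, not what the patches implement; the function name is illustrative only.

```c
#include <math.h>

/* Hypothetical sign-sensitive fmin: NaN handling as proposed, but when
 * both operands are zeros, -0.0 is treated as smaller than +0.0. */
double signed_zero_fmin(double x, double y) {
    if (isnan(x))
        return y;
    if (isnan(y))
        return x;
    if (x == 0.0 && y == 0.0)        /* both zeros, of either sign */
        return signbit(x) ? x : y;   /* prefer a negative zero if present */
    return y < x ? y : x;
}
```

Under this rule fmin(0.0, -0.0) would fold to -0.0 instead of 0.0; the other three zero cases match the table above.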

0001-Add-fmin-fmax-intrinsics.patch (82.3 KB)

0002-Add-basic-fmin-fmax-instcombines.patch (8.19 KB)

0003-Fold-fmin-fmax-with-infinities.patch (4 KB)

0004-Move-fmin-fmax-constant-folding-logic-into-APFloat.patch (3.97 KB)

I have no position on whether or not these should be added, but if they are they should match the IEEE 754 semantics, which fully specify all of these details.

(Signaling NaNs could still be left unspecified as they're optional in IEEE-754).

- Steve

… actually, now that I’m able to double-check this, I’m quite surprised to find that we didn’t define fmax(+0,-0) in IEEE-754, which says [paraphrased]:

minNum(x,y) is x if x < y, y if y < x, and the number if one is a number and the other is NaN. Otherwise, it is either x or y (this means results might differ among implementations).
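One way to read the paraphrased minNum rule is as a test of which results are acceptable, rather than as a single function. A sketch under that reading; is_valid_minnum and same_sign are hypothetical names, and +0 and -0 are treated as distinct so the “either x or y” case is visible.

```c
#include <math.h>
#include <stdbool.h>

/* True if a and b have the same sign bit (signbit returns any nonzero value). */
static bool same_sign(double a, double b) {
    return !signbit(a) == !signbit(b);
}

/* Hypothetical check: is r an acceptable minNum(x, y) under the rule above? */
bool is_valid_minnum(double x, double y, double r) {
    if (isnan(x) && isnan(y))
        return isnan(r);              /* no number available: x or y, i.e. NaN */
    if (isnan(x))
        return r == y;                /* the number, when the other is NaN */
    if (isnan(y))
        return r == x;
    if (x < y)
        return r == x;
    if (y < x)
        return r == y;
    /* Otherwise (equal, including +0 vs -0): either operand is allowed. */
    return r == x && (same_sign(r, x) || same_sign(r, y));
}
```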

So I think your proposed semantics are perfectly reasonable.

– Steve

FWIW, I am in favor of this proposal.

—Owen

Wouldn’t it be better to use the target’s implementation (if there is one) instead of generically using one option for constant folding? Otherwise target behavior and constant folded behavior would differ, which should be avoided if possible IMO.

This is a problem with all floating point folding, not just with these operations. What Matt is proposing is consistent with how we fold other libm intrinsics.

—Owen

Would it be in scope to have analogous intrinsics for fmin/fmax that return NaN if either argument is NaN?
Julia Lang and GHC Haskell are both likely to change their definitions of min/max on floats/doubles to return NaN if either argument is NaN.
See here for the Julia Lang discussion; I’m in the midst of putting together the analogous proposal for GHC Haskell.

My understanding is that the NaN-evading semantics of fmin/fmax in the IEEE spec are motivated by using NaN to encode “this data is missing” rather than the more common “this is the result of an erroneous computation”. Granted, such an alternative NaN-returning fmin/fmax can be written as a derived LLVM operation too, but it could just as easily benefit from LLVM integration.
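For illustration, the NaN-propagating alternative can be written with ordinary compares and selects. A minimal sketch with a hypothetical name; it returns one of the NaN inputs when there is one, and does not try to order signed zeros.

```c
#include <math.h>

/* Hypothetical NaN-propagating min: any NaN operand makes the result NaN. */
double nan_propagating_min(double a, double b) {
    if (isnan(a))
        return a;              /* pass the first NaN through */
    if (isnan(b))
        return b;
    return b < a ? b : a;      /* ordinary min; ties return a */
}
```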

I hope this suggestion/question is in scope for this thread; if not, I apologize for jumping in.

thanks!
-Carter

Hi Carter,

I would strongly advise you against this direction. I’m aware of two directions that existing languages go in defining min/max operations:

  • IEEE 754, C, Fortran, Matlab, OpenCL, and HLSL all define it not to propagate NaNs
  • C++ (std::min/std::max) and OpenGL define it in the trinary operator manner: (a < b) ? a : b

What you’re proposing does not match any existing languages that I’m aware of, and seems likely to hamper cross-language portability for you in the future.

More generally, I don’t see a compelling reason for LLVM to add intrinsic support for the version you’re proposing. Your choice can easily be expanded into IR, and does not have the wide hardware support (particularly in GPUs) that the IEEE version does.

—Owen

Good point; no compiler backend intrinsic support is needed.

On the IEEE front, the motivation for the NaN properties of fmin and fmax in the standard is the “missing data” interpretation, right? This choice does make sense for languages that don’t have a more direct way of expressing missing data (such as option types!). The if-based compare version does indeed match what many CPUs seem to provide.

On the higher-level language front, if NaNs only represent erroneous computations rather than missing data, what semantic arguments are there for providing the IEEE min or the “if <” min as the language’s min on floats, aside from “other languages do it that way”?

Post-vacation ping

People seem to generally be in favor of adding these. Any comments on the specific patches adding them?

-Matt

I have no specific comments other than to reiterate my support for getting this in.

—Owen

Hi Carter,

I would strongly advise you against this direction. I’m aware of two
directions that existing languages go in defining min/max operations:

- IEEE 754, C, Fortran, Matlab, OpenCL, and HLSL all define it not to
propagate NaNs
- C++ (std::min/std::max) and OpenGL define it in the trinary operator
manner: (a < b) ? a : b

What you’re proposing does not match any existing languages that I’m aware
of, and seems likely to hamper cross-language portability for you in the
future.

At a quick glance, I found JavaScript [0] and Java [1] both have a min and
max that propagate NaN.

[0] http://people.mozilla.org/~jorendorff/es6-draft.html#sec-math.max
[1] http://docs.oracle.com/javase/7/docs/api/java/lang/Math.html#max(double,%20double)

More generally, I don’t see a compelling reason for LLVM to add intrinsic
support for the version you’re proposing. Your choice can easily be
expanded into IR, and does not have the wide hardware support (particularly
in GPUs) that the IEEE version does.

The IEEE version can also be expanded in LLVM IR. And for GPUs, many GPU
input languages leave the behavior on NaN unspecified, so it's not
obviously the best guide.

Consider also this: The IEEE version exists within a spec where it's
assumed that programmers have elaborate access to information about
floating-point exceptions. In practice, programming languages and
environments have not been able to reliably deliver this level of access.
NaN is one of the few ways left to determine whether an exception has
occurred (and even NaN isn't always enough), and so the motivation for NaN
propagation in practice may be greater than what it was in the IEEE spec.

Dan

That’s not generally true. HLSL (DirectX), CUDA, OpenCL, and Metal all have defined semantics for NaNs which include not propagating them through min/max. GLSL (OpenGL) is the odd one out in this area.

—Owen

Also, as a practical issue, many GPUs have ISA-level support for the IEEE-conforming version. Some (all?) of the AMD GPUs that Matt cares about support it, and PTX has native operations for it as well. The IR expansion of an IEEE-conforming fmin/fmax is at least three compares + selects, which makes it very difficult to pattern match for these targets.

The inverse form (always propagating NaNs) is not widely natively supported. I think AArch64 might have it? MAXPS in SSE performs a ternary operator form that doesn’t match either definition.

—Owen
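For reference, the SSE behavior mentioned above can be modeled per lane as a plain ternary on operand order. This sketch is based on Intel’s description of MAXPS, where the second (source) operand is returned when either operand is NaN or both are zeros; the function name is hypothetical.

```c
#include <math.h>

/* Hypothetical scalar model of one MAXPS lane. */
float maxps_lane_model(float first, float second) {
    /* Any compare involving NaN is false, so NaN or +/-0 ties pick 'second'. */
    return first > second ? first : second;
}
```

A NaN input therefore survives only when it is in the second operand, which matches neither the IEEE definition nor the always-propagating one.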

More generally, I don’t see a compelling reason for LLVM to add intrinsic
support for the version you’re proposing. Your choice can easily be
expanded into IR, and does not have the wide hardware support (particularly
in GPUs) that the IEEE version does.

The IEEE version can also be expanded in LLVM IR. And for GPUs, many GPU
input languages leave the behavior on NaN unspecified, so it's not
obviously the best guide.

That’s not generally true. HLSL (DirectX), CUDA, OpenCL, and Metal all
have defined semantics for NaNs which include not propagating them through
min/max. GLSL (OpenGL) is the odd one out in this area.

HLSL leaves it undefined:

http://msdn.microsoft.com/en-us/library/windows/desktop/bb509624(v=vs.85).aspx

I guess Metal and others only have a "fast-math" flag which (among other
things) makes behavior on NaN undefined, but it's my impression that it's a
popular flag.

Also, as a practical issue, many GPUs have ISA-level support for the
IEEE-conforming version. Some (all?) of the AMD GPUs that Matt cares about
support it, and PTX has native operations for it as well. The IR expansion
of an IEEE-conforming fmin/fmax is at least three compares + selects, which
makes it very difficult to pattern match for these targets.

It's 2 compares + selects:

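float nan_swallowing_fmin(float a, float b) {
  // If b is NaN, return a (NaN only if both are NaN); otherwise
  // a < b is false when a is NaN, so b is returned.
  return b != b ? a : (a < b ? a : b);
}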

which is within the realm of pattern-matching.

The inverse form (always propagating NaNs) is not widely natively
supported.

I think AArch64 *might* have it?

It does. In fact, even armv7 has a NaN-propagating min/max:

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489i/CIHDEEBE.html

Not exactly. The HLSL language leaves it undefined, but HLSL bytecode specifies that it’s not NaN-propagating:

http://msdn.microsoft.com/en-us/library/windows/desktop/hh447185(v=vs.85).aspx

And I happen to know from experience that a lot of graphics shaders depend on it working that way in practice.

—Owen

Given IEEE-754’s sway and what it says on this point, but given also the popularity of NaN-propagating min and max, how about a compromise? We add intrinsics following the IEEE-754 semantics, but we also follow IEEE-754 (and ARMv8) in renaming them to minnum and maxnum, to clarify which interpretation these intrinsics use.

I’d be fine with that proposal. I could even be convinced to add a pair of NaN-propagating intrinsics as well, for targets and languages that want those semantics, even if I disagree with them. I do think that, if we use the minnum/maxnum names, we should explicitly note that they are equivalent to C’s fmin/fmax, but not to std::min/std::max or Java(Script)’s min/max.

—Owen

I can rename these, but the convention followed by all the other LLVM intrinsics is to use the C library names.

minnum and maxnum match their names in the IEEE 754 standard. This diverges from LLVM’s convention, but the names are not without precedent.

—Owen