max/min intrinsics

I have been working on a patch to add support for max/min reductions in LoopVectorize. One of the comments that came up in review is that the implementation could be simplified (and made less fragile) if max and min intrinsics were recognized rather than looking for compare-select sequences.

The suggestion was to change compare-selects into max and min intrinsic calls during instcombine.

The intrinsics to add are:
declare iN llvm.{smin,smax}.iN(iN %a, iN %b)
declare iN llvm.{umin,umax}.iN(iN %a, iN %b)
declare fN llvm.{fmin,fmax}.fN(fN %a, fN %b)
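
For instance, instcombine would then rewrite the usual signed-max idiom into a single call (a sketch, using the proposed i32 variant):

; before: compare + select computing the signed maximum
%c = icmp sgt i32 %a, %b
%m = select i1 %c, i32 %a, i32 %b

; after instcombine: one intrinsic call
%m = call i32 @llvm.smax.i32(i32 %a, i32 %b)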

What does the community think?

Paul

Redmond, Paul wrote:

I have been working on a patch to add support for max/min reductions in LoopVectorize. One of the comments that came up in review is that the implementation could be simplified (and made less fragile) if max and min intrinsics were recognized rather than looking for compare-select sequences.

The suggestion was to change compare-selects into max and min intrinsic calls during instcombine.

+1

Sebastian

Hi Paul,

Min/max intrinsics certainly make Loop Nest Optimization (including innermost-loop vectorization) a lot easier.
However, I would like them to be "lowered" in the lower-level scalar optimizer (sopt).

I feel that "raw" instructions are a bit easier to optimize in sopt than an integrated instruction, and that the "raw" instructions could expose more opportunities.

E.g., in the following snippet, if we break the max() into "raw" instructions, the cost of the comparison is reduced thanks to CSE, and it is also revealed that, more often than not, Z holds the value of min_v + 2. An opaque max() call obscures this information.
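
Something along these lines, where %x stands for some other value in the loop:

define i32 @example(i32 %x, i32 %min_v) {
entry:
  ; max(%x, %min_v + 2) broken into "raw" compare + select
  %t   = add i32 %min_v, 2
  %cmp = icmp sgt i32 %x, %t
  %Z   = select i1 %cmp, i32 %x, i32 %t
  ; CSE lets the branch reuse %cmp instead of comparing again,
  ; and on the common path %Z is known to equal %min_v + 2
  br i1 %cmp, label %rare, label %common

common:
  ret i32 %Z

rare:
  ret i32 %Z
}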

It seems inevitable. For the floating-point version, please make it very clear what the behavior of max(-0,+0) and related cases is. This also means that code matching compare/select idioms (e.g. llvm/Support/PatternMatch.h) will need to be updated.

-Chris

It seems inevitable. For the floating-point version, please make it very
clear what the behavior of max(-0,+0) and related cases is.

Along these lines, AArch64 has an instruction "FMAXNM". It returns the
maximum if neither value is NaN, but returns the number if just one
value is NaN. This is in addition to an "FMAX" which propagates NaNs.

I suspect you'll just want to consider this as an "oh yes, make sure
that the result is NaN if either input is" advisory notice, but I
haven't actually thought through the details of implementation yet.

Tim.

Given the recent fast-math flags commit, what about the inverse? ("I
guarantee that inputs are not NaNs") In this regard min/max look like
other floating point instructions to me. Does it make sense to add
them as instructions so that they can profit from fast-math flags?

Dmitri

Hi,

Along these lines, AArch64 has an instruction "FMAXNM". It returns the
maximum if neither value is NaN, but returns the number if just one
value is NaN. This is in addition to an "FMAX" which propagates NaNs.

I suspect you'll just want to consider this as an "oh yes, make sure
that the result is NaN if either input is" advisory notice, but I
haven't actually thought through the details of implementation yet.

Given the recent fast-math flags commit, what about the inverse? ("I
guarantee that inputs are not NaNs")

It's certainly a point to consider in general, but I don't think it
would help in our case. I've not looked at the instruction timings for
any of our CPUs, but I can't imagine FMAXNM being any different from
FMAX. If that's right, it doesn't matter whether NaNs are present
computationally; it becomes a question of what semantics you actually
want.

Tim.

The following is our current proposal for llvm.fmax/fmin.*:

[1] If exactly one argument is a NaN, the intrinsic returns the other argument.
[2] If both arguments are NaN, the intrinsic returns a NaN.
[3] An SNaN may behave as a QNaN.
[4] If the arguments compare equal, the intrinsic returns a value that compares equal to both arguments.
[5] Otherwise, the intrinsic returns the greater/lesser of the two arguments.

Rationale and notes:

Points [1] and [2] match the C/POSIX library functions' specs.

Point [3] matches the OpenCL library functions, and may permit some implementations to test for NaNs less expensively.

Point [4] accounts for fmax(-0,+0) in IEEE 754 arithmetic, and any similar cases that might exist in other systems (LLVM needs a VAX backend). IEEE specifies that comparisons ignore the sign of zero, so requiring fmax to order ±0 would be expensive on many systems, and is not necessary to support common library functions.
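
For instance, under point [4] the following call (a sketch, using the fN naming above) may return either operand, since -0.0 and +0.0 compare equal under IEEE 754:

%r = call float @llvm.fmax.f32(float -0.0, float 0.0)
; %r may be either -0.0 or +0.0; both compare equal to both arguments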

The intrinsics can replace calls to the C and OpenCL library functions.

The intrinsics can be implemented as calls to the C or OpenCL library functions. They can also be implemented by IEEE 754 maxNum()/minNum() operations (but not vice versa).

The intrinsics are not equivalent to an fcmp/select sequence.

This part worries me. The new min/max intrinsics will only be useful if we can pattern-match cmp/select into them.

Yes, that's the obvious alternative. I don't think we have any strong opinion either way, and fcmp/select is certainly easier to implement.

Maybe we can have two versions of the intrinsic function, "ordered" and "unordered", just like fcmp has [1]. Would that work?

[1] - http://llvm.org/docs/LangRef.html#fcmp-instruction

No; real fmin and fmax are symmetric, and fcmp+select patterns are not
symmetric.
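
Concretely, take the usual fmax idiom (a sketch):

%c = fcmp ogt float %x, %y
%r = select i1 %c, float %x, float %y
; %x NaN: ogt is false, so %r = %y (the non-NaN operand)
; %y NaN: ogt is false, so %r = %y (the NaN operand)

The pattern always falls through to its second operand when a NaN is
involved, so swapping the operands changes the result, whereas
fmax(x,y) must equal fmax(y,x).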

Also, on point [3] "An SNaN may behave as a QNaN", LLVM doesn't
correctly implement SNaN anyway, so there's no need to clutter up the
rules with SNaN exceptions. When (if?) someone ever implements SNaN,
they'll probably need to introduce a new operator flag to allow passes
to ignore it (nsnan?), and then fmin/fmax can use that.

Dan

Maybe we can have two versions of the intrinsic function, "ordered" and
"unordered", just like fcmp has [1]. Would that work?

If the intrinsics are to be produced by matching fcmp + select, I think we need four versions of each -- select (fcmp olt %a, %b), %a, %b is equivalent to select (fcmp uge %a, %b), %b, %a, and so on.
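
For example, in the float case (a sketch):

; these two idioms select the same value for every input, including NaNs:
%c1 = fcmp olt float %a, %b
%m1 = select i1 %c1, float %a, float %b   ; NaN input: yields %b

%c2 = fcmp uge float %a, %b
%m2 = select i1 %c2, float %b, float %a   ; NaN input: also yields %b

; whereas the unordered variant of the first form differs on NaNs:
%c3 = fcmp ult float %a, %b
%m3 = select i1 %c3, float %a, float %b   ; NaN input: yields %a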