Min and max

Hi all,

I’m trying to implement a floating-point ‘min’ and ‘max’ operation using select. For ‘min’ I get the expected x86 assembly minss instruction, but for ‘max’ I get a branch instead of maxss.

The equivalent C code looks like this:

float z = (x > y) ? x : y;

Any clues?

Could someone maybe explain to me the basics of LLVM’s target specific optimizations and code generation? I’d love to analyze things like this myself but I don’t know where to start.

Thanks,

Nicolas Capens

Your code is not safe for NaNs. This is the correct way to write maxss in C:

float max(float x, float y) {
  return !(x < y) ? x : y;
}

If you don’t care about NaNs, you can pass -ffast-math to llvm-gcc, or set “UnsafeFPMath=true” (declared in <llvm/Target/TargetOptions.h>).
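To make the NaN difference between the two spellings concrete, here is a minimal sketch in plain C. The function names max_gt and max_not_lt are made up for illustration, and it assumes the compiler is not run with -ffast-math (which would allow it to ignore NaNs):

```c
#include <math.h>

/* Hypothetical name: the spelling from the original question.
   An ordered '>' is false when either input is NaN, so y is returned. */
float max_gt(float x, float y) {
    return (x > y) ? x : y;
}

/* Hypothetical name: the spelling suggested in the reply.
   '<' is false when either input is NaN, so !(x < y) is true and x is returned. */
float max_not_lt(float x, float y) {
    return !(x < y) ? x : y;
}
```

For ordinary numbers both functions return the same value. They only differ when one input is NaN: max_gt(NAN, 1.0f) gives 1.0f, while max_not_lt(NAN, 1.0f) gives NaN.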

Could someone maybe explain to me the basics of LLVM’s target specific optimizations and code generation? I’d love to analyze things like this myself but I don’t know where to start.

This one specifically boils down to the semantics of maxss and LLVM IR instructions. For example, this code:

float not_max(float x, float y) {
  return (x > y) ? x : y;
}

float really_max(float x, float y) {
  return !(x < y) ? x : y;
}

compiles into this LLVM IR (llvm-gcc t.c -S -o - -O -emit-llvm):

define float @not_max(float %x, float %y) nounwind {
entry:
  %tmp3 = fcmp ogt float %x, %y ; [#uses=1]
  %iftmp.0.0 = select i1 %tmp3, float %x, float %y ; [#uses=1]
  ret float %iftmp.0.0
}

define float @really_max(float %x, float %y) nounwind {
entry:
  %tmp3 = fcmp uge float %x, %y ; [#uses=1]
  %iftmp.1.0 = select i1 %tmp3, float %x, float %y ; [#uses=1]
  ret float %iftmp.1.0
}
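The only difference between the two functions is the compare predicate: ogt ("ordered and greater than") versus uge ("unordered or greater than or equal"). As a rough sketch of what those two predicates mean, here is a C model. The helper names fcmp_ogt, fcmp_uge and is_unordered are invented for this example and are not LLVM APIs:

```c
#include <math.h>
#include <stdbool.h>

/* Two floats are "unordered" when at least one of them is NaN. */
static bool is_unordered(float a, float b) {
    return isnan(a) || isnan(b);
}

/* Model of 'fcmp ogt': true only if neither input is NaN and a > b. */
bool fcmp_ogt(float a, float b) {
    return !is_unordered(a, b) && a > b;
}

/* Model of 'fcmp uge': true if either input is NaN, or a >= b. */
bool fcmp_uge(float a, float b) {
    return is_unordered(a, b) || a >= b;
}
```

Under this model, the C expression !(x < y) agrees with fcmp_uge(x, y) for every input, which is why really_max ends up with the uge predicate.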

If you’re interested in target-specific x86 optimizations that still need to be done, take a look at lib/Target/X86/README*.txt.

-Chris

Hi all,

Marc pointed out to me that this might be related to my question about floating-point equality; however, I still think there’s a valid optimization opportunity here.

The Intel documents specify that the maxss instruction returns the second operand if either operand is a NaN. I believe this is exactly the intended behavior of (x > y) ? x : y, where > is an ordered compare: if either x or y is NaN, > returns false, so y, the second argument, is returned.

I wrote a little test app where I compared the results of (x > y) ? x : y with some inline assembly using maxss for all combinations of 0.0, 1.0 and NaN as inputs for x and y, and they were identical.
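A portable sketch of that kind of test is below. It is not the original test app: instead of inline assembly it uses a software model of maxss, written only from the behavior described above (the second operand is returned when either operand is NaN). The names maxss_model, select_max, same_float and count_mismatches are made up for this example.

```c
#include <math.h>

/* Assumed software model of maxss(a, b), not the real instruction:
   returns the second operand if either operand is NaN,
   otherwise the larger value (the second operand on a tie). */
float maxss_model(float a, float b) {
    if (isnan(a) || isnan(b))
        return b;
    return (a > b) ? a : b;
}

/* The select-based expression from the original question. */
float select_max(float x, float y) {
    return (x > y) ? x : y;
}

/* Treat two NaNs as equal so that results can be compared. */
int same_float(float a, float b) {
    return (isnan(a) && isnan(b)) || a == b;
}

/* Try every combination of 0.0, 1.0 and NaN and count disagreements. */
int count_mismatches(void) {
    const float vals[3] = { 0.0f, 1.0f, NAN };
    int mismatches = 0;
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            if (!same_float(select_max(vals[i], vals[j]),
                            maxss_model(vals[i], vals[j])))
                ++mismatches;
    return mismatches;
}
```

If the model matches the hardware, count_mismatches() returning 0 corresponds to the "identical" result reported above. Swapping the model for real inline assembly or an intrinsic would be needed to check the actual instruction.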

Cheers,

Nicolas