VFCmp failing when unordered or UnsafeFPMath on x86

Nicolas_Capens1 · June 13, 2008, 7:27am

Hi all,

When trying to generate a VFCmp instruction when UnsafeFPMath is set to true I get an assert “Unexpected CondCode” on my x86 system. This also happens with UnsafeFPMath set to false and using an unordered compare. Could someone look into this?

While I’m at it, is there any reason why only the most significant bit of the return value of VFCmp is defined (according to the documentation)? Both AltiVec and SSE set the components of the result to either all 1’s or all 0’s. Having only the most significant bit doesn’t seem useful to me at all, and (arithmetic) shifting vectors to replicate the bit isn’t supported.

Thanks!

Nicolas Capens

Evan_Cheng1 · June 13, 2008, 11:03pm

Hi all,

When trying to generate a VFCmp instruction when UnsafeFPMath is set to true I get an assert “Unexpected CondCode” on my x86 system. This also happens with UnsafeFPMath set to false and using an unordered compare. Could someone look into this?

Have you filed a bug?

While I’m at it, is there any reason why only the most significant bit of the return value of VFCmp is defined (according to the documentation)? Both AltiVec and SSE set the components of the result to either all 1’s or all 0’s. Having only the most significant bit doesn’t seem useful to me at all, and (arithmetic) shifting vectors to replicate the bit isn’t supported.

Nate can probably explain this better than anyone.

Evan

clattner · June 14, 2008, 12:40am

Hi all,

When trying to generate a VFCmp instruction when UnsafeFPMath is set to true I get an assert “Unexpected CondCode” on my x86 system. This also happens with UnsafeFPMath set to false and using an unordered compare. Could someone look into this?

Please provide a testcase.

While I’m at it, is there any reason why only the most significant bit of the return value of VFCmp is defined (according to the documentation)? Both AltiVec and SSE set the components of the result to either all 1’s or all 0’s. Having only the most significant bit doesn’t seem useful to me at all, and (arithmetic) shifting vectors to replicate the bit isn’t supported.

LLVM is intended to support other vector instruction sets, including SPU, Alpha, etc. The commonality was that the MSB is set, and we would like to add support for vector shifting at some point.

-Chris

Nicolas_Capens1 · June 16, 2008, 9:27am

Hi Evan,

No, I haven’t filed a bug. I’d first like someone to confirm this behavior. Anyway, I’ll post some test code in a minute.

Can I contact Nate directly about the definition of VFCmp?

Thanks,

Nicolas

Nicolas_Capens1 · June 16, 2008, 10:23am

Hi Chris,

I’ve attached a replacement of fibonacci.cpp that reproduces the issue on x86/SSE systems.

Regarding the definition of the VFCmp instruction, I think it would really be a lot more valuable to define it as returning all 1’s or all 0’s per element. Just setting the most significant bit is pretty much worthless (someone correct me if I’m wrong). I checked and I couldn’t actually find any instruction set that only sets the MSB when comparing vectors, except by actually doing just a subtract. Since people need full masks before they can do anything useful with it (requiring a shift or conditional replace) I was thinking why not make that part of VFCmp? Note that this change in VFCmp’s definition won’t break compatibility.

Kind regards,

Nicolas

fibonacci.cpp (1.25 KB)

Nate_Begeman · June 16, 2008, 4:01pm

I’ll take a look at this, thanks!

Nate_Begeman · June 16, 2008, 8:42pm

There are other architectures which don’t do this, so defining it as such would over-constrain the problem. The bits are undefined, and may be set to any value by the target arch. The goal here is that you can essentially treat each element of the vector as a signed integer and select (or other operation) if the value is less than zero, rather than specifically equal to -1. This matches things like SSE’s blend and PPC’s fsel.

Nicolas_Capens1 · June 17, 2008, 4:08pm

Hi Nate!

I don’t see how that would work. Select doesn’t work per element.

Say we’re trying to vectorize the following C++ code:

if(v[0] < 0) v[0] += 1.0f;

if(v[1] < 0) v[1] += 1.0f;

if(v[2] < 0) v[2] += 1.0f;

if(v[3] < 0) v[3] += 1.0f;

With SSE assembly this would be as simple as:

movaps xmm1, xmm0 // v in xmm0

cmpltps xmm1, zero // zero = {0.0f, 0.0f, 0.0f, 0.0f}

andps xmm1, one // one = {1.0f, 1.0f, 1.0f, 1.0f}

addps xmm0, xmm1

With the current definition of VFCmp this seems hard if not impossible to achieve. Vector compare instructions that return all 1’s or all 0’s per element are very common, and they are quite powerful in my opinion (effectively allowing to implement a per-element Select). It seems to me that for the few architectures that don’t have such instructions it would be useful to have LLVM generate a short sequence of instructions that does result in useful masks of all 1’s or all 0’s. Or am I missing something?

Thanks a lot,

Nicolas

Eli_Friedman1 · June 17, 2008, 5:30pm

Well, this is a bit of a hack, but one way to write it would be
something like the following:

define <4 x float> @a2(<4 x float> %in) nounwind {
%cmpres = vfcmp olt <4 x float> %in, zeroinitializer
%cmpresshift = udiv <4 x i32> %cmpres, <i32 2147483648, i32
2147483648, i32 2147483648, i32 2147483648>
%cmpmask = sub <4 x i32> zeroinitializer, %cmpresshift
%andmask = and <4 x i32> %cmpmask, bitcast (<4 x float> <float 1.0,
float 1.0,float 1.0,float 1.0> to <4 x i32>)
%andmask2 = bitcast <4 x i32> %andmask to <4 x float>
%result = add <4 x float> %in, %andmask2
ret <4 x float> %result
}

And ideally, that would optimize down to the code as you wrote it.
It'll take some tweaks to codegen to get that working efficiently,
though: first, the div+neg doesn't get opimized to an arithmetic
shift, and second, the combiner can't tell that the shift is a noop.

-Eli

Nate_Begeman · June 17, 2008, 5:43pm

I think you’re missing a target flag that says “vfcmp sets all bits”. Clearly if you are implementing a bit-select in terms of and/andn/or on SSE2, you want all the bits to be set, but there are also useful things like blend and maskmov which look only at the top bit. I’m pretty sure we’ll get to the code you want without constraining vfcmp to set all bits.

Nate

Topic		Replies	Views
Vector comparisons LLVM Dev List Archives	2	92	January 19, 2007
Vector instructions LLVM Dev List Archives	4	152	June 30, 2008
Float compare-for-equality and select optimization opportunity LLVM Dev List Archives	5	131	May 28, 2008
vector compare LLVM Dev List Archives	4	68	December 26, 2008
Ordered / Unordered FP compare are not handled properly on X86 LLVM Dev List Archives	7	116	August 31, 2013

VFCmp failing when unordered or UnsafeFPMath on x86

Related topics