In r187314, APFloat multiplication with NaNs was made to always
yield a positive NaN. I am wondering whether that was the correct
decision. It is of course true that the sign of a NaN produced by a
multiplication is unspecified in IEEE; however, we were using
multiplication by -1.0 to implement IEEE negate, which is defined to
preserve the sign bit.
r210428 made 0-NaN have IEEE negate behavior, which is good because it
seems to me that from r187314 to r210428 there was no compliant way to
implement it in LLVM. Does somebody remember what the arguments for
the behavior change in r187314 were? It seems more sane to me to
preserve the sign bit than to unconditionally overwrite it, especially
considering that the hardware doesn't do it this way.
From IEEE 754-2008, §5.5.1:
Implementations shall provide the following homogeneous quiet-computational sign bit operations for all
supported arithmetic formats; they only affect the sign bit. The operations treat floating-point numbers and
NaNs alike, and signal no exception. These operations may propagate non-canonical encodings.
Multiplying by -1.0 has the potential to raise a floating point exception on sNaN inputs, and hence is not a valid implementation of negation per IEEE 754.
Ok, I had forgotten about sNaNs. Doesn't the same caveat apply to
0-sNaN then, or does that not signal? Does that mean we need a
separate way to handle negate in the IR? Funnily enough, historically
I believe we were using the multiplication by -1.0 because it was a
more reliable negation than 0-x (from 3.0 until 3.3 at least). Is
there a good reason why multiplication by NaN should kill the sign bit?
Actually scratch my original remark about the hardware. I'm not
actually sure what the hardware does. All I really want is a way to
properly negate a floating point number.
Subtraction is also not a correct implementation of negation, for exactly the same reason. LLVM is simply wrong in this context.
Generally speaking, correct implementations of fabs, fneg, and copysign are built out of logical operations rather than arithmetic ones.
I don’t know offhand why the behavior of multiplication in APFloat was changed.
Ok. Thank you for clarifying the point for me. I was primed for a
regression because this behavior changed over llvm versions and was
causing my tests to fail ;). I'm now doing bitcasting to int, xoring
with the signbit and bitcasting back.
One more update: since the code generated by the bitcast wasn't ideal
and we were afraid to lose vectorization, etc., we ended up going
with fsub -0.0, x, which, for some reason, unlike fsub 0.0, x, seems
to behave appropriately at all optimization levels.
That's because "fsub 0.0, x" is incorrect for x=+0.0. Took me a while
to work out why the "obvious" choice didn't work the first time I
encountered it too.
Oh, wow, that makes total sense. Thanks for pointing this out.
Worth noting that -0.0 - x isn't actually correct either, since it fails to flip the sign of -0 if the rounding mode is round toward negative (on platforms that support IEEE-754 rounding modes), and it raises invalid if x is a signaling NaN. As Owen noted, FP negation really "ought" to be treated as a bitwise operation, rather than mapped into arithmetic.
Is there any intention of making floating-point absolute value and negate primitive IR instructions?
I ask because only a few days ago I was also faced with the task of implementing negate in my compiler, and finding no suitable IR instruction, simply subtracted from zero. But this is wrong.
I could change my code to do the bit casting and fiddling, but I wonder: would that be lowered appropriately on all architectures?
A quick survey from my various manuals:
- m68k has negate and absolute value instructions.
- so does x87
- so does PA-RISC
- but SPARC does not.
On some of the architectures, moving a value from a floating-point register to integer and back could impact performance, so it pays to use the processor’s native instructions for negation and absolute value if they exist.
As another data point, every GPU I’m aware of directly supports fabs and fneg operations.
I would support a proposal to move fabs and fneg either to intrinsics or instructions, and remove the current practice of using fsub.
AFAIK, SPARC has absolute value and negation instructions (fabss, fnegs), and
SPARCv9 has fabsd and fnegd instructions for double precision floating
point.
[V]AND[SS|SD|PS|PD] / [V]ANDN[SS|SD|PS|PD] / [V]OR[SS|SD|PS|PD] is by far the preferred idiom to perform these operations on SSE/AVX.
I'd also support moving fabs and fneg to intrinsics or instructions
(I'd probably favour fneg as an instruction, but perhaps not fabs;
though that may be my C++ background showing).
Just because we *can* use bitwise ops, doesn't mean it's a good idea
(we could eliminate all bitwise ops in terms of nand if we were
feeling evil enough).
FYI, I was looking at the SSE/AVX codegen here:
If LLVM starts caring about FP exceptions, even this won’t be possible. Is there a way of doing an IEEE-754 fneg in C/C++? I.e., there’s no fneg() in libm, so any C method we choose could cause an exception, and that’s not allowed by the IEEE definition of fneg.
Huh? XOR[PS|PD] is an IEEE-754 negate( ) on x86; it does not raise any FP exceptions.
In C, it’s not specified at present whether –x corresponds to negate( ), but the general belief (among IEEE-754 committee members) seems to be that it should, and historically clang has mostly tried to behave that way.
On a platform where –x does not correspond to the IEEE-754 negate( ) operation, but the FP representation is known, you would type-pun to integer (via union or memcpy), xor the signbit, and type-pun back to FP to get the result. The only platform I know of where even this isn’t possible is i386 under an ABI that returns on the x87 stack, where loading a single- or double-precision sNaN necessarily signals invalid. Thankfully, the last such machine is now … what, 15 years old?
Yes, xorpd is an IEEE-754 negate, but as you noted, this is not (if LLVM starts respecting FP exceptions and rounding modes):
%fsub = fsub float -0.0, %fabsf
So as suggested, LLVM IR requires an fneg instruction or intrinsic to preserve the fneg operation in the C source down to the asm.
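For comparison, here is a sketch of what the IR difference could look like (a dedicated fneg instruction was only hypothetical at the time of this discussion):

```llvm
; Today: negation is spelled as subtraction from -0.0, which ties its
; semantics to fsub (rounding modes, invalid on sNaN inputs).
define float @neg_via_fsub(float %x) {
  %r = fsub float -0.0, %x
  ret float %r
}

; With a dedicated instruction (hypothetical spelling), negation is a
; pure sign-bit operation and can never signal:
define float @neg_direct(float %x) {
  %r = fneg float %x
  ret float %r
}
```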
I also support moving fneg, at least, to a full IR instruction. fabs I feel less strongly about, but would be fine with as well.