[FPEnv] FNEG instruction

Hey llvm-dev,

Continuing a discussion from D50913…

A group working on the FP rounding mode and trap-safety project has run into a situation where it would make sense to add a new FNEG instruction and to decouple the existing FNEG<->FSUB transformations.

The IEEE-754 Standard (Section 5.5.1) makes it clear that neg(x) and sub(-0.0,x) are two different operations. The former is a bitwise operation, while the latter is an arithmetic operation.

That said, LLVM currently conflates the two operations in several places. It would be nice to separate the two operations to give the user explicit control over which type of operation is generated.

D50913 covers a lot of ground, so I’ll leave the discussion here for now until any interested parties are up to speed.

Thanks for looking,
Cameron

Ping…

The current thinking is that FNEG(X) and FSUB(-0.0, X) are not the same operation when X is a NaN or 0. So, the xforms in question should only be valid under Fast-Math conditions.

Is correcting this behavior something that the general LLVM population would like? If not, we can create constrained intrinsics for the FPEnv project.

Thanks again,
Cameron

+1

The current thinking is that FNEG(X) and FSUB(-0.0, X) are not the same operation when X is a NaN or 0.

Do you mean denormals (when flushed) rather than 0 there? AFAIK it's
OK for 0 itself.

So, the xforms in question should only be valid under Fast-Math conditions.

We could probably also "fix" the issue by taking the view that LLVM's
fsub provides extra guarantees for NaN over the IEEE-754 one. I.e.
that "fsub -0.0, x" behaves as negate. I believe that would still be
conformant to IEEE-754 and not require any actual changes in LLVM
(since it's what CPUs would do anyway).

I think that's uglier than adding FNEG though.

Is correcting this behavior something that the general LLVM population would like? If not, we can create constrained intrinsics for the FPEnv project.

I'm in favour of FNEG too. I remember the fsub trick being confusing
when I first encountered it.

Cheers.

Tim.

The current thinking is that FNEG(X) and FSUB(-0.0, X) are not the same operation when X is a NaN or 0.

Do you mean denormals (when flushed) rather than 0 there? AFAIK it's
OK for 0 itself.

Under the assumption of default rounding, yes.

So, the xforms in question should only be valid under Fast-Math conditions.

We could probably also "fix" the issue by taking the view that LLVM's
fsub provides extra guarantees for NaN over the IEEE-754 one. I.e.
that "fsub -0.0, x" behaves as negate. I believe that would still be
conformant to IEEE-754 and not require any actual changes in LLVM
(since it's what CPUs would do anyway).

I think that's uglier than adding FNEG though.

NaN doesn’t need special handling except in the presence of signaling NaNs, where the IEEE-754 negate operation would produce a signaling NaN rather than a quiet NaN + invalid flag. Again, under the default fenv assumption, nothing special is needed.

Is correcting this behavior something that the general LLVM population would like? If not, we can create constrained intrinsics for the FPEnv project.

I'm in favour of FNEG too. I remember the fsub trick being confusing
when I first encountered it.

We should still do fneg, though, because of this.

– Steve

Bah, you're correct. 0s are fine. That's assuming the xform is always to
FSUB(-0.0,X) and not FSUB(0.0,X).

NaNs are not ok. I.e.:

FNEG( NaN) = -NaN
FNEG(-NaN) = NaN

FSUB(-0.0, NaN) = NaN
FSUB(-0.0, -NaN) = NaN

Thanks for the correction, Tim.

Some specific architecture may define this, or APFloat might, but IEEE 754 does not interpret the sign of NaN except in four operations (copy, abs, negate, copysign), so it doesn’t say anything about these.

– Steve

Good point. I suppose one could argue that the behavior is undefined.

Apologies, I was wrong about this one. Just tested the FSUB hardware
instruction on all the targets I care about and they all respect the sign
on NaNs. I'm not sure how this got into my head...

No, you weren’t wrong.

The sign of the NaN result is only defined, per the standard, for those 4 operations, no others. Certain hardware or certain ISAs may make more guarantees, of course.

Yeah, of course, but I'm a pragmatist. :wink:

Kidding aside, I've heard a handful of yeas for adding an explicit FNEG
instruction. Are there any nays (or maybes) out there? I'd like to hear
those concerns...

I don't think it matters for the question at hand, but I tested
AArch64 too and it exhibits the behaviour you were describing. That
is, we'd have problems if an fsub -0.0 was actually CodeGened like
that (it's not, of course).

Tim.

Great data point. So it's not just a theoretical problem. Thanks, Tim!

Ping…

I’d like to hear from the major stakeholders on whether we can proceed with this change or not. Either way, it would be nice to get past this roadblock. If anyone watching is in close proximity to those who can approve such a change, I would appreciate it if you would tap them on the shoulder for me.

Thanks again,
Cameron

Which exactly was the plan?

Add a new, regular instruction?

Add a new constrained math intrinsic?

Both?

Andrew Kaylor made a good point here:

  • As I said, all LLVM IR FP instructions are //assumed// to have no side effects. I’m not sure we want an instruction that goes beyond this to be //defined// as having no side effects. It adds a complication to the language and introduces restrictions on the code generator that aren’t needed in the ordinary case of non-constrained FP. The target code generators are free to do anything they want with the other FP instructions, including things that introduce new FP status flags being set that otherwise wouldn’t be, and for the normal case the back ends should be free to do that with fneg as well.

Personally, I’m not sure I like the idea of having exceptions to the rule that FP instructions also have constrained versions. So I lean towards having both a regular FNEG and a constrained version.

But I think I remember pushback. I can’t put my fingers on the message, though.

Which exactly was the plan?

Add a new, regular instruction?

Add a new constrained math intrinsic?

Both?

I'd like to add an explicit FNEG IR operator. We would not need a
constrained FNEG operator if we go this route.

Andrew Kaylor made a good point here:

   - As I said, all LLVM IR FP instructions are //assumed// to have no
   side effects. I'm not sure we want an instruction that goes beyond this to
   be //defined// as having no side effects. It adds a complication to the
   language and introduces restrictions on the code generator that aren't
   needed in the ordinary case of non-constrained FP. The target code
   generators are free to do anything they want with the other FP
   instructions, including things that introduce new FP status flags being set
   that otherwise wouldn't be, and for the normal case the back ends should be
   free to do that with fneg as well.

Personally, I’m not sure I like the idea of having exceptions to the rule
that FP instructions also have constrained versions. So I lean towards
having both a regular FNEG and a constrained version.

But I think I remember pushback. I can’t put my fingers on the message,
though.

We touched on this in the Differential Review and on this thread. To
summarize:

FNEG(X) is not really the same operation as FSUB(-0.0, X), although the
differences are admittedly subtle. I even went as far as to say that any
xforms between the two operations should only occur under FastMath
conditions. If we follow those rules, I think that guarantees we don't
have to worry about the side effects of FNEG (please correct me if
I've missed something).

Extending on that, I suspect that we should not be canonicalizing FNEG(X)
as FSUB(-0.0, X), but rather as XOR(X, 1 << (SIZE_OF_TYPE_IN_BITS - 1)). A
utility function could provide us with the sign-bit constant, so it's not
that ugly.

That said, I agree that Andrew's take is compelling. And I would be okay
with adding a constrained FNEG to solve the immediate issue, if that is the
final decision.

+1 for an explicit FNEG instruction, since as previously discussed, it has stricter requirements for what value may be returned by the operation. And strengthening the requirement on FSUB is not feasible when the values are variables rather than literals.

That is:

FSUB(-0.0, NaN) = either NaN or -NaN
FSUB(-0.0, -NaN) = either NaN or -NaN

FNEG(NaN) = -NaN
FNEG(-NaN) = NaN

I have 1 concern about adding an explicit fneg op to IR:

Currently, fneg qualifies as a binop in IR (since there’s no other way to represent it), and we have IR transforms that are based on matching that pattern (m_BinOp). With a proper unary fneg instruction, those transforms are not going to fire anymore. So I think we’ll need to duplicate code to account for that difference (or hack the matcher to include fneg?).

The existing regression tests probably aren’t going to show optimization failures for those cases*, so I’m not sure how to detect regressions without doing an audit of FP transforms in instcombine and other passes.

  • Are we proposing to canonicalize “fsub -0.0, X → fneg X”? As shown in the example below, I think that’s allowed. If we do that, then we might detect some of the affected transforms, but I suspect that we don’t have that test coverage for most transforms.

I have 1 concern about adding an explicit fneg op to IR:

Currently, fneg qualifies as a binop in IR (since there’s no other way to represent it), and we have IR transforms that are based on matching that pattern (m_BinOp). With a proper unary fneg instruction, those transforms are not going to fire anymore. So I think we’ll need to duplicate code to account for that difference (or hack the matcher to include fneg?).

The existing regression tests probably aren’t going to show optimization failures for those cases*, so I’m not sure how to detect regressions without doing an audit of FP transforms in instcombine and other passes.

Implementation details. ¯\_(ツ)_/¯

Seriously though, this is interesting and something that I had not considered. Thinking aloud, what if we caught the FNEG pattern in the CreateFSub*(…) IRBuilder functions and generated an FNeg instruction instead? That way we could get rid of the m_FNeg(…) calls altogether. They would just be replaced with something like: (I.getOpcode() == Instruction::FNeg). Any transform that currently uses m_FNeg(…) would fail to build without an update.

But maybe I’m not considering some substitution that could modify an FSub instruction in place?

  • Are we proposing to canonicalize “fsub -0.0, X → fneg X”? As shown in the example below, I think that’s allowed. If we do that, then we might detect some of the affected transforms, but I suspect that we don’t have that test coverage for most transforms.

It sounds reasonable to continue c14n on “fsub -0.0, X → fneg X”. The problematic case is “fneg X → fsub -0.0, X”.

Although, that does raise the question of who defines undefined behavior: LLVM or the hardware? I would lean towards falling back to the hardware behavior. Take integer divide by zero for example. I’d much rather the hardware define i/0 (e.g. x86 will trap) than LLVM.

I have 1 concern about adding an explicit fneg op to IR:

Currently, fneg qualifies as a binop in IR (since there’s no other way to represent it), and we have IR transforms that are based on matching that pattern (m_BinOp). With a proper unary fneg instruction, those transforms are not going to fire anymore. So I think we’ll need to duplicate code to account for that difference (or hack the matcher to include fneg?).

The existing regression tests probably aren’t going to show optimization failures for those cases*, so I’m not sure how to detect regressions without doing an audit of FP transforms in instcombine and other passes.

Implementation details. ¯\_(ツ)_/¯

Seriously though, this is interesting and something that I had not considered. Thinking aloud, what if we caught the FNEG pattern in the CreateFSub*(…) IRBuilder functions and generated an FNeg instruction instead? That way we could get rid of the m_FNeg(…) calls altogether. They would just be replaced with something like: (I.getOpcode() == Instruction::FNeg). Any transform that currently uses m_FNeg(…) would fail to build without an update.

We don’t want to get rid of the m_FNeg calls. Those are safe because they’re providing an abstraction to the callers. Ie, anywhere we use m_FNeg() or Builder.CreateFNeg() should Just Work regardless of how we implement the underlying fneg operation. (We have to fix those functions themselves, of course, to deal with the new opcode.) It’s the code that doesn’t use those abstractions that has the potential to regress.

Here’s an example:
https://reviews.llvm.org/rL343041

The transform in question is in InstCombiner::foldShuffledBinop(). It moves a binop ahead of a vector shuffle. It’s called for every vector binop instruction, so if IR has a unary fneg and we want that to behave as it does today, we’d have to duplicate that fold for fneg (or fake it by substituting an fsub).

But after rummaging around in instcombine at least, I’m actually not too worried by the proposed change. Transforms on generic binops are not common.

But glancing at Reassociate.cpp is scarier. It does a lot of stuff like this:
  if (BinaryOperator::isNeg(TheOp) || BinaryOperator::isFNeg(TheOp))
    X = BinaryOperator::getNegArgument(TheOp);

I think that’s going to fail without a (terrible) hack to treat the proposed fneg as a binop. So that’s probably a preliminary requirement: find all uses of BinaryOperator::isFNeg() and update them to use m_FNeg(). That should be done ahead of the IR change to limit damage. I don’t know if that’s enough as-is, but we know that code will break without an update.