RFC: change -fp-contract=off to actually disable FMAs

There is no way to disable FMAs with ‘fast’ ops in LLVM. I would like to propose that LLVM’s -fp-contract=off should disable fusion, regardless of any other flags since the Clang option suggests this to be the case:

$ clang --help | grep fp-contract
-ffp-contract= Form fused FP ops (e.g. FMAs): fast (everywhere) | on (according to FP_CONTRACT pragma, default) | off (never fuse)

Current behaviour in LLVM 8.0 below:

$ cat fma.ll
define double @fmadd(double %a, double %b, double %c) {
%mul = fmul fast double %b, %a
%add = fadd fast double %mul, %c
ret double %add
}

$ llc -mattr=+fma fma.ll -fp-contract=off -o - | grep vfmadd

vfmadd213sd %xmm2, %xmm1, %xmm0 # xmm0 = (xmm1 * xmm0) + xmm2

It still generates an fma due to the logic in DAGCombiner:

bool CanFuse = Options.UnsafeFPMath || isContractable(N);
bool AllowFusionGlobally = (Options.AllowFPOpFusion == FPOpFusion::Fast ||
CanFuse || HasFMAD);

In this case, UnsafeFPMath is false but isContractable() is true since the FADD node is tagged as ‘fast’. A simple fix would just be to check for -fp-contract=off, however, I also found there is disagreement in the LLVM -fp-contract option itself:

in TargetOptions.h, =off maps to FPOpFusion::Strict and says “Never fuse FP-ops”, yet the actual cl::opt for =off/Strict says: “Only fuse FP ops when the result won’t be affected”.

Which is it supposed to be? At a minimum we should clear up the discrepancy, but there are two general approaches I see:

Option 1:

  • rename Strict to Off in llvm and always diable FMAs with this option

  • does not require changes to Clang

Example logic:

bool AllowFusionGlobally = Options.AllowFPOpFusion != FPOpFusion::Off &&
(Options.AllowFPOpFusion == FPOpFusion::Fast ||
CanFuse || HasFMAD);

Option 2:

  • keep =strict, add =off to turn off FMAs

  • add =strict to clang as it does not currently exist

  • tie uses of ::Strict to the presence of FMAD (which might have been the intention?)

Example logic:

bool AllowFusionGlobally = Options.AllowFPOpFusion != FPOpFusion::Off &&
(Options.AllowFPOpFusion == FPOpFusion::Fast ||
CanFuse ||
(HasFMAD && Options.AllowFPOpFusion == FPOpFusion::Strict));

Is there context I am not aware of for ::Strict and ISD::FMAD? I could see value in generating FMAs when its expanded form would be identical – but curious if that was actually the intent or not. If it is, perhaps we could allow “Standard/on” to fuse ops if FMAD is available instead of the “Strict” level? In any case, we should still have a way to explicitly turn off FMAs.

Thanks,

Scott

FMAD Is not FMA, and should not be part of any discussion related to fusion rules. It is defined to be identical to the separate fmul and fadd

-Matt

Numerically identical, sure. But, there are other reasons to disable FMAD fusion – namely for performance comparisons.

* Scott Manley via llvm-dev <llvm-dev@lists.llvm.org> [2019-07-10 15:09:07 -0500]:

Numerically identical, sure. But, there are other reasons to disable FMAD
fusion -- namely for performance comparisons.

please don't use the -ffp-contract option for that.

That's not typically something we'd expose to the end user in any way.
Clang as a compiler should be selecting what it thinks is the fastest
sequence to do some particular job; if it's wrong then that's a bug,
not something to add a command-line flag for.

I suspect that's behind the documentation discrepancy you've noticed.
For many people working on LLVM they're the same thing: as long as
behaviour isn't changed for the end user (in valid programs), anything
goes.

Cheers.

Tim.

That’s not typically something we’d expose to the end user in any way.
Clang as a compiler should be selecting what it thinks is the fastest
sequence to do some particular job; if it’s wrong then that’s a bug,
not something to add a command-line flag for.

I can’t agree with that as a general statement. How is it any different than giving the user the ability to control loop unroll amount or setting the preferred vector width or many other options in both clang and llvm?

At any rate, I was only offering an additional reason. Personally I think it’s strange for an option to say “this will never fuse ops” and then under the covers will fuse ops, regardless of how FMAD is defined. However, my primary concern is for FMAs. They have both numeric and performance implications and I do not think it’s unreasonable that off means off.

I think you have a different definition of fused then. Fused is a description of how the operation is computed/rounded, not an instruction count. The F in FMAD is not fused (I know this naming scheme is not great. Every other FP node besides FMA has an F prefix)

For FMA, I think your example IR is correctly handled. The fast instruction flag should override the global FP option you’re providing. For the issue you are describing, this is more of a question of whether clang should be emitting the fast flag or not.

-Matt

I think you have a different definition of fused then. Fused is a description of how the operation is computed/rounded, not an instruction count.

“Only fuse FP ops when the result won’t be affected” is what the existing comment says. So it can’t be both a fused op and not a fused op if it’s only meant to imply a difference in rounding. I’m just re-using the existing wording, and I agree it could be cleaned up if that’s the intent of the -fp-contract option – which I why I was asking for context.

For FMA, I think your example IR is correctly handled. The fast instruction flag should override the global FP option you’re providing. For the issue you are describing, this is more of a question of whether clang should be emitting the fast flag or not.

I disagree. How does clang know what would ultimately form an FMA? It would have to blanket remove ‘fast’ from all fadds.

By the definitions I’m hearing, it sounds like this option also disable fusion of offset calculations into loads and stores. Ditto immediate materialization directly into instructions instead of as separate immediate move.

That doesn’t sound like what “contraction” was designed for. It sounds like you want a different knob, and it sounds like a knob LLVM rarely offers, if ever. Can you provide concrete cases where the compiler is wrong? Are those cases intractably unsolvable as performance defects?

Echoing what everyone else has said, keying on the word “fused” is a red herring here.

fp-contract refers to behavior governed by the STDC FP_CONTRACT pragma. “Contraction” has a formal definition in the C standard:

A floating expression may be contracted, that is, evaluated as though it were a single operation, thereby omitting rounding errors implied by the source code and the way to disallow contracted expressions.

Note that this definition is *purely* in terms of the rounding of arithmetic operations performed by the abstract machine; there is no notion of instructions generated. Formation of fused multiply-add instructions is one specific form of fusion licensed by this pragma, which happens to be the main one of interest from the standpoint of compiler performance optimization for FMA-based architectures.

There’s some imprecision in the documentation caused by a mismatch between what’s interesting for compiler writers (where rounding changes due to FMA formation are allowed) and the abstract specification. That should be cleaned up. However, fp-contract is not a knob to control whether or not abstract-machine operations generate a single arithmetic instruction—it definitely does not, and should not, enable or disable MAD formation.

– Steve

However, fp-contract is not a knob to control whether or not abstract-machine operations generate a single arithmetic instruction

I think that makes sense, but the end result is the same. Wouldn’t you agree that -fp-contract=off still contracts floating point expressions with the initial example I posted? That is the core of what I’m trying to resolve here.

I still have some confusion of what FMAD is supposed to be. Is FMAD actually MAD? Or is it something else? I am fine with leaving it alone if FMAD is not actually contracting floating point operations.

Yes. FMAD is supposed to give the same result as the separate FMUL and FADD with the intermediate rounding step. It exists because most of the combines for FMA still apply, and AMDGPU has an instruction that does this (as long as denormal flushing is OK). I don’t know of any other users.

-Matt

I think that makes sense, but the end result is the same. Wouldn’t you agree that -fp-contract=off still contracts floating point expressions with the initial example I posted? That is the core of what I’m trying to resolve here.

Yes, absolutely. The fast flag licenses many transforms that are not allowed in the default floating-point model, including contraction. So, yes, it’s being contracted in your initial example, and that’s OK!

I still have some confusion of what FMAD is supposed to be. Is FMAD actually MAD? Or is it something else? I am fine with leaving it alone if FMAD is not actually contracting floating point operations.

/// FMAD - Perform a * b + c, while getting the same result as the
/// separately rounded operations.
FMAD,

It does exactly what it says on the tin. It represents an instruction that produces exactly the same result as FADD(FMUL(a,b), c). No contraction is occurring when such a node is formed.

– Steve

armv7 also has a non-fused floating-point multiply-add (VMLA as opposed to the fused operation VFMA).

– Steve