Documentation of fmuladd intrinsic

The documentation for the fmuladd intrinsic says that a multiply and
addition sequence can be fused into an fma instruction “if the code
generator determines that the fused expression would be legal and
efficient”. (http://llvm.org/docs/LangRef.html#llvm-fma-intrinsic)

I’ve spent a bit of time puzzling over how a code generator is supposed
to know if it’s legal to generate an fma instead of a multiply and add.
Surely that’s something for the frontend to determine, based on the
FP_CONTRACT setting, and not something for the code generator to work
out?

However, recently I came across
http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20120528/058222.html
which explains that “legal” in the above definition doesn’t mean legal
from the point of view of the source language, but simply means whether
or not the target architecture has an fma instruction. The thread also
talks about updating the documentation to clarify this, but that
doesn’t seem to have happened.

Assuming that the thread I’ve linked to is correct, would it be
possible to clarify the IR spec accordingly? I think that the current
use of the word “legal” is misleading.

Hey Andrew,

I believe that the term “legal” is associated with the Legalize phase. Please see:

http://llvm.org/docs/CodeGenerator.html#selectiondag-legalize-phase

The x86 backend has a good example of this in llvm/lib/Target/X86/X86ISelLowering.cpp:

  if (Subtarget->hasFMA()) {
    // Support FMAs!
    setOperationAction(ISD::FMA, MVT::f64, Legal);
    setOperationAction(ISD::FMA, MVT::f32, Legal);
  } else {
    // We don't support FMA.
    setOperationAction(ISD::FMA, MVT::f64, Expand);
    setOperationAction(ISD::FMA, MVT::f32, Expand);
  }

Hope that helps,
Cameron
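
As an illustration of what Legal and Expand mean in the snippet above, here is a minimal, self-contained sketch of a per-(operation, type) action table. It is not LLVM code: `Op`, `Ty`, `Action`, `ToyLowering`, and `checkToyLowering` are hypothetical names that only mimic the shape of `setOperationAction` and `isOperationLegal`, and the "unset means Legal" default is an assumption of the sketch.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical stand-ins for ISD opcodes, MVT types, and legalize actions.
enum class Op { FMA, FMUL, FADD };
enum class Ty { f32, f64 };
enum class Action { Legal, Expand };

// Toy model of a target's lowering info: one action per (operation, type).
class ToyLowering {
  std::map<std::pair<Op, Ty>, Action> Actions;

public:
  // Mirrors the x86 snippet: FMA is Legal only if the subtarget has it.
  explicit ToyLowering(bool HasFMA) {
    Action A = HasFMA ? Action::Legal : Action::Expand;
    setOperationAction(Op::FMA, Ty::f64, A);
    setOperationAction(Op::FMA, Ty::f32, A);
  }

  void setOperationAction(Op O, Ty T, Action A) { Actions[{O, T}] = A; }

  // Assumption of this sketch: anything not explicitly set is Legal.
  Action getOperationAction(Op O, Ty T) const {
    auto It = Actions.find({O, T});
    return It == Actions.end() ? Action::Legal : It->second;
  }

  bool isOperationLegal(Op O, Ty T) const {
    return getOperationAction(O, T) == Action::Legal;
  }
};

// Usage: a target with FMA reports it as Legal, one without does not.
inline void checkToyLowering() {
  assert(ToyLowering(true).isOperationLegal(Op::FMA, Ty::f64));
  assert(!ToyLowering(false).isOperationLegal(Op::FMA, Ty::f32));
}
```

In this reading, "legal" is purely a statement about what the target can select, which is the sense the Legalize phase uses.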

From: "Cameron McInally" <cameron.mcinally@nyu.edu>
To: "Andrew Booker" <andrew.booker@arm.com>
Cc: llvmdev@cs.uiuc.edu
Sent: Friday, January 11, 2013 12:37:07 PM
Subject: Re: [LLVMdev] Documentation of fmuladd intrinsic

There are a few conditions that contribute to the decision of whether or not to make the fmuladd -> fma translation. The relevant code is in CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:

  case Intrinsic::fmuladd: {
    EVT VT = TLI.getValueType(I.getType());
    if (TM.Options.AllowFPOpFusion != FPOpFusion::Strict &&
        TLI.isOperationLegal(ISD::FMA, VT) &&
        TLI.isFMAFasterThanMulAndAdd(VT)) {
      [ use FMA ]
    } else {
      [ use MUL + ADD ]
    }

-Hal
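
The three-part condition in that snippet can be modeled in isolation. The sketch below is a simplified stand-in, not the SelectionDAGBuilder code: `FusionQuery`, `Lowering`, `lowerFMulAdd`, and `checkLowerFMulAdd` are hypothetical names, the three fields stand in for the three checks quoted above, and the non-Strict mode names are included only for illustration.

```cpp
#include <cassert>

// Stand-in for the fusion modes; only Strict is named in the quoted check,
// the other two enumerators are assumptions of this sketch.
enum class FPOpFusion { Fast, Standard, Strict };

// Hypothetical bundle of the three inputs the quoted condition reads.
struct FusionQuery {
  FPOpFusion AllowFPOpFusion; // stands in for TM.Options.AllowFPOpFusion
  bool FMAIsLegal;            // stands in for TLI.isOperationLegal(ISD::FMA, VT)
  bool FMAIsFaster;           // stands in for TLI.isFMAFasterThanMulAndAdd(VT)
};

enum class Lowering { FMA, MulThenAdd };

// fmuladd becomes an FMA only when all three conditions hold;
// otherwise it is split into a multiply followed by an add.
inline Lowering lowerFMulAdd(const FusionQuery &Q) {
  if (Q.AllowFPOpFusion != FPOpFusion::Strict && Q.FMAIsLegal && Q.FMAIsFaster)
    return Lowering::FMA;
  return Lowering::MulThenAdd;
}

// Usage: a legal but slow FMA still yields mul + add for fmuladd.
inline void checkLowerFMulAdd() {
  assert(lowerFMulAdd({FPOpFusion::Standard, true, true}) == Lowering::FMA);
  assert(lowerFMulAdd({FPOpFusion::Standard, true, false}) ==
         Lowering::MulThenAdd);
}
```

Keeping legality and profitability as separate inputs is what lets a target say "I have an FMA, but do not form it from fmuladd".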

Out of curiosity, what is the use-case for isFMAFasterThanMulAndAdd? If a target declares that FMA is actually slower for a given type, why not just declare it as illegal for that type? Wouldn’t that accomplish the same thing without another target hook? I feel like I’m missing something here.

We've written a few TableGen patterns here locally to match FMA and
added a predicate to say in effect TM.Options.AllowFPOpFusion !=
FPOpFusion::Strict. So that's another way to proceed.

In general, I prefer TableGen patterns over manual lowering.

                               -David

It's not expressed in the code Hal posted but I suppose a target could
have a slow fma that the user nonetheless wants to use for precision
reasons.

                              -David

From: dag@cray.com
To: "Justin Holewinski" <justin.holewinski@gmail.com>
Cc: "Hal Finkel" <hfinkel@anl.gov>, "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Friday, January 11, 2013 2:13:50 PM
Subject: Re: [LLVMdev] Documentation of fmuladd intrinsic

Yes, I believe that's right. This way you can still always get an fma with the intrinsic.

-Hal

Now I'm confused. If a target declares that fmuladd is "slow" for a given
type, it will be lowered to mul + add in SDAG anyway (according to this
code snippet). So how could a user override this?

From: dag@cray.com
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "cameron mcinally" <cameron.mcinally@nyu.edu>, llvmdev@cs.uiuc.edu
Sent: Friday, January 11, 2013 2:12:01 PM
Subject: Re: [LLVMdev] Documentation of fmuladd intrinsic

Just to be clear, fmuladd was really only added for one reason: to allow the proper modeling of fp-contraction restrictions in the C99 standard. Because these restrictions are based on source-language statement boundaries, and statement boundaries are known only to the frontend, we needed a way for the frontend to create fmas that could later be undone in a target-specific way.

-Hal
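
To make the statement-boundary point concrete, here is a small sketch with two hypothetical helper functions. Under the source-level rule described above, only the first is a candidate for contraction, because the multiply and the add sit in one expression; in the second, the product is completed in its own statement. What a given compiler actually emits for either form depends on its FP_CONTRACT handling and flags, which this sketch does not control.

```cpp
#include <cassert>

// One expression: a frontend honoring FP_CONTRACT may represent this
// multiply-add as a single fusable operation (fmuladd in LLVM IR).
inline double mulAddOneExpression(double A, double B, double C) {
  return A * B + C;
}

// Two statements: the product is a complete expression on its own, so the
// source-level contraction rule does not cover fusing it with the add.
inline double mulAddTwoStatements(double A, double B, double C) {
  double Product = A * B;
  return Product + C;
}

// Usage: with exactly representable inputs both forms agree, fused or not.
inline void checkMulAddForms() {
  assert(mulAddOneExpression(2.0, 3.0, 4.0) == 10.0);
  assert(mulAddTwoStatements(2.0, 3.0, 4.0) == 10.0);
}
```

Once both forms are lowered to separate multiply and add operations, the boundary between them is no longer visible, which is why the frontend has to record the permission itself.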

From: "Justin Holewinski" <justin.holewinski@gmail.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "David A. Greene" <dag@cray.com>, "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Friday, January 11, 2013 2:19:01 PM
Subject: Re: [LLVMdev] Documentation of fmuladd intrinsic

The user should use the fma intrinsic directly. There are now two intrinsics, fma and fmuladd.

-Hal

Ah, alright. I missed that there are actually two different intrinsics.

FMA is not semantically equivalent to fmul+fadd: the fused form rounds once, while the separate operations round twice. If the user called the fma() libm function, we’re obligated to translate that into an actual FMA instruction (or a libcall). If they instead wrote an fmul+fadd with FP_CONTRACT turned on, we’re allowed to generate an FMA, but we don’t want to if it will be slower than just doing the fmul+fadd.

–Owen
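
The difference in results can be observed directly. The following is a minimal sketch, assuming IEEE 754 double precision with round-to-nearest-even and a correctly rounded `std::fma`; `mulThenAdd`, `fused`, and `checkFusedDiffers` are hypothetical helper names. The inputs are chosen so that the exact product, 1 - 2^-54, is not representable as a double.

```cpp
#include <cassert>
#include <cmath>

// Multiply, round, then add. The volatile store forces the product to be
// rounded to double and keeps the compiler from contracting the two
// operations into a single FMA.
inline double mulThenAdd(double A, double B, double C) {
  volatile double Product = A * B;
  return Product + C;
}

// Fused multiply-add: one rounding, applied to the exact value of A*B + C.
inline double fused(double A, double B, double C) {
  return std::fma(A, B, C);
}

// Usage: (1 + 2^-27) * (1 - 2^-27) - 1 is exactly -2^-54.
inline void checkFusedDiffers() {
  const double X = std::ldexp(1.0, -27);
  // The separately rounded product is 1.0, so the small term is lost.
  assert(mulThenAdd(1.0 + X, 1.0 - X, -1.0) == 0.0);
  // The fused form keeps it.
  assert(fused(1.0 + X, 1.0 - X, -1.0) == -std::ldexp(1.0, -54));
}
```

This is the sense in which a user might want an fma "for precision reasons" even on a target where it is slower.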