what does -ffp-contract=fast allow?

On Nov 17, 2016 5:53 PM, Mehdi Amini <mehdi.amini@apple.com> wrote:
>
>
>> On Nov 17, 2016, at 4:33 PM, Hal Finkel <hfinkel@anl.gov> wrote:
>>
>>
>> ________________________________
>>>
>>> From: “Warren Ristow” <warren.ristow@sony.com>
>>> To: “Sanjay Patel” <spatel@rotateright.com>, “cfe-dev” <cfe-dev@lists.llvm.org>, “llvm-dev” <llvm-dev@lists.llvm.org>
>>> Cc: “Nicolai Hähnle” <nhaehnle@gmail.com>, “Hal Finkel” <hfinkel@anl.gov>, “Mehdi Amini” <mehdi.amini@apple.com>, “andrew kaylor” <andrew.kaylor@intel.com>
>>> Sent: Thursday, November 17, 2016 5:58:58 PM
>>> Subject: RE: what does -ffp-contract=fast allow?
>>>
>>> > Is this a bug? We transformed the original expression into:
>>> > x * y + x
>>>
>>> I’d say yes, it’s a bug.
>>>
>>>
>>>
>>> Unless ‑ffast‑math is used (or some appropriate subset that gives us leeway, like ‑fno‑honor‑infinities or ‑fno‑honor‑nans, or somesuch), the re-association isn’t allowed, and that blocks the madd contraction.
>>
>> I agree. FP contraction alone only allows us to do x*y+z → fma(x,y,z).
>
>
> I agree too, but the more difficult question is “which flags are needed here?”
> Would FPContract + no-inf be enough? If not, why not, and how should we document it?

I think that the relevant question is: Is the contracted form more precise for all inputs (or the same precision as the original)? If so, then this should be allowed with just fp-contract+no-inf. Otherwise, more is required.
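The x = INF, y = 0 case from Sanjay's original message (quoted further down) can be checked directly. The following is only a sketch: the function names are made up for illustration, and it assumes a compiler that honors IEEE infinities and NaNs (no -ffast-math or similar). It compares the expression as written with the distributed form x*y + x, which is the value a fused multiply-add of (x, y, x) computes up to rounding.

```c
#include <assert.h>
#include <math.h>

/* The expression as written in the source (hypothetical name). */
float as_written(float x, float y) {
    return x * (y + 1.0f);
}

/* The distributed form x*y + x (hypothetical name). A fused
   multiply-add of (x, y, x) computes the same value up to rounding. */
float distributed(float x, float y) {
    return x * y + x;
}

/* Self-check for the x = INF, y = 0 case discussed in the thread. */
void check_inf_case(void) {
    /* INF * (0 + 1) = INF * 1 = INF */
    assert(isinf(as_written(INFINITY, 0.0f)));
    /* INF * 0 = NaN, and NaN + INF = NaN */
    assert(isnan(distributed(INFINITY, 0.0f)));
}
```

So the two forms disagree for at least one input that involves an infinity, which is why the question above is phrased in terms of fp-contract plus no-inf.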

-Hal

>
>
> —
> Mehdi
>
>
>
>>>
>>>
>>> From: Sanjay Patel [mailto:spatel@rotateright.com]
>>> Sent: Thursday, November 17, 2016 3:22 PM
>>> To: cfe-dev <cfe-dev@lists.llvm.org>; llvm-dev <llvm-dev@lists.llvm.org>
>>> Cc: Nicolai Hähnle <nhaehnle@gmail.com>; Hal Finkel <hfinkel@anl.gov>; Mehdi Amini <mehdi.amini@apple.com>; Ristow, Warren <warren.ristow@sony.com>; andrew.kaylor@intel.com
>>> Subject: what does -ffp-contract=fast allow?
>>>
>>>
>>>
>>> This is just paraphrasing from D26602, so credit to Nicolai for first raising the issue there.
>>>
>>> float foo(float x, float y) {
>>> return x * (y + 1);
>>> }
>>>
>>> $ ./clang -O2 xy1.c -S -o - -target aarch64 -ffp-contract=fast | grep fm
>>> fmadd s0, s1, s0, s0
>>>
>>> Is this a bug? We transformed the original expression into:
>>> x * y + x
>>>
>>> When x=INF and y=0, the code returns INF if we don’t reassociate. With reassociation to FMA, it returns NAN because 0 * INF = NAN.
>>>
>>> 1. I used aarch64 as the example target, but this is not target-dependent (as long as the target has FMA).
>>>
>>> 2. This is not -ffast-math…or is it? The C standard only shows on/off settings for the associated FP_CONTRACT pragma.
>>>
>>> 3. AFAIK, clang has no documentation for -ffp-contract:
>>> http://clang.llvm.org/docs/UsersManual.html
>>>
>>> 4. GCC says:
>>> https://gcc.gnu.org/onlinedocs/gcc-6.2.0/gcc/Optimize-Options.html#Optimize-Options
>>> “-ffp-contract=fast enables floating-point expression contraction such as forming of fused multiply-add operations if the target has native support for them.”
>>>
>>> 5. The LLVM backend (where this reassociation currently happens) shows:
>>> FPOpFusion::Fast - Enable fusion of FP ops wherever it’s profitable.
>>
>>
>>
>>
>> –
>> Hal Finkel
>> Lead, Compiler Technology and Programming Languages
>> Leadership Computing Facility
>> Argonne National Laboratory
>
>

fp-contract is confusing, so let me try to summarize that and the underlying implementation:

  1. -ffp-contract=on means honor the compiler’s default FP_CONTRACT setting or any FP_CONTRACT pragmas in the source. Currently, clang defaults to “OFF”. The shouting is not an accident; this is not the same as the flag’s “off” setting. This is described nicely here:
    https://reviews.llvm.org/D24481

If we set “on” in the invocation and we set “ON” in the source, clang will generate @llvm.fmuladd intrinsics for expressions like x*y+z. If you split that into 2 lines in C with a temp variable assignment, it’s no longer a single expression, so no FMA for you. The @llvm.fmuladd intrinsic is our way of preserving the C source information through the optimizer. If we don’t end up producing an FMA instruction for the target in this case, it’s a bug.

  2. -ffp-contract=fast means override the compiler’s default “OFF” setting and override source pragmas to generate FMA when possible, even across C expressions. The “fast” naming is unfortunate because this does not enable most fast-math. I.e., as everyone in this thread agrees so far, we are not allowed to do the reassociation in the example. It’s not strict math though, because of that trailing clause that lets us generate FMA across expressions.

Here’s where it gets more complicated and possibly buggy. Clang does not generate llvm.fmuladd intrinsics with this setting. In this mode, clang generates individual fmul and fadd instructions and relies on the backend to fuse those back together. More background here:
https://llvm.org/bugs/show_bug.cgi?id=17211

I don’t know if it’s possible, but if we’re in this mode and some IR transform pass managed to move/kill an fmul or fadd that was destined to be part of an FMA, I think that would be a bug. This mode is also completely broken with LTO because we’re using a TargetOption to communicate the FMA mode to the backend; there is no instruction-level or function-level attribute/metadata for FMA-ness:

https://llvm.org/bugs/show_bug.cgi?id=25721

To tie this back to the earlier thread about changes to IR FMF, the possibility of adding FMA bits to FMF (as well as storing all FMF in metadata) was discussed here:
https://llvm.org/bugs/show_bug.cgi?id=13118

  3. The backend needs a thread of its own. We have at least these mechanisms to handle FMA codegen:

a. TargetOptions for LessPreciseFPMADOption, UnsafeFPMath, NoInfsFPMath, NoNaNsFPMath, AllowFPOpFusion (Fast, Standard, Strict)

b. SDNodeFlags for UnsafeAlgebra, NoNaNs, NoInfs, NoSignedZeros (but nothing for FMA since IR FMF has nothing for FMA)

c. SelectionDAGTargetInfo::generateFMAsInMachineCombiner()

d. TargetLoweringBase::isFMAFasterThanFMulAndFAdd()

e. TargetLoweringBase::enableAggressiveFMAFusion()

f. ISD::FMA (no intermediate rounding step) and ISD::FMAD (has intermediate rounding) nodes

From: "Sanjay Patel" <spatel@rotateright.com>
To: "Hal J. Finkel" <hfinkel@anl.gov>
Cc: "Mehdi Amini" <mehdi.amini@apple.com>, "llvm-dev"
<llvm-dev@lists.llvm.org>, "cfe-dev" <cfe-dev@lists.llvm.org>,
"andrew kaylor" <andrew.kaylor@intel.com>, "Nicolai Hähnle"
<nhaehnle@gmail.com>, "Warren Ristow" <warren.ristow@sony.com>
Sent: Friday, November 18, 2016 10:37:08 AM
Subject: Re: what does -ffp-contract=fast allow?

> fp-contract is confusing, so let me try to summarize that and the
> underlying implementation:
>
> 1. -ffp-contract=on means honor the compiler's default FP_CONTRACT
> setting or any FP_CONTRACT pragmas in the source. Currently, clang
> defaults to "OFF". The shouting is not an accident; this is not the
> same as the flag's "off" setting. This is described nicely here:
> https://reviews.llvm.org/D24481
>
> If we set "on" in the invocation *and* we set "ON" in the source,
> clang will generate @llvm.fmuladd intrinsics for expressions like
> x*y+z. If you split that into 2 lines in C with a temp variable
> assignment, it's no longer a single expression, so no FMA for you.
> The @llvm.fmuladd intrinsic is our way of preserving the C source
> information through the optimizer. If we don't end up producing an
> FMA instruction for the target in this case, it's a bug.

This is not correct.

First, the behavior of -ffp-contract=on/off should just set the default state of the pragma. Once we finish fixing up the test suite to allow us to actually flip the default, this will actually be the case (the review description referenced above is not clear on the desired end state in this regard). Hopefully, this work will be done soon.

Second, it is specifically *not* a bug if @llvm.fmuladd does not become an FMA on the target. It only represents an allowable place to form an FMA. The LangRef specifically states, "Fusion is not guaranteed, even if the target platform supports it." The @llvm.fma intrinsic should become an FMA if the target supports it.
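To make "an allowable place to form an FMA" concrete at the source level, here is a minimal C sketch. It assumes a compiler that accepts the C99 pragma (some compilers ignore it, possibly with a warning), and the function names are illustrative only. In the first function the multiply and add sit in one expression, so contraction is permitted there when the pragma is ON; in the second the product is first assigned to a temporary, which by the C rules is not a contraction opportunity. In neither case is an FMA instruction guaranteed.

```c
#include <assert.h>

#pragma STDC FP_CONTRACT ON

/* One expression: a place where the compiler may (but need not) fuse. */
float one_expression(float x, float y, float z) {
    return x * y + z;
}

/* Two statements: under the C rules the product is a separate,
   rounded value, so this is not a contraction opportunity. */
float two_statements(float x, float y, float z) {
    float t = x * y;
    return t + z;
}

/* Both forms agree whenever the product is exactly representable,
   so small integer inputs cannot tell them apart. */
void check_small_values(void) {
    assert(one_expression(2.0f, 3.0f, 4.0f) == 10.0f);
    assert(two_statements(2.0f, 3.0f, 4.0f) == 10.0f);
}
```

The results can only differ when x * y needs rounding, which is exactly the intermediate rounding step that a fused operation skips.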

> 2. -ffp-contract=fast means override the compiler's default "OFF"
> setting and override source pragmas to generate FMA when possible,
> even across C expressions. The "fast" naming is unfortunate because
> this does *not* enable most fast-math. I.e., as everyone in this
> thread agrees so far, we are not allowed to do the reassociation in
> the example. It's not strict math though, because of that trailing
> clause that lets us generate FMA across expressions.
>
> Here's where it gets more complicated and possibly buggy. Clang does
> not generate llvm.fmuladd intrinsics with this setting. In this
> mode, clang generates individual fmul and fadd instructions and
> relies on the backend to fuse those back together.

This is definitely not a bug. The problem is that the C rules for contraction, which only allow fusion within a C-language statement, rule out fusion opportunities that appear only after function inlining (or, obviously, across statements in any other sense). This is a real problem, especially in C++ code, where there are a lot of small inline functions in abstraction layers that users expect the compiler to see through before deciding on fusion. Even within a function, the fusions allowed by the C rules are not necessarily performance-optimal.
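A small C sketch of the inlining case, with hypothetical helper names. The multiply lives in a tiny inline helper and the add in its caller, so at the source level they never appear in the same expression. Once the helper is inlined, the optimizer sees a multiply feeding an add; whether the two are then fused depends on the contraction mode, the target, and the backend's profitability checks.

```c
#include <assert.h>

/* Small abstraction-layer helper (hypothetical): only multiplies. */
static inline float scale(float a, float b) {
    return a * b;
}

/* The add happens in the caller, in a different statement from the
   multiply. Statement-level contraction rules do not cover this pair;
   a mode that fuses across statements may fuse it after inlining. */
float scale_and_add(float a, float x, float y) {
    float p = scale(a, x);
    return p + y;
}

/* Exact inputs give the same answer with or without fusion. */
void check_scale_and_add(void) {
    assert(scale_and_add(2.0f, 3.0f, 4.0f) == 10.0f);
}
```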

> More background here:
> https://llvm.org/bugs/show_bug.cgi?id=17211
>
> I don't know if it's possible, but if we're in this mode and some IR
> transform pass managed to move/kill an fmul or fadd that was
> destined to be part of an FMA, I think that would be a bug.

No, this also would not be a bug (although it could be bad for performance on some architectures).

> This mode is also completely broken with LTO because we're using a
> TargetOption to communicate the FMA mode to the backend; there is no
> instruction-level or function-level attribute/metadata for FMA-ness:
> https://llvm.org/bugs/show_bug.cgi?id=25721

Interesting; we should at least have a function-attribute for this that Clang uses.

Thanks again,
Hal

------------------------------

From: "Sanjay Patel" <spatel@rotateright.com>
To: "Hal J. Finkel" <hfinkel@anl.gov>
Cc: "Mehdi Amini" <mehdi.amini@apple.com>, "llvm-dev" <llvm-dev@lists.llvm.org>, "cfe-dev" <cfe-dev@lists.llvm.org>, "andrew kaylor" <andrew.kaylor@intel.com>, "Nicolai Hähnle" <nhaehnle@gmail.com>, "Warren Ristow" <warren.ristow@sony.com>
Sent: Friday, November 18, 2016 10:37:08 AM
Subject: Re: what does -ffp-contract=fast allow?

> Second, it is specifically *not* a bug if @llvm.fmuladd does not become an
> FMA on the target. It only represents an allowable place to form an FMA.
> The LangRef specifically states, "Fusion is not guaranteed, even if the
> target platform supports it." The @llvm.fma intrinsic should become an FMA
> if the target supports it.

Ah, I mixed up llvm.fma and llvm.fmuladd. The FP_CONTRACT ON setting allows
- but does not require - FMA codegen within a C statement. So the use of
llvm.fmuladd is our way of preserving the C statement boundary and is the
"blessed" op that the backend recognizes when operating in
FPOpFusionMode::Standard.

From: "Sanjay Patel" <spatel@rotateright.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "Mehdi Amini" <mehdi.amini@apple.com>, "llvm-dev"
<llvm-dev@lists.llvm.org>, "cfe-dev" <cfe-dev@lists.llvm.org>,
"andrew kaylor" <andrew.kaylor@intel.com>, "Nicolai Hähnle"
<nhaehnle@gmail.com>, "Warren Ristow" <warren.ristow@sony.com>
Sent: Friday, November 18, 2016 1:35:44 PM
Subject: Re: what does -ffp-contract=fast allow?

> Ah, I mixed up llvm.fma and llvm.fmuladd. The FP_CONTRACT ON setting
> allows - but does not require - FMA codegen within a C statement. So
> the use of llvm.fmuladd is our way of preserving the C statement
> boundary and is the "blessed" op that the backend recognizes when
> operating in FPOpFusionMode::Standard.

That's correct.

Thanks again,
Hal