AArch64 fmul/fadd fusion

Meador_Inge1 · September 19, 2015, 3:14am

Hi All,

Recently I was doing some AArch64 work and noticed some cases where
fmuls were not getting fused with fadds. Is there any particular
reason that the AArch64 machine combiner doesn't do this like it does
for add/mul?

I am happy to work up a patch for this, but I wanted to make sure that
there wasn't a good reason for it not already being there. FWIW, I
see where GCC is doing this optimization.

Cheers,

Meador

TNorthover · September 19, 2015, 3:34am

Recently I was doing some AArch64 work and noticed some cases where
fmuls were not getting fused with fadds. Is there any particular
reason that the AArch64 machine combiner doesn't do this like it does
for add/mul?

AArch64's fmadd instruction is fused, which means it can produce a
different result to the two operations executed separately. The C and
C++ standards do not allow such changes.

We support them via various flags (-ffast-math is the obvious one,
though an argument could be made for supporting -mfused-madd and
-mnofused-madd as well). But in the backend we definitely have to
check *somthing* before doing the substitution.

I am happy to work up a patch for this, but I wanted to make sure that
there wasn't a good reason for it not already being there. FWIW, I
see where GCC is doing this optimization.

You might want to get together with Ana Pazos, who just asked similar
questions today.

Personally, I'd be sad to see Clang's default execution become less
conforming to the standard. But it's pretty undeniable that
"-std=gnu99/gnu11/..." do allow it by default.

Cheers.

Tim.

Meador_Inge1 · September 19, 2015, 4:18am

AArch64's fmadd instruction is fused, which means it can produce a
different result to the two operations executed separately. The C and
C++ standards do not allow such changes.

Sorry, sloppy language on my part. I was aware of fmadd, but I was
really asking about turning sequences like:

fmul s0, s0, s2
fadd s0, s1, s0

into a fmadd:

fmadd s0, s0, s2, s1

We support them via various flags (-ffast-math is the obvious one,
though an argument could be made for supporting -mfused-madd and
-mnofused-madd as well). But in the backend we definitely have to
check *somthing* before doing the substitution.

Support in what way? I don't see any patterns or machine combiners
to do the above replacement. Did I miss something?

If I didn't miss something, is there interest in adding this to the AArch64
machine combiners assuming it was guarded by the right flag?

You might want to get together with Ana Pazos, who just asked similar
questions today.

I see that on cfe-dev now. Thanks for pointing that out.

Personally, I'd be sad to see Clang's default execution become less
conforming to the standard. But it's pretty undeniable that
"-std=gnu99/gnu11/..." do allow it by default.

Agreed.

Thanks for the reply!

Cheers,

Meador

Joerg_Sonnenberger1 · September 20, 2015, 9:36pm

...which is exactly the transform Tim is talking about. FMA is required
to provide a correctly rounded result *without* intermediate rounding.
The mul+add sequence on other the side are required to perform that
rounding. There are algorithms known to break under such optimisations,
which is why it is not enabled by default. The logic for producing FMA
is not target specific otherwise.

Joerg

TNorthover · September 21, 2015, 2:37am

We support them via various flags (-ffast-math is the obvious one,
though an argument could be made for supporting -mfused-madd and
-mnofused-madd as well). But in the backend we definitely have to
check *somthing* before doing the substitution.

Support in what way? I don't see any patterns or machine combiners
to do the above replacement. Did I miss something?

(Having just checked lib/CodeGen/SelectionDAG) I believe it's mostly
covered before target-specific code gets involved, in the generic DAG
logic at the moment. ISD::FMA nodes should only be formed from paired
mul/add based on "AllowFPOpFusion" checks (and other similar explicit
permissions).

So I think the status quo is that generic code does any permissible
fusion, and AArch64 code really doesn't have any freedom to fuse more
operations on top of that.

If I didn't miss something, is there interest in adding this to the AArch64
machine combiners assuming it was guarded by the right flag?

If you can find a missing-but-allowed case, adding it to the generic
handling would probably be better than making it AArch64 only.

Cheers.

Tim.

Meador_Inge1 · September 21, 2015, 12:56pm

(Having just checked lib/CodeGen/SelectionDAG) I believe it's mostly
covered before target-specific code gets involved, in the generic DAG
logic at the moment. ISD::FMA nodes should only be formed from paired
mul/add based on "AllowFPOpFusion" checks (and other similar explicit
permissions).

Ah, I see it in the DAG combiner now. Thanks.

So I think the status quo is that generic code does any permissible
fusion, and AArch64 code really doesn't have any freedom to fuse more
operations on top of that.

Yeah. I originally got thrown off because the integer version is AArch64
specific. There doesn't seem to be any generic handling for that.

If you can find a missing-but-allowed case, adding it to the generic
handling would probably be better than making it AArch64 only.

The cases that I was looking into all seem to be handled.

Thanks!

-- Meador

Topic		Replies	Views
RISC-V multiply-add instructions (FMADD.*, etc), bit-exactness, and correctness of optimizations RISCV	1	1935	August 1, 2022
Fusing contract fadd/fsub with normal fmul LLVM Dev List Archives	3	123	June 12, 2017
AVX2 codegen - question reg. FMA generation LLVM Dev List Archives	5	102	September 3, 2019
[AArch64] Is the cost of MSUB instruction is significantly higher than that of the MADD instruction? IR & Optimizations arm	5	347	June 11, 2022
Upcoming API change: FAdd, FSub, FMul LLVM Dev List Archives	8	107	June 17, 2009

AArch64 fmul/fadd fusion

Related topics