AArch64 fmul/fadd fusion

Hi All,

Recently I was doing some AArch64 work and noticed some cases where
fmuls were not getting fused with fadds. Is there any particular
reason that the AArch64 machine combiner doesn't do this like it does
for add/mul?

I am happy to work up a patch for this, but I wanted to make sure that
there wasn't a good reason for it not already being there. FWIW, I
see where GCC is doing this optimization.

Cheers,

Meador

Recently I was doing some AArch64 work and noticed some cases where
fmuls were not getting fused with fadds. Is there any particular
reason that the AArch64 machine combiner doesn't do this like it does
for add/mul?

AArch64's fmadd instruction is fused, which means it can produce a
different result to the two operations executed separately. The C and
C++ standards do not allow such changes.

We support them via various flags (-ffast-math is the obvious one,
though an argument could be made for supporting -mfused-madd and
-mnofused-madd as well). But in the backend we definitely have to
check *somthing* before doing the substitution.

I am happy to work up a patch for this, but I wanted to make sure that
there wasn't a good reason for it not already being there. FWIW, I
see where GCC is doing this optimization.

You might want to get together with Ana Pazos, who just asked similar
questions today.

Personally, I'd be sad to see Clang's default execution become less
conforming to the standard. But it's pretty undeniable that
"-std=gnu99/gnu11/..." do allow it by default.

Cheers.

Tim.

AArch64's fmadd instruction is fused, which means it can produce a
different result to the two operations executed separately. The C and
C++ standards do not allow such changes.

Sorry, sloppy language on my part. I was aware of fmadd, but I was
really asking about turning sequences like:

  fmul s0, s0, s2
  fadd s0, s1, s0

into a fmadd:

  fmadd s0, s0, s2, s1

We support them via various flags (-ffast-math is the obvious one,
though an argument could be made for supporting -mfused-madd and
-mnofused-madd as well). But in the backend we definitely have to
check *somthing* before doing the substitution.

Support in what way? I don't see any patterns or machine combiners
to do the above replacement. Did I miss something?

If I didn't miss something, is there interest in adding this to the AArch64
machine combiners assuming it was guarded by the right flag?

You might want to get together with Ana Pazos, who just asked similar
questions today.

I see that on cfe-dev now. Thanks for pointing that out.

Personally, I'd be sad to see Clang's default execution become less
conforming to the standard. But it's pretty undeniable that
"-std=gnu99/gnu11/..." do allow it by default.

Agreed.

Thanks for the reply!

Cheers,

Meador

...which is exactly the transform Tim is talking about. FMA is required
to provide a correctly rounded result *without* intermediate rounding.
The mul+add sequence on other the side are required to perform that
rounding. There are algorithms known to break under such optimisations,
which is why it is not enabled by default. The logic for producing FMA
is not target specific otherwise.

Joerg

We support them via various flags (-ffast-math is the obvious one,
though an argument could be made for supporting -mfused-madd and
-mnofused-madd as well). But in the backend we definitely have to
check *somthing* before doing the substitution.

Support in what way? I don't see any patterns or machine combiners
to do the above replacement. Did I miss something?

(Having just checked lib/CodeGen/SelectionDAG) I believe it's mostly
covered before target-specific code gets involved, in the generic DAG
logic at the moment. ISD::FMA nodes should only be formed from paired
mul/add based on "AllowFPOpFusion" checks (and other similar explicit
permissions).

So I think the status quo is that generic code does any permissible
fusion, and AArch64 code really doesn't have any freedom to fuse more
operations on top of that.

If I didn't miss something, is there interest in adding this to the AArch64
machine combiners assuming it was guarded by the right flag?

If you can find a missing-but-allowed case, adding it to the generic
handling would probably be better than making it AArch64 only.

Cheers.

Tim.

(Having just checked lib/CodeGen/SelectionDAG) I believe it's mostly
covered before target-specific code gets involved, in the generic DAG
logic at the moment. ISD::FMA nodes should only be formed from paired
mul/add based on "AllowFPOpFusion" checks (and other similar explicit
permissions).

Ah, I see it in the DAG combiner now. Thanks.

So I think the status quo is that generic code does any permissible
fusion, and AArch64 code really doesn't have any freedom to fuse more
operations on top of that.

Yeah. I originally got thrown off because the integer version is AArch64
specific. There doesn't seem to be any generic handling for that.

If you can find a missing-but-allowed case, adding it to the generic
handling would probably be better than making it AArch64 only.

The cases that I was looking into all seem to be handled.

Thanks!

-- Meador