More FMA folding opportunities

Hi,

I think the DAG combiner could fold more patterns into FMA; please tell me what you think. Right now, these cases are implemented:

fold (fadd (fmul x, y), z) → (fma x, y, z)
fold (fadd x, (fmul y, z)) → (fma y, z, x)
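
To make the two existing folds concrete, here is a minimal sketch on a toy expression tree. The `Node` type and the helpers `leaf`, `make`, `print` and `foldFAddToFMA` are hypothetical names invented for this illustration; this is not the SelectionDAG API, and the legality checks a real combiner would need are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy expression node (hypothetical; not llvm::SDNode).
struct Node {
    std::string op;                          // "fadd", "fmul", "fma", or a leaf name
    std::vector<std::shared_ptr<Node>> ops;  // operands; empty for leaves
};
using NodePtr = std::shared_ptr<Node>;

NodePtr leaf(const std::string &name) {
    return std::make_shared<Node>(Node{name, {}});
}

NodePtr make(const std::string &op, std::vector<NodePtr> ops) {
    return std::make_shared<Node>(Node{op, std::move(ops)});
}

// Render a tree in the same notation as the fold rules above.
std::string print(const NodePtr &n) {
    if (n->ops.empty()) return n->op;
    std::string s = "(" + n->op + " ";
    for (std::size_t i = 0; i < n->ops.size(); ++i) {
        if (i) s += ", ";
        s += print(n->ops[i]);
    }
    return s + ")";
}

// Apply the two basic folds at the root; return the input unchanged otherwise.
NodePtr foldFAddToFMA(const NodePtr &n) {
    if (n->op != "fadd" || n->ops.size() != 2) return n;
    const NodePtr &lhs = n->ops[0];
    const NodePtr &rhs = n->ops[1];
    // fold (fadd (fmul x, y), z) -> (fma x, y, z)
    if (lhs->op == "fmul" && lhs->ops.size() == 2)
        return make("fma", {lhs->ops[0], lhs->ops[1], rhs});
    // fold (fadd x, (fmul y, z)) -> (fma y, z, x)
    if (rhs->op == "fmul" && rhs->ops.size() == 2)
        return make("fma", {rhs->ops[0], rhs->ops[1], lhs});
    return n;
}
```

For example, folding the tree for (fadd x, (fmul y, z)) with this sketch prints as (fma y, z, x), and a tree with no fmul operand comes back unchanged.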

When the TLI callback “enableAggressiveFMAFusion” returns true, we might also support:

fold (fadd (fma x, y, (fmul u, v)), z) → (fma x, y, (fma u, v, z))
fold (fadd x, (fma y, z, (fmul u, v))) → (fma y, z, (fma u, v, x))

This kind of reassociation yields two FMAs for an expression such as (x^2 + y^2 + z).
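
As a rough source-level picture of the (x^2 + y^2 + z) case, the sketch below writes out the shape after only the basic fold and the shape after the proposed aggressive fold, using `std::fma` from `<cmath>`. The function names `basicFold` and `aggressiveFold` are made up for this illustration, and the code only mirrors the structure of the DAG nodes; it says nothing about what a given compiler actually emits.

```cpp
#include <cassert>
#include <cmath>

// Shape after only the basic fold:
// (fadd (fma x, x, (fmul y, y)), z) -> one FMA, one multiply, one add.
double basicFold(double x, double y, double z) {
    return std::fma(x, x, y * y) + z;
}

// Shape after the proposed aggressive fold:
// (fma x, x, (fma y, y, z)) -> two chained FMAs.
double aggressiveFold(double x, double y, double z) {
    return std::fma(x, x, std::fma(y, y, z));
}
```

With small exactly representable inputs such as x = 3, y = 4, z = 5, both forms evaluate to 30. For general inputs the two can differ in the last bits, because the reassociated form rounds at different points.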

Finally, specifically for the PPC target, we could ignore FP_EXTEND in the patterns above as it will be removed by the Machine Common Subexpression Elimination pass. For instance:

fold (fadd (fpext (fmul x, y)), z) → (fma x, y, z)
fold (fadd (fpext (fma x, y, (fmul u, v))), z) → (fma x, y, (fma u, v, z))
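
To illustrate the first FP_EXTEND pattern at the source level, the sketch below assumes float operands extended to double. The names `unfoldedExt` and `foldedExt` are hypothetical, and the code only mirrors the shape of the nodes before and after the fold.

```cpp
#include <cassert>
#include <cmath>

// (fadd (fpext (fmul x, y)), z): the product is rounded to float,
// then widened to double and added.
double unfoldedExt(float x, float y, double z) {
    float prod = x * y;
    return static_cast<double>(prod) + z;
}

// (fma x, y, z) with the extension looked through: the operands are
// widened first and the multiply-add is fused in double.
double foldedExt(float x, float y, double z) {
    return std::fma(static_cast<double>(x), static_cast<double>(y), z);
}
```

For inputs whose product is exactly representable in float, for instance x = 1.5f, y = 2.0f, z = 0.25, both functions return 3.25. In general the folded form skips the rounding of the product to float, so the two results may differ in the low bits.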

Thanks for your help.
Olivier

From: "Olivier H Sallenave" <ohsallen@us.ibm.com>
To: llvmdev@cs.uiuc.edu
Sent: Monday, September 29, 2014 3:34:51 PM
Subject: [LLVMdev] More FMA folding opportunities

...

Yes, this all sounds reasonable.

Thanks again,
Hal