[AArch64] Is the cost of the MSUB instruction significantly higher than that of the MADD instruction?

Based on ⚙ D40306 [AArch64] Add patterns to replace fsub fmul with fma fneg., we transform (fsub (fmul x y) z) into (fma x y (fneg z)) instead of using the fmls instruction.

So, is the cost of the MSUB instruction significantly higher than that of the MADD instruction on the AArch64 target?
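For concreteness, here is a minimal reproducer (my own sketch; the function and type names are made up). Built with clang -O3 and contraction enabled (e.g. -ffp-contract=fast or -ffast-math), the AArch64 backend emits an fneg followed by an fmla for this, not a single fmls:

  // x*y - z on a 4 x float vector, using the Clang/GCC vector extension.
  using v4f32 = float __attribute__((vector_size(16)));

  v4f32 mulsub(v4f32 x, v4f32 y, v4f32 z) {
    return x * y - z; // (fsub (fmul x y) z)
  }
  // Expected codegen, roughly (register assignment illustrative):
  //   fneg v2.4s, v2.4s
  //   fmla v2.4s, v0.4s, v1.4s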

  • related code in the MachineCombiner pass (genAlternativeCodeSequence in AArch64InstrInfo.cpp):
  case MachineCombinerPattern::FMLSv4f32_OP1:
  case MachineCombinerPattern::FMLSv4i32_indexed_OP1: {
    RC = &AArch64::FPR128RegClass;
    Register NewVR = MRI.createVirtualRegister(RC);
    MachineInstrBuilder MIB1 =
        BuildMI(MF, Root.getDebugLoc(), TII->get(AArch64::FNEGv4f32), NewVR)
            .add(Root.getOperand(2));
    InsInstrs.push_back(MIB1);
    InstrIdxForVirtReg.insert(std::make_pair(NewVR, 0));
    if (Pattern == MachineCombinerPattern::FMLSv4i32_indexed_OP1) {
      Opc = AArch64::FMLAv4i32_indexed;
      MUL = genFusedMultiply(MF, MRI, TII, Root, InsInstrs, 1, Opc, RC,
                             FMAInstKind::Indexed, &NewVR);
    } else {
      Opc = AArch64::FMLAv4f32;
      MUL = genFusedMultiply(MF, MRI, TII, Root, InsInstrs, 1, Opc, RC,
                             FMAInstKind::Accumulator, &NewVR);
    }
    break;
  }
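What this does: Root is the fsub, whose operand 2 is z; the FNEG of z goes into the fresh virtual register NewVR, which genFusedMultiply then uses as the accumulator, folding the operand-1 multiply into fmla NewVR, x, y. A scalar before/after model of the dataflow (illustrative only, not LLVM code):

  // Before combining: two serially dependent instructions.
  float before(float x, float y, float z) {
    float mul = x * y; // FMUL
    return mul - z;    // FSUB, must wait for the FMUL
  }

  // After combining: the FNEG depends only on z, so it can issue
  // while x and y are still being computed.
  float after(float x, float y, float z) {
    float negz = -z;      // FNEG -> NewVR (InsInstrs[0])
    // FMLA: negz + x*y; fused in hardware, written unfused here.
    return negz + x * y;
  }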

From the review you linked: “This has a lower latency on micro architectures where fneg is cheap.” The assumption is that fmul has the same cost as fmla, and that fneg is cheaper than fsub.
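A worked example with made-up latencies (purely illustrative, not numbers from any real core). Say all operands are ready at t = 0, lat(fmul) = lat(fmla) = 4 cycles, lat(fsub) = 3, lat(fneg) = 1:

  fmul + fsub:  4 + 3 = 7 cycles   (the fsub depends on the fmul)
  fneg + fmla:  1 + 4 = 5 cycles   (the fmla depends on the fneg)

With lat(fmla) == lat(fmul), the rewrite shortens the critical path exactly when lat(fneg) < lat(fsub).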

Oh, wait, I think I see your question. I don’t think fmls has the right semantics for the transform you’re thinking of.

Am I missing something?

FMLS (vector): fmls x, y, z = x*y - z

“Floating-point fused Multiply-Subtract from accumulator (vector).”

FMLA (by element): fmla x, y, z = x*y + z

“Floating-point fused Multiply-Add to accumulator (by element).”

fmls computes z - (x*y): the product is subtracted from the accumulator, not the accumulator from the product.
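In other words, the accumulator would have to enter the operation already negated for the semantics to line up, which is exactly why the combiner materialises an fneg and uses fmla. A tiny model of the two semantics (my own sketch):

  // FMLS vd, vn, vm:  vd = vd - vn*vm  (product subtracted FROM accumulator).
  float fmls_model(float acc, float n, float m) { return acc - n * m; }

  // What (fsub (fmul x y) z) needs:  x*y - z.
  float wanted(float x, float y, float z) { return x * y - z; }

  // fmls_model(z, x, y) == z - x*y == -(x*y - z), so a lone fmls yields the
  // negated result; fmla onto fneg(z) yields the right one: (-z) + x*y.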


Thanks for your patient reply!