fmac generation for cortex-a9

Hi all,

I have a .ll file that uses double-precision fmul/fadd or fmul/fsub. When I compile it with llc -mcpu=cortex-a9, I can't get vmla/vmls generated, even with -fp-contract=fast. But when I use -mtriple=armv7-eabi instead of -mcpu=cortex-a9, fused MACs are generated. Can someone explain why?

Thanks for your answers

Seb

Perhaps you need to use some attributes, e.g. -mattr=+vfp4.
Check fusedMAC.ll in the ARM codegen tests.

Hi Anitha,

Thanks for your answer, but -mcpu=cortex-a9 -mattr=+vfp4 doesn't enable fused MAC generation for me.
I would just like to understand why -mtriple=armv7-eabi enables it while -mcpu=cortex-a9 seems to disable it.

Seb

AFAIK A9 doesn’t have VFPv4 or AdvSIMDv2, so it doesn’t have VFMA. I don’t know what LLVM does, but it shouldn’t emit VFMA when you target A9. VMLA isn’t a fused multiply-add, it’s a multiply followed by an add and has different latency as well as precision.
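
To make the precision difference concrete, here is a minimal Python sketch (not ARM code). It assumes that the VMLA-style path rounds the product to double before the add, while the VFMA-style path rounds only once at the end. The function names and input values are made up for illustration, and the fused result is emulated with exact rational arithmetic rather than a hardware instruction.

```python
from fractions import Fraction


def mul_then_add(a, b, c):
    """VMLA-style: the product is rounded to double, then the sum is rounded again."""
    product = a * b          # first rounding
    return product + c       # second rounding


def fused_mul_add(a, b, c):
    """VFMA-style: a*b+c is evaluated exactly and rounded a single time."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)      # the only rounding step


# Inputs chosen so that the exact product, 1 - 2**-60, is not representable as a double.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

print("multiply then add:", mul_then_add(a, b, c))
print("fused:            ", fused_mul_add(a, b, c))
```

With these inputs the unfused version loses the low-order bits of the product when it is rounded to 1.0, so the two results differ. That is why a compiler cannot silently swap one instruction for the other.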

Hi Bastien,

Weird: GCC generates fma for my platform (ST-Ericsson NovaThor with Linaro), and the code works. It also works when I use LLVM to generate fma (using llc -mtriple=armv7-eabi). Maybe someone from ARM can answer the question?

Seb

Hi Sebastien,

ARMv7-M has VFMA and LLVM's "triple" is far from perfect.

Wikipedia tells me NovaThor can also be A15, or STE could have crammed
a VFPv4 in it? ;) Or possibly, your code never branches into the VFMA.
Many things could be happening, but usually, VFMA shouldn't be
generated for A9.

A GCC bug, maybe?

Hi Renato,

It's definitely not A15. Could it be that the NEON unit on Cortex-A9 supports it, but this isn't documented/recommended?
And as mentioned before, the code is working!

Seb

cat /proc/cpuinfo ?
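
A small sketch of what to look for in that output, assuming an ARM Linux kernel that reports a `vfpv4` flag on the "Features" line when VFPv4 (and hence VFMA) is available. The helper names and the sample text are illustrative only; the sample is not output captured from the board discussed in this thread.

```python
def cpu_features(cpuinfo_text):
    """Collect the flags from every 'Features' line of /proc/cpuinfo text."""
    features = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            features.update(value.split())
    return features


def reports_vfpv4(cpuinfo_text):
    """True if the kernel lists vfpv4 among the CPU features."""
    return "vfpv4" in cpu_features(cpuinfo_text)


# Illustrative sample, not real output from the NovaThor board.
SAMPLE = (
    "Processor\t: ARMv7 Processor rev 1 (v7l)\n"
    "Features\t: swp half thumb fastmult vfp edsp neon vfpv3 tls\n"
)

print("vfpv4 reported:", reports_vfpv4(SAMPLE))
```

On the target itself, the same check can be run on the real file with `reports_vfpv4(open("/proc/cpuinfo").read())`.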

Are you sure it’s generating VFMA and not VMLA?

Hi Renato,

You're right, it's VMLA/VMLS that are generated. I still don't understand what drives their generation for Cortex-A9.
I was using "fmac" to mean floating-point MAC, not fused MAC. Then I realized that we were talking about fma instead of fmac.
So back to the original problem: why are VMLA/VMLS not generated when I use -mcpu=cortex-a9, but generated when I use -mtriple=armv7-eabi?

Best Regards
Seb

Oh, right! Now it makes more sense...

-mcpu support is flaky; you should only use it in conjunction with
-mtriple (or whatever people change to these days).

The main reason is that there isn't (yet) a good relationship between
cpus/fpus and architectures, so you need to provide both.
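
As a rough sketch of "provide both", the helper below builds an llc command line that passes the triple and the CPU together. The helper names and the file name "test.ll" are hypothetical. The flags are the ones used earlier in this thread, plus `-o -` to write the assembly to stdout. Whether the output then contains vmla/vmls is not something this sketch checks.

```python
import shutil
import subprocess


def llc_command(ll_file, triple, cpu, extra=()):
    """Build an llc command line that names both the triple and the CPU."""
    return ["llc", f"-mtriple={triple}", f"-mcpu={cpu}", *extra, ll_file, "-o", "-"]


def compile_to_asm(ll_file, triple, cpu, extra=()):
    """Run llc and return the assembly text, or None if llc is not on PATH."""
    if shutil.which("llc") is None:
        return None
    result = subprocess.run(
        llc_command(ll_file, triple, cpu, extra),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# "test.ll" is a hypothetical input; the triple and CPU come from this thread.
cmd = llc_command("test.ll", "armv7-eabi", "cortex-a9", extra=["-fp-contract=fast"])
print(" ".join(cmd))
```

Searching the text returned by `compile_to_asm` for "vmla" or "vfma" would then show which instruction, if any, was selected.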

Try to dump the IR with just the -mcpu option and check what triple
it's printing.

cheers,
--renato