vmlx forwarding an cortex A9 question

Hi all,

On following code when I use llc targeting ARM Cortex-A9 as follows, if vmlx-forwarding is turned off then 'vmla' instructions are generated. It seems that -mcpu=cortex-a9 enables it by default and thus less 'vmla' instructions are generated. On this specific example it doesn't make any difference in term of performance, but on a more complex example disabling vmlx-forwarding leads to significant win in performance wise (10% speed-up). Does anyone can explain me what is the exact purpose of vmlx-forwarding option, and why it is enabled by default on cortex-A9.

On attached file:
llc fun2.ll -march=arm -mcpu=cortex-a9 -mattr=+neon -float-abi=soft -o fun2.s => vmla instruction not generated
On attached file:
llc fun2.ll -march=arm -mcpu=cortex-a9 -mattr=+neon,-vmlx-forwarding -float-abi=soft -o fun2.s => vmla instruction generated

Thanks for your answers.
Best Regards
Seb

fun2.ll (2.67 KB)

I tried llc fun2.ll -march=arm -mcpu=cortex-a9 -mattr=+neon -float-abi=soft -o fun2.s and I see vmla is generated:
  vmla.f32 q13, q13, q13

Besides, I don't think HasVMLxForwarding() is being used now.

Thanks,
Weiming