Folding memory into FMA

For the test case given in the link, I see two differences compared with the code GCC generates.

  1. LLVM is not folding the memory operand into the FMA. Is there a way to force the folding? Can someone give pointers on where in instruction selection I need to look?

  2. LLVM does not seem to generate “vfnmadd213pd” even though the user has asked for the corresponding intrinsic. The reason seems to be that during inlining “b - (constant * a)” is converted to “b + (-constant) * a”. Is there a way to generate vfnmadd213pd?

If I change the LLVM IR to “b + (-a) * constant”, we do seem to generate “vfnmadd213pd”, but the memory operand is still not folded into the FMA.
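
To make the pattern concrete, here is a minimal sketch of the kind of code in question. It is not the test case from the link: the function name fnmadd4, the constant K and the four-element layout are assumptions chosen only for illustration. When built with FMA enabled (for example with -mfma), it uses the _mm256_fnmadd_pd intrinsic, which computes -(x * y) + z. The load of a[] feeds the FMA directly, so it is the operand one would hope to see folded into the instruction as a memory operand. Without FMA support the sketch falls back to the scalar form “b - (constant * a)”.

```c
#include <stddef.h>
#if defined(__FMA__)
#include <immintrin.h>
#endif

/* Illustrative constant; not taken from the original test case. */
#define K 3.0

/* Hypothetical helper: out[i] = b[i] - K * a[i] for four doubles. */
void fnmadd4(const double *a, const double *b, double *out)
{
#if defined(__FMA__)
    __m256d k  = _mm256_set1_pd(K);
    __m256d vb = _mm256_loadu_pd(b);
    /* _mm256_fnmadd_pd(x, y, z) computes -(x * y) + z.
       The load of a[] is used only by the FMA, so it is a
       candidate for being folded into a memory operand. */
    __m256d r  = _mm256_fnmadd_pd(k, _mm256_loadu_pd(a), vb);
    _mm256_storeu_pd(out, r);
#else
    /* Scalar fallback with the same arithmetic meaning. */
    for (size_t i = 0; i < 4; ++i)
        out[i] = b[i] - K * a[i];
#endif
}
```

Comparing the assembly that each compiler emits for a function of this shape at -O2 with -mfma should show whether the load is folded and which FMA form is selected.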