For the test case given in the link, I see two differences compared with the code GCC generates:
- LLVM is not folding the memory operand into the FMA. Is there a way to force the folding? Can someone give pointers on where in instruction selection I need to look?
- LLVM does not seem to generate “vfnmadd213pd” even though the user has asked for that intrinsic. The reason seems to be that during inlining “b - (constant * a)” is converted to “b + (-constant) * a”. Is there a way to generate vfnmadd213pd?
If I change the LLVM IR to “b + (-a) * constant”, we do seem to generate “vfnmadd213pd”, but the memory operand is still not folded into the FMA.
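For readers without access to the link, here is a minimal sketch of the kind of source pattern I mean. It is not the linked test case itself: the function name `fnmadd4` and the constant `K` are made up for illustration, and the scalar fallback is only there so the file still builds when FMA is not enabled.

```c
#if defined(__FMA__) && defined(__AVX__)
#include <immintrin.h>
#endif

/* Hypothetical constant standing in for "constant" in the post. */
#define K 3.0

/* Computes out[i] = b[i] - K * a[i] for four doubles. */
static void fnmadd4(const double *a, const double *b, double *out)
{
#if defined(__FMA__) && defined(__AVX__)
    __m256d k  = _mm256_set1_pd(K);
    __m256d va = _mm256_loadu_pd(a);  /* load that could become a memory operand */
    __m256d vb = _mm256_loadu_pd(b);
    /* _mm256_fnmadd_pd(x, y, z) computes -(x * y) + z, i.e. b - K * a here. */
    _mm256_storeu_pd(out, _mm256_fnmadd_pd(k, va, vb));
#else
    /* Scalar fallback with the same result, used when FMA is unavailable. */
    for (int i = 0; i < 4; ++i)
        out[i] = b[i] - K * a[i];
#endif
}
```

Building this with something like `-O2 -mfma` and comparing the assembly from both compilers should show the two points in question: whether a negated FMA form is selected for the intrinsic, and whether one of the loads is folded into the FMA instead of being emitted as a separate move.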