Solved: How to make LLC use AVX2 and FMA?


Starting from an MLIR program, I am trying to emit AVX2/FMA vector code
operating on the 256-bit %ymm registers.

For a reason I don't understand, I can only get assembly that uses plain
AVX instructions, with no FMA, on the 128-bit %xmm registers.

Things seem to go astray during the call to llc. Its input contains
the following call to llvm.fmuladd:

%44 = call <8 x float> @llvm.fmuladd.v8f32(<8 x float> %37, <8 x float> %40, <8 x float> %43), !dbg !35

However, the output instead contains the following code working on 128-bit registers:

movaps (%rsi,%rdx), %xmm0
movaps 16(%rsi,%rdx), %xmm1
mulps 16(%rax,%rdx), %xmm1
addps 16(%rcx,%rdx), %xmm1
mulps (%rax,%rdx), %xmm0
addps (%rcx,%rdx), %xmm0
movaps %xmm0, (%rcx,%rdx)
movaps %xmm1, 16(%rcx,%rdx)

The command line I use is:

~/llvm/bin/mlir-opt --lower-affine --convert-scf-to-std --convert-std-to-llvm --convert-vector-to-llvm try.mlir | ~/llvm/bin/mlir-translate --mlir-to-llvmir | ~/llvm/bin/llc -O1 --fp-contract=fast

Should you want to replicate the behavior, here is the MLIR source code:

func @test(%a:memref<128xvector<8xf32>>,
           %b:memref<128xvector<8xf32>>,
           %c:memref<128xvector<8xf32>>) {
  affine.for %idx = 0 to 128 {
    %av = memref.load %a[%idx] : memref<128xvector<8xf32>>
    %bv = memref.load %b[%idx] : memref<128xvector<8xf32>>
    %cv = memref.load %c[%idx] : memref<128xvector<8xf32>>
    %xv = vector.fma %av, %bv, %cv : vector<8xf32>
    memref.store %xv, %c[%idx] : memref<128xvector<8xf32>>
  }
  return
}


PS: found the solution: add -mattr=avx2 -mattr=fma to llc.
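For reference, the full pipeline with those flags would look like the following (same command as above, only the llc flags change; `-mattr` also accepts a comma-separated list, which is equivalent to passing it twice):

```shell
# Same pipeline as before, but telling llc that AVX2 and FMA are available
# on the target, so it can select 256-bit FMA instructions.
~/llvm/bin/mlir-opt --lower-affine --convert-scf-to-std \
    --convert-std-to-llvm --convert-vector-to-llvm try.mlir \
  | ~/llvm/bin/mlir-translate --mlir-to-llvmir \
  | ~/llvm/bin/llc -O1 --fp-contract=fast -mattr=+avx2,+fma
```

With these features enabled, llc should now emit vfmadd-style instructions on %ymm registers instead of the mulps/addps pairs on %xmm shown above.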

This can be a problem with the target/subtarget options in the generated LLVM IR: the backend does not know that AVX2 is legal here.
You should be able to override this from llc by adding: -mattr=+avx2
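If in doubt about the exact feature names, llc can list the subtarget features it knows about (the paths below assume the same local LLVM build as in the question):

```shell
# Print the CPUs and subtarget features the X86 backend recognizes;
# "avx2" and "fma" should both appear in the feature list.
~/llvm/bin/llc -march=x86-64 -mattr=help < /dev/null
```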


To see how clang passes this information to LLVM, look at the difference between:
echo "int foo() { return 0;}" | clang -x c++ - -o - -S -emit-llvm -O2
echo "int foo() { return 0;}" | clang -x c++ - -o - -S -emit-llvm -O2 -march=native

On my machine the function attribute goes from: