Hello,
Starting from an MLIR program, I am trying to generate AVX2/FMA vector code
that works on the 256-bit %ymm registers.
For a reason I don’t understand, I only get assembly code that uses
SSE instructions (no AVX, no FMA) on the 128-bit %xmm registers.
Things seem to go astray during the call to llc. Its input contains
the following call to fmuladd:
%44 = call <8 x float> @llvm.fmuladd.v8f32(<8 x float> %37, <8 x float> %40, <8 x float> %43), !dbg !35
However, the output instead contains the following code, which works on 128-bit registers:
movaps (%rsi,%rdx), %xmm0
movaps 16(%rsi,%rdx), %xmm1
mulps 16(%rax,%rdx), %xmm1
addps 16(%rcx,%rdx), %xmm1
mulps (%rax,%rdx), %xmm0
addps (%rcx,%rdx), %xmm0
movaps %xmm0, (%rcx,%rdx)
movaps %xmm1, 16(%rcx,%rdx)
The command line I use is:
~/llvm/bin/mlir-opt --lower-affine --convert-scf-to-std --convert-std-to-llvm --convert-vector-to-llvm try.mlir | ~/llvm/bin/mlir-translate --mlir-to-llvmir | ~/llvm/bin/llc -O1 --fp-contract=fast
Should you want to replicate the behavior, here is the MLIR source code:
func @test(%a:memref<128xvector<8xf32>>,
           %b:memref<128xvector<8xf32>>,
           %c:memref<128xvector<8xf32>>) {
  affine.for %idx = 0 to 128 {
    %av = memref.load %a[%idx] : memref<128xvector<8xf32>>
    %bv = memref.load %b[%idx] : memref<128xvector<8xf32>>
    %cv = memref.load %c[%idx] : memref<128xvector<8xf32>>
    %xv = vector.fma %av, %bv, %cv : vector<8xf32>
    memref.store %xv, %c[%idx] : memref<128xvector<8xf32>>
  }
  return
}
Best,
Dumitru
PS: I found the solution: add -mattr=avx2 -mattr=fma to the llc command line.
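
For anyone landing here later, this is a sketch of the full pipeline with the fix applied. It uses the same tool paths and passes as the command above and writes the features in the comma-separated "+feature" form that llc accepts. The LLC_ATTRS variable is only a name introduced here for readability. My understanding is that, without -mcpu or -mattr, llc targets a generic x86-64 baseline, so the <8 x float> operation is split into two 128-bit halves. That would explain the %xmm code above.

```shell
# Target features for llc: "+" enables a feature; several can be comma-separated.
# (LLC_ATTRS is just an illustrative variable name.)
LLC_ATTRS="+avx2,+fma"

# Same pipeline as before, with the target features passed to llc.
~/llvm/bin/mlir-opt --lower-affine --convert-scf-to-std \
    --convert-std-to-llvm --convert-vector-to-llvm try.mlir \
  | ~/llvm/bin/mlir-translate --mlir-to-llvmir \
  | ~/llvm/bin/llc -O1 --fp-contract=fast -mattr="$LLC_ATTRS"
```

With these features enabled, I would expect the loop body to use the %ymm registers and a fused multiply-add instruction. I have not reproduced the exact output here.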