avx512 JIT backend generates wrong code on <4 x float>

Hi!

When compiling the attached module with the JIT engine on an Intel KNL, I see wrong code being emitted. I attach a complete exploit program that shows the bug in LLVM 3.8. It loads and JIT-compiles the module and prints the assembly. I stumbled on this because the result of an actual calculation was wrong. So it is not only the textual assembly that is wrong; the generated machine code is wrong as well.

When I execute the exploit program on an Intel KNL the following output is produced:

CPU name = knl
-sse4a,-avx512bw,cx16,-tbm,xsave,-fma4,-avx512vl,prfchw,bmi2,adx,-xsavec,fsgsbase,avx,avx512cd,avx512pf,-rtm,popcnt,fma,bmi,aes,rdrnd,-xsaves,sse4.1,sse4.2,avx2,avx512er,sse,lzcnt,pclmul,avx512f,f16c,ssse3,mmx,-pku,cmov,-xop,rdseed,movbe,-hle,xsaveopt,-sha,sse2,sse3,-avx512dq,
Assembly:
     .text
     .file "module_KFxOBX_i4_after.ll"
     .globl adjmul
     .align 16, 0x90
     .type adjmul,@function
adjmul:
     .cfi_startproc
     leaq (%rdi,%r8), %rdx
     addq %rsi, %r8
     testb $1, %cl
     cmoveq %rdi, %rdx
     cmoveq %rsi, %r8
     movq %rdx, %rax
     sarq $63, %rax
     shrq $62, %rax
     addq %rdx, %rax
     sarq $2, %rax
     movq %r8, %rcx
     sarq $63, %rcx
     shrq $62, %rcx
     addq %r8, %rcx
     sarq $2, %rcx
     movq %rax, %rdx
     shlq $5, %rdx
     leaq 16(%r9,%rdx), %rsi
     orq $16, %rdx
     movq 16(%rsp), %rdi
     addq %rdx, %rdi
     addq 8(%rsp), %rdx
     .align 16, 0x90
.LBB0_1:
     vmovaps -16(%rdx), %xmm0
     vmovaps (%rdx), %xmm1
     vmovaps -16(%rdi), %xmm2
     vmovaps (%rdi), %xmm3
     vmulps %xmm3, %xmm1, %xmm4
     vmulps %xmm2, %xmm1, %xmm1
     vfmadd213ss %xmm4, %xmm0, %xmm2
     vfmsub213ss %xmm1, %xmm0, %xmm3
     vmovaps %xmm2, -16(%rsi)
     vmovaps %xmm3, (%rsi)
     addq $1, %rax
     addq $32, %rsi
     addq $32, %rdi
     addq $32, %rdx
     cmpq %rcx, %rax
     jl .LBB0_1
     retq
.Lfunc_end0:
     .size adjmul, .Lfunc_end0-adjmul
     .cfi_endproc

     .section ".note.GNU-stack","",@progbits

end assembly!

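The instructions 'vfmadd213ss' and 'vfmsub213ss' are scalar operations ('Fused Multiply-Add of Scalar Single-Precision Floating-Point' and its subtract counterpart): they compute only the lowest element of the register. Those should be packed SIMD vector instructions (the 'ps' forms, e.g. 'vfmadd213ps'), since the surrounding loads, multiplies and stores all operate on full <4 x float> vectors. Note that the KNL has 16-wide float SIMD, while the exploit module uses only 4-wide vectors. However, the backend should be able to handle this.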
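To make the effect concrete, here is a small sketch that models one iteration of the .LBB0_1 loop body with Python lists standing in for the 4-lane xmm registers. It is only an illustration, not LLVM code: the helper names (vmulps, fmadd213_packed, fmadd213_scalar, and so on) and the input values are made up for this example. The one fact it relies on is the documented behaviour of the scalar 213 forms, which compute lane 0 and leave the upper lanes of the destination register unchanged.

```python
# Model of one iteration of .LBB0_1. Lists of 4 floats stand in for xmm registers.
# All function names and input values here are hypothetical, chosen for illustration.

def vmulps(a, b):
    """Packed multiply: every lane is a[i] * b[i]."""
    return [x * y for x, y in zip(a, b)]

def fmadd213_packed(dst, src2, src3):
    """Packed 213 form: every lane becomes src2 * dst + src3."""
    return [s2 * d + s3 for d, s2, s3 in zip(dst, src2, src3)]

def fmsub213_packed(dst, src2, src3):
    """Packed 213 form: every lane becomes src2 * dst - src3."""
    return [s2 * d - s3 for d, s2, s3 in zip(dst, src2, src3)]

def fmadd213_scalar(dst, src2, src3):
    """Scalar 213 form: only lane 0 is computed, lanes 1..3 keep dst's old values."""
    return [src2[0] * dst[0] + src3[0]] + dst[1:]

def fmsub213_scalar(dst, src2, src3):
    """Scalar 213 form: only lane 0 is computed, lanes 1..3 keep dst's old values."""
    return [src2[0] * dst[0] - src3[0]] + dst[1:]

def loop_body(fmadd, fmsub, xmm0, xmm1, xmm2, xmm3):
    """Follows the instruction sequence of the loop; returns the two stored vectors."""
    xmm4 = vmulps(xmm1, xmm3)         # vmulps %xmm3, %xmm1, %xmm4
    xmm1 = vmulps(xmm1, xmm2)         # vmulps %xmm2, %xmm1, %xmm1
    xmm2 = fmadd(xmm2, xmm0, xmm4)    # xmm2 = xmm0 * xmm2 + xmm4
    xmm3 = fmsub(xmm3, xmm0, xmm1)    # xmm3 = xmm0 * xmm3 - xmm1
    return xmm2, xmm3                 # stored to -16(%rsi) and (%rsi)

# Made-up inputs; small integers keep the arithmetic exact.
x0 = [1.0, 2.0, 3.0, 4.0]
x1 = [5.0, 6.0, 7.0, 8.0]
x2 = [9.0, 10.0, 11.0, 12.0]
x3 = [13.0, 14.0, 15.0, 16.0]

expected = loop_body(fmadd213_packed, fmsub213_packed, x0, x1, x2, x3)
emitted = loop_body(fmadd213_scalar, fmsub213_scalar, x0, x1, x2, x3)

print("packed:", expected)
print("scalar:", emitted)
```

With these inputs the two variants agree in lane 0 only. In the scalar variant, lanes 1 to 3 of both stored vectors are simply the values that were loaded from -16(%rdi) and (%rdi), which matches the kind of wrong numerical result described above.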

Unless I receive further ideas I will file an official bug report.

Frank

Makefile (456 Bytes)

module_KFxOBX_i4_after.ll (1.91 KB)

main.cc (4.94 KB)

Hi Frank,

I recommend trying trunk LLVM. AVX-512 development has been very active recently.

-Hal

Hi Hal!

Thanks, but unfortunately it didn't help. The exact same assembler instructions are generated for both 3.8 (yesterday) and trunk (from today).

So, this really looks like a bug.

Best,
Frank

From: "Frank Winter" <fwinter@jlab.org>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "LLVM Dev" <llvm-dev@lists.llvm.org>
Sent: Thursday, June 30, 2016 11:49:34 AM
Subject: Re: [llvm-dev] avx512 JIT backend generates wrong code on <4 x float>

> Hi Hal!
>
> Thanks, but unfortunately it didn't help. The exact same assembler
> instructions are generated for both 3.8 (yesterday) and trunk (from
> today).
>
> So, this really looks like a bug.

Okay. Please file a bug report.

-Hal