ORC JIT error when using AVX2 vector instructions

Hi.

As soon as the module contains instructions operating on <8 x float>, the ORC JIT refuses to work.

Here’s the module that provokes the error given further below:

; ModuleID = 'module'
source_filename = "module"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

define private void @eval0_intern(i32 %arg0, i32 %arg1, <8 x float>* %arg2, <8 x float>* %arg3, <8 x float>* %arg4) {
stack:
br label %afterstack

afterstack: ; preds = %stack
%0 = add nsw i32 %arg0, %arg1
%1 = add nsw i32 0, %0
%2 = mul i32 %1, 1
%3 = add nsw i32 %2, 0
%4 = mul i32 %3, 1
%5 = add nsw i32 %4, 0
%6 = mul i32 %5, 1
%7 = add nsw i32 %6, 0
%8 = mul i32 %7, 8
%9 = getelementptr <8 x float>, <8 x float>* %arg3, i32 %8
%10 = load <8 x float>, <8 x float>* %9, align 32
%11 = add nsw i32 0, %0
%12 = mul i32 %11, 1
%13 = add nsw i32 %12, 0
%14 = mul i32 %13, 1
%15 = add nsw i32 %14, 0
%16 = mul i32 %15, 1
%17 = add nsw i32 %16, 0
%18 = mul i32 %17, 8
%19 = getelementptr <8 x float>, <8 x float>* %arg4, i32 %18
%20 = load <8 x float>, <8 x float>* %19, align 32
%21 = mul <8 x float> %20, %10
%22 = add nsw i32 0, %0
%23 = mul i32 %22, 1
%24 = add nsw i32 %23, 0
%25 = mul i32 %24, 1
%26 = add nsw i32 %25, 0
%27 = mul i32 %26, 1
%28 = add nsw i32 %27, 0
%29 = mul i32 %28, 8
%30 = getelementptr <8 x float>, <8 x float>* %arg2, i32 %29
store <8 x float> %21, <8 x float>* %30, align 32
ret void
}

define void @eval0(i32 %idx, [8 x i8]* %arg_ptr) {
entrypoint:
%0 = getelementptr [8 x i8], [8 x i8]* %arg_ptr, i32 0
%1 = bitcast [8 x i8]* %0 to i32*
%2 = load i32, i32* %1, align 4
%3 = getelementptr [8 x i8], [8 x i8]* %arg_ptr, i32 1
%4 = bitcast [8 x i8]* %3 to <8 x float>**
%5 = load <8 x float>*, <8 x float>** %4, align 8
%6 = getelementptr [8 x i8], [8 x i8]* %arg_ptr, i32 2
%7 = bitcast [8 x i8]* %6 to <8 x float>**
%8 = load <8 x float>*, <8 x float>** %7, align 8
%9 = getelementptr [8 x i8], [8 x i8]* %arg_ptr, i32 3
%10 = bitcast [8 x i8]* %9 to <8 x float>**
%11 = load <8 x float>*, <8 x float>** %10, align 8
call void @eval0_intern(i32 %idx, i32 %2, <8 x float>* %5, <8 x float>* %8, <8 x float>* %11)
ret void
}

This is an invalid instruction: mul is an integer operation, but this one has floating-point vector operands. The correct operation would be fmul.

%21 = mul <8 x float> %20, %10
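With <8 x float> operands the line should therefore read:

%21 = fmul <8 x float> %20, %10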

Thanks! Yeah, that was my silliness. Fixed, and the module now compiles with the ORC JIT Kaleidoscope example.

However, looking at the generated assembly I only see SSE (128-bit vectors):

.Leval0_intern:
.cfi_startproc
addl %esi, %edi
shll $3, %edi
movslq %edi, %rax
shlq $5, %rax
movaps (%r8,%rax), %xmm0
movaps 16(%r8,%rax), %xmm1
mulps 16(%rcx,%rax), %xmm1
mulps (%rcx,%rax), %xmm0
movaps %xmm0, (%rdx,%rax)
movaps %xmm1, 16(%rdx,%rax)
retq

I cross-checked what llc gives:

Calling llc with no extra flags gives matching assembly, but when adding '-mattr=+avx2' I get AVX2 (256-bit vectors).
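That is, something like this (with the IR above dumped to a file, say module.ll):

llc module.ll -o -
llc -mattr=+avx2 module.ll -o -

The AVX2 output: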

.Leval0_intern: # @eval0_intern
.cfi_startproc
# %bb.0: # %stack
addl %esi, %edi
shll $3, %edi
movslq %edi, %rax
shlq $5, %rax
vmovaps (%r8,%rax), %ymm0
vmulps (%rcx,%rax), %ymm0, %ymm0
vmovaps %ymm0, (%rdx,%rax)
vzeroupper
retq

That makes me think that the ORC JIT Kaleidoscope doesn’t use the ‘+avx2’ attribute.

How can ORC JIT Kaleidoscope generate jitted code with AVX2 instructions?

Thanks again & Best wishes,
Frank

Hi Frank

That makes me think that the ORC JIT Kaleidoscope doesn’t use the ‘+avx2’ attribute.

How can ORC JIT Kaleidoscope generate jitted code with AVX2 instructions?

Did you try adding something like:
JTMB.addFeatures({"+avx2"});

Here?
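With the LLJIT API that would look roughly like the sketch below (untested; the function name createJITWithAVX2 is made up, and if you use the hand-rolled KaleidoscopeJIT class from the tutorial the equivalent spot is where its JITTargetMachineBuilder gets created):

#include "llvm/ExecutionEngine/Orc/JITTargetMachineBuilder.h"
#include "llvm/ExecutionEngine/Orc/LLJIT.h"

using namespace llvm;
using namespace llvm::orc;

static Expected<std::unique_ptr<LLJIT>> createJITWithAVX2() {
  // Start from the host triple/CPU ...
  auto JTMB = JITTargetMachineBuilder::detectHost();
  if (!JTMB)
    return JTMB.takeError();

  // ... and explicitly enable the AVX2 target feature on top of it.
  JTMB->addFeatures({"+avx2"});

  // The configured builder is what LLJIT's IR compiler will use.
  return LLJITBuilder().setJITTargetMachineBuilder(std::move(*JTMB)).create();
}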
Hope it helps. Best, Stefan

Hi Stefan.

Thanks for the tip, but it didn't do the trick: still only SSE. (/proc/cpuinfo does contain the 'avx2' flag.)

I instrumented a bit:

JITTargetMachineBuilder JTMB((*TPC)->getTargetTriple());

llvm::outs() << "feature string: " << JTMB.getFeatures().getString() << "\n";

llvm::outs() << "adding features…\n";
JTMB.addFeatures({"+avx2"});

llvm::outs() << "feature string: " << JTMB.getFeatures().getString() << "\n";

Output:

Creating JIT
feature string:
adding features…
feature string: +avx2
Creating JIT successful

But still only SSE:

.Leval0_intern:
.cfi_startproc
addl %esi, %edi
shll $3, %edi
movslq %edi, %rax
shlq $5, %rax
movaps (%r8,%rax), %xmm0
movaps 16(%r8,%rax), %xmm1
mulps 16(%rcx,%rax), %xmm1
mulps (%rcx,%rax), %xmm0
movaps %xmm0, (%rdx,%rax)
movaps %xmm1, 16(%rdx,%rax)
retq

Should I switch to the LLVM 13 release, or can AVX2 in the JIT be trusted to be present in version 12?

Best,
Frank

Should I switch to the LLVM 13 release, or can AVX2 in the JIT be trusted to be present in version 12?

I am not certain, but I’d assume yes. I used AVX in a JIT many years ago in an experimental project. I see no reason why AVX2 wouldn’t be available in ORC.

Did you check against the results of sys::getHostCPUFeatures()? There are lots of AVX variants.
And maybe have a look at how it works in JITTargetMachineBuilder.
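As far as I remember, JITTargetMachineBuilder::detectHost() does more or less the following; here is a rough, untested sketch of doing the same with a triple-based builder (the helper name makeHostLikeJTMB is made up):

#include "llvm/ADT/StringMap.h"
#include "llvm/ADT/Triple.h"
#include "llvm/ExecutionEngine/Orc/JITTargetMachineBuilder.h"
#include "llvm/Support/Host.h"

using namespace llvm;

// Copy the host CPU name and all detected host features (avx, avx2, fma, ...)
// into a JITTargetMachineBuilder for the given triple.
static orc::JITTargetMachineBuilder makeHostLikeJTMB(Triple TT) {
  orc::JITTargetMachineBuilder JTMB(std::move(TT));
  StringMap<bool> FeatureMap;
  if (sys::getHostCPUFeatures(FeatureMap))
    for (auto &F : FeatureMap)
      JTMB.getFeatures().AddFeature(F.first(), F.second);
  JTMB.setCPU(sys::getHostCPUName().str());
  return JTMB;
}

Whatever you end up with, make sure this configured builder is the one that actually reaches the compiler (e.g. the ConcurrentIRCompiler in the tutorial).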

Hi Frank, Stefan,

I am not certain, but I’d assume yes. I used AVX in a JIT many years ago in an experimental project. I see no reason why AVX2 wouldn’t be available in ORC.

I agree: ORC really only touches the compiler to set it up, so as long as the target machine is set up correctly this should “Just Work”.

It might be worth stepping through the call to createTargetMachine in https://github.com/llvm/llvm-project/blob/76a1a415302d06ceb4a3358493e897e98dd75f77/llvm/lib/ExecutionEngine/Orc/JITTargetMachineBuilder.cpp#L51 to see whether the CPU and AVX feature flags are being handled as expected.
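For example, something along these lines (just a debugging sketch, the function name is made up) would show which CPU and feature string the TargetMachine actually ends up with:

#include "llvm/ExecutionEngine/Orc/JITTargetMachineBuilder.h"
#include "llvm/Support/Error.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetMachine.h"

using namespace llvm;

// Create a TargetMachine from the configured builder and print the CPU and
// feature strings it was constructed with.
static void dumpTargetMachineConfig(orc::JITTargetMachineBuilder &JTMB) {
  if (auto TM = JTMB.createTargetMachine()) {
    outs() << "TM CPU:      " << (*TM)->getTargetCPU() << "\n";
    outs() << "TM features: " << (*TM)->getTargetFeatureString() << "\n";
  } else {
    logAllUnhandledErrors(TM.takeError(), errs(), "createTargetMachine: ");
  }
}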

– Lang.