Trying to use LLVM IR with inline asm and `vmovdqu32`, unsuccessfully so far

nicolasvasilache · January 23, 2022, 3:07pm

Hi everyone,

Not sure if there is a better category for this type of question, but here goes.

I’ve been mixing LLVM IR and inline assembly recently; things usually work well when I manipulate vectors.

However, I have been unable to get even a simple load operation to work as I expect.
I have isolated this minimal example, which I run with `lli -O0 -mcpu=skylake-avx512 --entry-function=entry foo.ll`:

```llvm
; ModuleID = 'LLVMDialectModule'
source_filename = "LLVMDialectModule"

@pct_i_newline = private global [4 x i8] c"%i\0A\00"
@const16 = private global [16 x i32] [i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422]

declare i8* @malloc(i64)

declare void @free(i8*)

declare void @printf(i8*, ...)

define <16 x i32> @function_to_run() {
  %1 = call <16 x i32> asm inteldialect "vmovdqu32 $0, $1", "=x,m"(i32* getelementptr inbounds ([16 x i32], [16 x i32]* @const16, i64 0, i64 0))
  ret <16 x i32> %1
}

define void @entry() {
  %1 = call <16 x i32> @function_to_run()
  %2 = extractelement <16 x i32> %1, i64 9
  call void (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @pct_i_newline, i64 0, i64 0), i32 %2)
  ret void
}
```

I was expecting the program to print 4422, but instead it prints a garbage integer that changes on every run.

I tried to get lli to dump the assembly with a mix of `--filetype=asm --asm-show-inst -o ...` but was unsuccessful.

Instead, I use opt and llc: `opt -O0 -mtriple=x86_64-linux-gnu -march=x86-64 -mcpu=skylake-avx512 /tmp/foo.ll | llc -O0`.

I obtain the following assembly, which looks legit to me:

```asm
        .text
        .file   "LLVMDialectModule"
        .globl  function_to_run                 # -- Begin function function_to_run
        .p2align        4, 0x90
        .type   function_to_run,@function
function_to_run:                        # @function_to_run
        .cfi_startproc
# %bb.0:
        movq    $.Lconst16, -8(%rsp)
        #APP

        vmovdqu32       -8(%rsp), %zmm0

        #NO_APP
        retq
.Lfunc_end0:
        .size   function_to_run, .Lfunc_end0-function_to_run
        .cfi_endproc
                                        # -- End function
        .globl  entry                           # -- Begin function entry
        .p2align        4, 0x90
        .type   entry,@function
entry:                                  # @entry
        .cfi_startproc
# %bb.0:
        pushq   %rax
        .cfi_def_cfa_offset 16
        callq   function_to_run@PLT
        vextracti32x4   $2, %zmm0, %xmm0
        vpextrd $1, %xmm0, %esi
        movabsq $.Lpct_i_newline, %rdi
        movb    $0, %al
        vzeroupper
        callq   printf@PLT
        popq    %rax
        .cfi_def_cfa_offset 8
        retq
.Lfunc_end1:
        .size   entry, .Lfunc_end1-entry
        .cfi_endproc
                                        # -- End function
        .type   .Lpct_i_newline,@object         # @pct_i_newline
        .data
.Lpct_i_newline:
        .asciz  "%i\n"
        .size   .Lpct_i_newline, 4

        .type   .Lconst16,@object               # @const16
        .p2align        4
.Lconst16:
        .long   4422                            # 0x1146
        .long   4422                            # 0x1146
        .long   4422                            # 0x1146
        .long   4422                            # 0x1146
        .long   4422                            # 0x1146
        .long   4422                            # 0x1146
        .long   4422                            # 0x1146
        .long   4422                            # 0x1146
        .long   4422                            # 0x1146
        .long   4422                            # 0x1146
        .long   4422                            # 0x1146
        .long   4422                            # 0x1146
        .long   4422                            # 0x1146
        .long   4422                            # 0x1146
        .long   4422                            # 0x1146
        .long   4422                            # 0x1146
        .size   .Lconst16, 64

        .section        ".note.GNU-stack","",@progbits
```

I also run a different, MLIR-based toolchain with which I can dump the .o and disassemble it; I see the same behavior and similar asm.

What am I missing?

Thanks in advance!

d0k · January 23, 2022, 4:18pm

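Inline asm constraints are hard. With a plain “m” constraint, the value you pass (here, the address of `@const16`) is itself stored to memory, and the asm operand refers to that memory. That is the `movq $.Lconst16, -8(%rsp)` in your listing: `vmovdqu32` then loads 64 bytes starting at that stack slot, so zmm0 ends up holding the address of the constant followed by whatever else is on the stack, instead of the constant's contents.

To actually load from the constant, you can use an indirect constraint (`*m`), which makes the operand a pointer to the memory the asm accesses; `elementtype` gives the type of that memory:

```llvm
%1 = call <16 x i32> asm inteldialect "vmovdqu32 $0, $1", "=v,*m"(<16 x i32>* elementtype(<16 x i32>) bitcast ([16 x i32]* @const16 to <16 x i32>*))
```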
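For reference, here is a minimal sketch of the whole reproducer with that fix applied. It assumes a recent LLVM (15 or later) where opaque `ptr` types are the default, so the `bitcast` is no longer needed, and it renames `entry` to `main` so lli can run it without `--entry-function`; both are adaptations for illustration, not part of the answer above. It needs an AVX-512 machine, e.g. `lli -mcpu=skylake-avx512 fixed.ll` (the file name is hypothetical), where it should print 4422.

```llvm
; Sketch: the reproducer with an indirect "*m" operand.
; Assumes LLVM 15+ (opaque pointers) and an AVX-512 capable host.
@pct_i_newline = private constant [4 x i8] c"%i\0A\00"
@const16 = private constant [16 x i32] [i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422]

declare i32 @printf(ptr, ...)

define <16 x i32> @function_to_run() {
  ; "=v": the result is returned in a vector register (a zmm register for <16 x i32>).
  ; "*m": indirect memory operand; the asm reads the memory the pointer points to,
  ;       not a stack slot holding a copy of the pointer.
  ; elementtype(<16 x i32>) states the type of that pointed-to memory.
  %v = call <16 x i32> asm inteldialect "vmovdqu32 $0, $1", "=v,*m"(ptr elementtype(<16 x i32>) @const16)
  ret <16 x i32> %v
}

define i32 @main() {
  %v = call <16 x i32> @function_to_run()
  %e = extractelement <16 x i32> %v, i64 9
  call i32 (ptr, ...) @printf(ptr @pct_i_newline, i32 %e)
  ret i32 0
}
```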

nicolasvasilache · January 23, 2022, 4:47pm

Thanks much Ben!

nicolasvasilache · January 23, 2022, 5:01pm

Follow-up: what’s the LLVM incantation to add attributes to operands?
I am piping that `elementtype` through MLIR’s InlineAsmOp but can’t seem to find where to stick it when translating to LLVM IR.
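If attaching `elementtype` during translation is the sticking point, one workaround that needs no operand attribute at all is to pass the address as an ordinary register operand (`r`) and dereference it inside the template. This is a sketch of a different technique from the `*m` fix, under the same assumptions (LLVM 15+ opaque pointers, AVX-512 host, `main` as the entry point). Because the load is hidden inside the template, LLVM no longer knows which memory the asm reads; that is harmless for a constant global but worth keeping in mind for memory that may be written elsewhere.

```llvm
; Sketch: pass the address in a general-purpose register and dereference it
; in the Intel-syntax template, so no elementtype attribute is required.
; Assumes LLVM 15+ (opaque pointers) and an AVX-512 capable host.
@pct_i_newline = private constant [4 x i8] c"%i\0A\00"
@const16 = private constant [16 x i32] [i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422]

declare i32 @printf(ptr, ...)

define <16 x i32> @function_to_run() {
  ; "r": $1 is the pointer value itself, placed in a register;
  ; the brackets in the template make vmovdqu32 load from that address.
  %v = call <16 x i32> asm inteldialect "vmovdqu32 $0, zmmword ptr [$1]", "=v,r"(ptr @const16)
  ret <16 x i32> %v
}

define i32 @main() {
  %v = call <16 x i32> @function_to_run()
  %e = extractelement <16 x i32> %v, i64 9
  call i32 (ptr, ...) @printf(ptr @pct_i_newline, i32 %e)
  ret i32 0
}
```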

nicolasvasilache · January 23, 2022, 8:12pm

OK, got it working now. Thanks again!
