Trying to use LLVM IR with inline asm and `vmovdqu32`, unsuccessfully so far

Hi everyone,

Not sure if there is a better category for this type of question, but here goes.

I’ve been trying to mix LLVM and inline assembly recently; things usually work well when I manipulate vectors.

However, I have been unable to get even a simple load operation to work as I expect.
I have isolated this minimal example, which I run with `lli -O0 -mcpu=skylake-avx512 --entry-function=entry foo.ll`:

```llvm
; ModuleID = 'LLVMDialectModule'
source_filename = "LLVMDialectModule"

@pct_i_newline = private global [4 x i8] c"%i\0A\00"
@const16 = private global [16 x i32] [i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422]

declare i8* @malloc(i64)

declare void @free(i8*)

declare void @printf(i8*, ...)

define <16 x i32> @function_to_run() {
  %1 = call <16 x i32> asm inteldialect "vmovdqu32 $0, $1", "=x,m"(i32* getelementptr inbounds ([16 x i32], [16 x i32]* @const16, i64 0, i64 0))
  ret <16 x i32> %1
}

define void @entry() {
  %1 = call <16 x i32> @function_to_run()
  %2 = extractelement <16 x i32> %1, i64 9
  call void (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @pct_i_newline, i64 0, i64 0), i32 %2)
  ret void
}
```

I expected it to print 4422, but instead it prints a garbage integer that changes from run to run.

I tried to get `lli` to dump the assembly with some combination of `--filetype=asm --asm-show-inst -o ...`, but without success.

Instead, I use `opt` and `llc` like this: `opt -O0 -mtriple=x86_64-linux-gnu -march=x86-64 -mcpu=skylake-avx512 /tmp/foo.ll | llc -O0`.

I obtain the following assembly, which looks legit to me:

```asm
        .text
        .file   "LLVMDialectModule"
        .globl  function_to_run                 # -- Begin function function_to_run
        .p2align        4, 0x90
        .type   function_to_run,@function
function_to_run:                        # @function_to_run
        .cfi_startproc
# %bb.0:
        movq    $.Lconst16, -8(%rsp)
        #APP

        vmovdqu32       -8(%rsp), %zmm0

        #NO_APP
        retq
.Lfunc_end0:
        .size   function_to_run, .Lfunc_end0-function_to_run
        .cfi_endproc
                                        # -- End function
        .globl  entry                           # -- Begin function entry
        .p2align        4, 0x90
        .type   entry,@function
entry:                                  # @entry
        .cfi_startproc
# %bb.0:
        pushq   %rax
        .cfi_def_cfa_offset 16
        callq   function_to_run@PLT
        vextracti32x4   $2, %zmm0, %xmm0
        vpextrd $1, %xmm0, %esi
        movabsq $.Lpct_i_newline, %rdi
        movb    $0, %al
        vzeroupper
        callq   printf@PLT
        popq    %rax
        .cfi_def_cfa_offset 8
        retq
.Lfunc_end1:
        .size   entry, .Lfunc_end1-entry
        .cfi_endproc
                                        # -- End function
        .type   .Lpct_i_newline,@object         # @pct_i_newline
        .data
.Lpct_i_newline:
        .asciz  "%i\n"
        .size   .Lpct_i_newline, 4

        .type   .Lconst16,@object               # @const16
        .p2align        4
.Lconst16:
        .long   4422                            # 0x1146
        .long   4422                            # 0x1146
        .long   4422                            # 0x1146
        .long   4422                            # 0x1146
        .long   4422                            # 0x1146
        .long   4422                            # 0x1146
        .long   4422                            # 0x1146
        .long   4422                            # 0x1146
        .long   4422                            # 0x1146
        .long   4422                            # 0x1146
        .long   4422                            # 0x1146
        .long   4422                            # 0x1146
        .long   4422                            # 0x1146
        .long   4422                            # 0x1146
        .long   4422                            # 0x1146
        .long   4422                            # 0x1146
        .size   .Lconst16, 64

        .section        ".note.GNU-stack","",@progbits
```

I also have a separate MLIR-based toolchain with which I can dump the `.o` and disassemble it; there I see the same behavior and similar asm.

What am I missing?

Thanks in advance!

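Inline asm constraints are hard. A plain “m” operand is the value to be placed in memory, not the address to load from. LLVM stores your pointer into a stack slot (that is the `movq $.Lconst16, -8(%rsp)` in your output) and `vmovdqu32` then loads from that slot, so zmm0 ends up holding the address of the constant followed by whatever happens to be on the stack, instead of the constant itself. That also explains why the printed element changes between runs.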
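To make this concrete, here is a hand-written sketch of roughly what the backend does with a plain “m” operand, modeled on the `movq`/`vmovdqu32` pair in the generated assembly. The function name `what_plain_m_means` is hypothetical. It spills the pointer to an explicit `alloca` and passes that slot through an indirect `*m` operand, which reproduces the bug on purpose: the low 8 bytes of the result are the address of `@const16`, not its contents. It assumes the same typed-pointer LLVM and `lli -mcpu=skylake-avx512` setup as the original post.

```llvm
@const16 = private global [16 x i32] [i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422]

; Hypothetical illustration: an explicit version of what a direct "m" operand
; does with the pointer value it is given.
define <16 x i32> @what_plain_m_means() {
  ; The operand value (the pointer to @const16) is written to a stack slot...
  %slot = alloca i32*, align 8
  store i32* getelementptr inbounds ([16 x i32], [16 x i32]* @const16, i64 0, i64 0), i32** %slot, align 8
  ; ...and the asm reads 64 bytes starting at that slot: the saved pointer
  ; first, then unrelated stack contents, never the 16 integers.
  %1 = call <16 x i32> asm inteldialect "vmovdqu32 $0, $1", "=x,*m"(i32** elementtype(i32*) %slot)
  ret <16 x i32> %1
}
```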

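To actually load from it, use an indirect constraint (`*m`), whose operand is a pointer to the memory being accessed. Recent LLVM versions also require an `elementtype` attribute on indirect operands. I also changed the output to `=v`, which on AVX-512 targets can use any of zmm0-zmm31 rather than only zmm0-zmm15: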

```llvm
call <16 x i32> asm inteldialect "vmovdqu32 $0, $1", "=v,*m"(<16 x i32>* elementtype(<16 x i32>) bitcast ([16 x i32]* @const16 to <16 x i32>*))
```
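For reference, here is a minimal sketch of the original module with that call dropped in. It assumes an LLVM release that still accepts typed pointers (as used in this thread) and an AVX-512 machine, run with the same `lli` command as before. Besides the constraint change, the unused `malloc`/`free` declarations are trimmed and `printf` is declared with its real `i32` return type.

```llvm
@pct_i_newline = private global [4 x i8] c"%i\0A\00"
@const16 = private global [16 x i32] [i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422, i32 4422]

; printf returns int in C, so declare it as returning i32.
declare i32 @printf(i8*, ...)

define <16 x i32> @function_to_run() {
  ; "*m": the operand is the address of the memory to read.
  ; elementtype(<16 x i32>) tells LLVM what that address points to.
  ; "=v": the result may live in any AVX-512 vector register.
  %1 = call <16 x i32> asm inteldialect "vmovdqu32 $0, $1", "=v,*m"(<16 x i32>* elementtype(<16 x i32>) bitcast ([16 x i32]* @const16 to <16 x i32>*))
  ret <16 x i32> %1
}

define void @entry() {
  %1 = call <16 x i32> @function_to_run()
  %2 = extractelement <16 x i32> %1, i64 9
  %3 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @pct_i_newline, i64 0, i64 0), i32 %2)
  ret void
}
```

With this version, `vmovdqu32` reads the 64 bytes of `@const16` directly, so `@entry` should print 4422. With opaque pointers the bitcast disappears and the operand would be written as `ptr elementtype(<16 x i32>) @const16`.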

Thanks much, Ben!

Follow-up: what’s the LLVM incantation to add attributes to operands?
I am piping that `elementtype` through MLIR’s `InlineAsmOp` but can’t seem to find where to attach it when translating to LLVM IR.

OK, got it working now. Thanks again!