error: couldn't allocate input reg for constraint '{xmm0}'

Here is some Zig code:

pub fn setXmm0(comptime T: type, value: T) void {
    comptime assert(builtin.arch == builtin.Arch.x86_64);
    // movaps needs a 16-byte aligned memory operand.
    const aligned_value: T align(16) = value;
    asm volatile (
        \\movaps (%[ptr]), %%xmm0
        :
        : [ptr] "r" (&aligned_value)
        : "xmm0"
    );
}

I want to improve this and integrate more tightly with LLVM IR, like this:

    asm volatile (""
        :
        : [value] "{xmm0}" (value)
    );

This tells LLVM to make sure xmm0 holds value, in whatever way it
needs to. Here is the resulting LLVM IR:

  call void asm sideeffect "", "{xmm0}"(i128 %1)

But LLVM gives me this error:
error: couldn't allocate input reg for constraint '{xmm0}'

Is this a bug in LLVM or some fundamental limitation?

rkruppe on IRC suggested passing <4 x i32> rather than i128, and
that worked. I edited the IR module by hand like this:

  %V = bitcast i128 %1 to <4 x i32>
  call void asm sideeffect "", "{xmm0}"(<4 x i32> %V), !dbg !60

This produced the following assembly:

0000000000000030 <setXmm0>:
  30: 55 push %rbp
  31: 48 89 e5 mov %rsp,%rbp
  34: 48 89 7d f0 mov %rdi,-0x10(%rbp)
  38: 48 89 75 f8 mov %rsi,-0x8(%rbp)
  3c: 0f 10 45 f0 movups -0x10(%rbp),%xmm0
  40: 5d pop %rbp
  41: c3 retq

I think that's good! My only concern is alignment. LLVM chose movups
here, which tolerates unaligned addresses, but I suppose the calling
convention requires %rbp to be 16-byte aligned already, and so
-0x10(%rbp) is guaranteed to be 16-byte aligned as well?
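
For anyone who wants to reproduce this, the hand-edited IR can be
reduced to roughly the following standalone module. This is only a
sketch: the function name comes from the disassembly above, while the
target triple, the parameter name, and the omission of debug metadata
are assumptions made for illustration, not what the compiler emitted.

```llvm
; Sketch of the <4 x i32> workaround as a self-contained module.
; The triple below is an assumption; any x86_64 triple should do.
target triple = "x86_64-unknown-linux-gnu"

define void @setXmm0(i128 %value) {
entry:
  ; Reinterpret the 128-bit integer as a vector, the operand type that
  ; the "{xmm0}" constraint accepted in the experiment above.
  %v = bitcast i128 %value to <4 x i32>
  ; Empty asm body: the input constraint alone asks LLVM to place %v
  ; in xmm0 before this point.
  call void asm sideeffect "", "{xmm0}"(<4 x i32> %v)
  ret void
}
```

Running llc on this file should show how xmm0 gets loaded. The exact
instructions will depend on the optimization level, so they need not
match the listing above.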


Yes, the calling convention guarantees a fixed stack alignment on