k registers in extended asm


I am getting an abort in llc with ‘main’ and LLVM 13.0 when I run clang at -O0 for the following test case:

#include <immintrin.h>

void small(float *b, __m512 c, __mmask16 mask)
{
  __m512 a = _mm512_setzero_ps();
  asm("vfmadd231ps (%1), %2, %0 %{%3%}": "+v"(a): "r"(b), "v"(c), "k" (mask));
  _mm512_mask_storeu_ps(b, mask, a);
}

clang -S bug.c -march=skylake-avx512 -O0

bug.c:6:7: error: Register k0 can't be used as write mask
asm("vfmadd231ps (%1), %2, %0 %{%3%}": "+v"(a): "r"(b), "v"(c), "k" (mask));
<inline asm>:1:36: note: instantiated into assembly here
vfmadd231ps (%rax), %zmm1, %zmm0 {%k0}

Works: clang -S bug.c -march=skylake-avx512 -O2 // chooses k1

My first observation is that the fma instruction does not write k0. So even if I had written my test case to use k0 explicitly, I would still get the abort at -O0. This feels like a bug.

My second question: am I required to specify the mask register myself because k0 is “special”, or is llc incorrectly choosing k0 in the first place?



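Try using “Yk” instead of “k”. That should pick from k1-k7. The “k” constraint allows any mask register, including k0, and in the write-mask position the k0 encoding means “no masking”, so the assembler rejects {%k0}.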
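
For reference, here is a rough sketch of the test case with that change applied. A few things in it are my own assumptions and not from the original post: c is passed by pointer rather than by value, the target attribute is there so the file builds without -march=skylake-avx512, the "memory" clobber is added because the asm reads *b through a plain register operand, and try_small_yk is a hypothetical wrapper that skips the kernel on CPUs without AVX-512F.

```c
#include <immintrin.h>

/* Kernel from the question with "Yk" in place of "k".
   Assumptions not in the original post: c is passed by pointer, the
   target attribute replaces -march=skylake-avx512, and the "memory"
   clobber tells the compiler that the asm reads *b. */
__attribute__((target("avx512f")))
static void small_yk(float *b, const float *c_in, __mmask16 mask)
{
    __m512 c = _mm512_loadu_ps(c_in);
    __m512 a = _mm512_setzero_ps();
    /* a = c * b + a in the lanes selected by mask; other lanes keep a (zero). */
    __asm__("vfmadd231ps (%1), %2, %0 %{%3%}"
            : "+v"(a)
            : "r"(b), "v"(c), "Yk"(mask)
            : "memory");
    /* Write back only the masked lanes; the rest of b is left untouched. */
    _mm512_mask_storeu_ps(b, mask, a);
}

/* Hypothetical wrapper: returns 0 and leaves b alone if the CPU lacks
   AVX-512F, otherwise runs the kernel and returns 1. b and c_in must
   each point to 16 floats. */
int try_small_yk(float *b, const float *c_in, __mmask16 mask)
{
    if (!__builtin_cpu_supports("avx512f"))
        return 0;
    small_yk(b, c_in, mask);
    return 1;
}
```

With "Yk" the register allocator should only have k1-k7 to choose from, so the {%k0} operand from the error above should no longer show up at -O0.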