I am tracking down an x86-64 code generation problem involving AVX instructions. The symptom: a function is called, and the upper half of its argument (of type short16) arrives as zero. This happens only when I compile the code with pocl, not when I run clang and/or llc manually.
I tracked this down to the following. The call site looks like
vmovdqa 24064(%rsp), %ymm0
vmovdqa %ymm0, (%rsp)
which passes the argument on the stack. The callee, however, begins with
__Z14convert_char16Dv16_s: ## @_Z14convert_char16Dv16_s
## BB#0: ## %entry
.cfi_offset %rbp, -16
movq %rsp, %rbp
vextractf128 $1, %ymm0, %xmm1
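which expects the argument in %ymm0. However, the caller executes a vzeroupper before the call (not shown in the excerpt above), and that instruction clears the upper 128 bits of every %ymm register. So the upper half of %ymm0 has just been destroyed when the callee reads it, which matches the symptom.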
My question is:
What decides this calling convention? I know that standard x86-64 should pass arguments in %xmm0, not %ymm0. Are there, for example, command-line options, CPU attributes, or target triples that would modify this? Or should this be filed as a bug report? It may also be a bug in pocl, since I haven't been able to reproduce it without pocl.
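For anyone who wants to look at this outside pocl, here is a minimal sketch of a standalone test case in plain C. It is only an approximation of the situation above: the type definitions use the GCC/Clang vector extension rather than OpenCL C, and the names convert_char16_sketch and count_intact_lanes are hypothetical stand-ins, not the actual pocl kernel library functions.

```c
/* 256-bit vector of 16 shorts and 128-bit vector of 16 chars,
   declared with the GCC/Clang vector extension (an assumption;
   pocl's real OpenCL C types are defined differently). */
typedef short short16 __attribute__((vector_size(32)));
typedef signed char char16 __attribute__((vector_size(16)));

/* Hypothetical stand-in for the callee. noinline forces a real call,
   so the argument has to cross the calling convention boundary. */
__attribute__((noinline))
char16 convert_char16_sketch(short16 v)
{
    char16 r;
    for (int i = 0; i < 16; i++)
        r[i] = (signed char)v[i];   /* truncate each lane */
    return r;
}

/* Fills all 16 lanes with nonzero values, calls the callee, and
   returns how many lanes came back intact. A result of 8 would
   match the symptom described above (upper half zeroed). */
int count_intact_lanes(void)
{
    short16 in;
    for (int i = 0; i < 16; i++)
        in[i] = (short)(i + 1);

    char16 out = convert_char16_sketch(in);

    int ok = 0;
    for (int i = 0; i < 16; i++)
        ok += (out[i] == (signed char)(i + 1));
    return ok;
}
```

Compiling this to assembly twice, once with `clang -O2 -S` and once with `clang -O2 -mavx -S`, and comparing how the argument is set up at the call site with how the callee reads it should show whether the two sides agree. Whether a 256-bit vector argument travels in %ymm0 depends on whether AVX is enabled for the function being compiled, so a caller and callee built with different target features are one thing worth ruling out.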