AVX calling convention?

I am tracking down an x86-64 code generation problem involving AVX instructions. The symptom: when a function is called, the upper half of its argument (a short16) arrives as zero. This happens only when I compile the code with pocl, not when I run clang and/or llc manually.

I tracked this down to the following. The call site looks like

  vmovdqa 24064(%rsp), %ymm0
  vmovdqa %ymm0, (%rsp)
  vzeroupper
  callq __Z14convert_char16Dv16_s

which passes the argument on the stack. The callee, however, begins with

__Z14convert_char16Dv16_s: ## @_Z14convert_char16Dv16_s
  .cfi_startproc
## BB#0: ## %entry
  pushq %rbp
Ltmp2:
  .cfi_def_cfa_offset 16
Ltmp3:
  .cfi_offset %rbp, -16
  movq %rsp, %rbp
Ltmp4:
  .cfi_def_cfa_register %rbp
  vextractf128 $1, %ymm0, %xmm1

which expects the argument in %ymm0. However, the vzeroupper in the caller has just zeroed the upper 128 bits of %ymm0...

My question is:

What decides this calling convention? I know that standard x86-64 should pass arguments in %xmm0, not %ymm0. Are there e.g. command line options, CPU attributes, or target triples that would modify this? Or should this be filed as a bug report? However, this may also be a bug in pocl, as I haven't been able to reproduce this without pocl.

-erik

The calling convention should be clear from the LLVM IR. Make sure the
caller and callee use the same calling convention markings.

You might get strange results if one translation unit has AVX and/or AVX2
enabled, and the other has it disabled: the CPU features modify the calling
convention for AVX/AVX2 vectors.
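
One rough way to check for such a mismatch is to compare the
"target-features" attributes in the textual IR of both modules. Below is a
minimal Python sketch, assuming the IR comes from a recent clang that
records the features per function in that attribute. The helper names and
the two sample attribute lines are hypothetical, not taken from pocl.

```python
import re

# Matches the "target-features" string attribute as printed in textual LLVM IR.
_FEATURES_RE = re.compile(r'"target-features"="([^"]*)"')


def enabled_features(ir_text):
    """Return one set of enabled features per "target-features" attribute."""
    groups = []
    for match in _FEATURES_RE.finditer(ir_text):
        items = match.group(1).split(",")
        # A leading "+" enables a feature; "-" disables it and is skipped here.
        groups.append({item[1:] for item in items if item.startswith("+")})
    return groups


def avx_consistent(*ir_texts):
    """True if all attribute groups in all modules agree on whether AVX is on."""
    flags = {"avx" in feats
             for text in ir_texts
             for feats in enabled_features(text)}
    return len(flags) <= 1


# Hypothetical attribute lines standing in for the caller's and callee's modules.
caller_ir = 'attributes #0 = { nounwind "target-features"="+sse,+sse2" }'
callee_ir = 'attributes #0 = { nounwind "target-features"="+avx,+sse,+sse2" }'

print(avx_consistent(caller_ir, callee_ir))  # caller lacks AVX, callee has it
print(avx_consistent(callee_ir, callee_ir))  # both sides agree
```

This only looks at attribute groups that carry "target-features". Features
set globally (for example through llc -mattr) do not show up in the IR and
would have to be compared separately.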

-Eli