RFC: Adding Support For Vectorcall Calling Convention


Don’t we already implement this correctly on Windows?

I agree Clang should do the HVA classification. LLVM just doesn’t have the information. Right now, Clang splits HVAs passed in registers and passes other structs or HVAs that don’t fit in the available vector registers with byval.

Traditionally, Clang has tried very hard to split aggregates passed by value to make LLVM’s job easier. Your proposal undoes a lot of that, but that seems to be the direction we’re going today. See ARM and AArch64, which pass HVAs as arrays.

I think either your suggestion or the array suggestion would be an improvement over the current situation. One problem with passing the LLVM struct type directly and marking it inreg is that it might be hard for the backend to figure out what the HVA element type is. The array convention solves this because the element type is obvious.
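To make the element-type point concrete, here is a minimal sketch of the difference between the two representations. Everything in it is hypothetical: `Kind`, `ArrayArg`, `StructArg` and `hvaElement` are illustration-only names, not LLVM or Clang APIs, and the "one to four identical floating-point or vector elements" check is a simplified reading of the HVA rule that ignores nesting and alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element kinds; stand-ins for IR types, not LLVM classes.
enum class Kind { Int64, Double, Vec128, Vec256 };

static bool isFPOrVector(Kind k) { return k != Kind::Int64; }

// Array-style argument: [count x elem]. The element type is stored directly.
struct ArrayArg { Kind elem; std::size_t count; };

// Struct-style argument: one entry per field, so the element type is implicit.
struct StructArg { std::vector<Kind> fields; };

// Array form: the element type can simply be read off the type.
bool hvaElement(const ArrayArg& a, Kind& out) {
    if (a.count < 1 || a.count > 4 || !isFPOrVector(a.elem)) return false;
    out = a.elem;
    return true;
}

// Struct form: every field has to be inspected to recover the element type.
bool hvaElement(const StructArg& s, Kind& out) {
    if (s.fields.empty() || s.fields.size() > 4) return false;
    for (Kind k : s.fields)
        if (k != s.fields[0]) return false;   // not homogeneous
    if (!isFPOrVector(s.fields[0])) return false;
    out = s.fields[0];
    return true;
}

// Small usage example: both forms agree on a two-element vector aggregate.
void demoHvaElement() {
    Kind fromArray = Kind::Int64, fromStruct = Kind::Int64;
    ArrayArg a = {Kind::Vec128, 2};
    StructArg s;
    s.fields = {Kind::Vec128, Kind::Vec128};
    assert(hvaElement(a, fromArray) && fromArray == Kind::Vec128);
    assert(hvaElement(s, fromStruct) && fromStruct == Kind::Vec128);
}
```

Both paths reach the same answer; the array form just gets there without scanning the fields, which is the convenience being argued for.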

Thanks, Reid, for your input (and for the code reviews, BTW).

The current Vectorcall implementation is incomplete for x64 and x32.

Some of the issues in the current implementation are:

  • It doesn’t take into account the original arguments’ position (before HVA expansion)

  • It doesn’t allocate the HVAs at a lower priority (compared to vector types and integer types)

  • It doesn’t allocate a shadow register when a vector type is assigned

  • It doesn’t allocate shadow stack for the vector types
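The first three points can be sketched as a two-pass assignment. This is only a rough model under stated assumptions, not the proposed patch: `ArgClass`, `Arg`, `Location` and `assignVectorcallX64` are made-up names, the register counts (four integer, six vector) are my understanding of x64 vectorcall, and shadow stack slots and the exact by-reference mechanics are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical argument classes; a real implementation works on IR types.
enum class ArgClass { Integer, Vector, HVA };

struct Arg {
    ArgClass cls;
    unsigned hvaElems;   // number of elements; only meaningful for HVA
};

// Where an argument ended up in this simplified model.
struct Location {
    int intReg = -1;            // index into RCX, RDX, R8, R9; -1 if unused
    std::vector<int> vecRegs;   // XMM/YMM indices, one per element for an HVA
    bool inMemory = false;      // passed on the stack or by reference
};

std::vector<Location> assignVectorcallX64(const std::vector<Arg>& args) {
    const std::size_t kIntRegs = 4, kVecRegs = 6;
    std::vector<Location> locs(args.size());
    bool vecUsed[6] = {};

    // Pass 1: integers and standalone vectors are assigned by their original
    // position. A vector at position i leaves integer register i unused
    // (shadowed), and an integer at position i leaves vector register i free.
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (args[i].cls == ArgClass::Integer) {
            if (i < kIntRegs) locs[i].intReg = static_cast<int>(i);
            else locs[i].inMemory = true;
        } else if (args[i].cls == ArgClass::Vector) {
            if (i < kVecRegs) {
                locs[i].vecRegs.push_back(static_cast<int>(i));
                vecUsed[i] = true;
            } else {
                locs[i].inMemory = true;
            }
        }
        // HVAs are deferred: they still occupy a position, but get no
        // register in this pass.
    }

    // Pass 2: HVAs take the lowest unused vector registers, but only if the
    // whole aggregate fits; otherwise the argument goes to memory.
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (args[i].cls != ArgClass::HVA) continue;
        assert(args[i].hvaElems >= 1 && args[i].hvaElems <= 4);
        std::vector<int> freeRegs;
        for (std::size_t r = 0; r < kVecRegs && freeRegs.size() < args[i].hvaElems; ++r)
            if (!vecUsed[r]) freeRegs.push_back(static_cast<int>(r));
        if (freeRegs.size() == args[i].hvaElems) {
            for (int r : freeRegs) vecUsed[r] = true;
            locs[i].vecRegs = freeRegs;
        } else {
            locs[i].inMemory = true;
        }
    }
    return locs;
}
```

For an argument list shaped like (int, float, 4-element HVA, vector, int), this model puts the float in register 1 and the vector in register 3 first, and only then hands the HVA the remaining registers 0, 2, 4 and 5, so the HVA ends up discontiguous.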

Whether the argument is a structure or an array, both end up in the same function in codegen: ComputeValueVTs.

In that function, the elements are extracted in a similar recursive way for both structures and arrays.

So I really don’t see much of a difference between the two approaches.
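As a rough illustration of that recursion, here is a toy flattening pass over a miniature type tree. It is not the ComputeValueVTs code; `Leaf`, `Ty`, `scalar`, `structOf`, `arrayOf` and `flatten` are hypothetical names, and the only point is that a struct of N identical fields and an array of N elements flatten to the same leaf list.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical leaf value types; LLVM's real type hierarchy is much richer.
enum class Leaf { I64, F64, V4F32 };

struct Ty {
    enum Tag { Scalar, Struct, Array } tag;
    Leaf leaf;                // used when tag == Scalar
    std::vector<Ty> fields;   // used when tag == Struct
    std::vector<Ty> elem;     // holds exactly one type when tag == Array
    unsigned count;           // used when tag == Array
};

Ty scalar(Leaf l) {
    Ty t; t.tag = Ty::Scalar; t.leaf = l; t.count = 0; return t;
}
Ty structOf(std::vector<Ty> fs) {
    Ty t; t.tag = Ty::Struct; t.leaf = Leaf::I64; t.fields = std::move(fs); t.count = 0; return t;
}
Ty arrayOf(Ty e, unsigned n) {
    Ty t; t.tag = Ty::Array; t.leaf = Leaf::I64; t.elem.push_back(std::move(e)); t.count = n; return t;
}

// Recursively append the leaf value types of t to out, in order.
void flatten(const Ty& t, std::vector<Leaf>& out) {
    switch (t.tag) {
    case Ty::Scalar:
        out.push_back(t.leaf);
        break;
    case Ty::Struct:
        for (const Ty& f : t.fields) flatten(f, out);
        break;
    case Ty::Array:
        for (unsigned i = 0; i < t.count; ++i) flatten(t.elem[0], out);
        break;
    }
}

// Usage example: { <4 x float>, <4 x float> } versus [2 x <4 x float>].
void demoFlatten() {
    std::vector<Leaf> fromStruct, fromArray;
    flatten(structOf({scalar(Leaf::V4F32), scalar(Leaf::V4F32)}), fromStruct);
    flatten(arrayOf(scalar(Leaf::V4F32), 2), fromArray);
    assert(fromStruct == fromArray);
    assert(fromStruct.size() == 2);
}
```

Once flattened, nothing downstream can tell which of the two forms the frontend emitted, which is why the choice mostly affects how easy the type is to inspect beforehand.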

Thanks again,

Oren

Looks like I didn’t understand the convention very well in 2014. :(

Oh well. It’s actually surprisingly complicated. The convention seems constrained by a desire to have the /homeparams option work well for at least all non-vector, non-HVA parameters. MSVC’s generated code for this example with /homeparams clears things up for me:

#include <stdio.h>

double gd1, gd2, gd3, gd4;
__int64 gi1, gi2;

/* Each parameter is named after the register it is expected to arrive in. */
void __vectorcall g(double xmm0, __int64 rdx, double xmm2, __int64 r9, double xmm4, double xmm5) {
    gd1 = xmm0;
    gd2 = xmm2;
    gi1 = rdx;
    gi2 = r9;
    gd3 = xmm4;
    gd4 = xmm5;
    printf("asdf\n");
    gi2 = 0;
}

All the parameters are laid out contiguously, presumably for debugging or tracing purposes. So, with all that in mind, I think I now understand the need to distinguish HVAs from standalone vector and floating point arguments.

I think your design is the way to go. It’s consistent with what we’ve done for ARM and where we probably want to go in the future. Splitting structures in the frontend has helped us generate better code in the past, but we need to overcome our limitations around extractvalue/insertvalue going forward anyway.