Perusing the Clang source, I found something rather confusing:
if ((IsVectorCall || IsRegCall) &&
isHomogeneousAggregate(Ty, Base, NumElts)) {
if (IsRegCall) {
if (FreeSSERegs >= NumElts) {
FreeSSERegs -= NumElts;
if (IsReturnType || Ty->isBuiltinType() || Ty->isVectorType())
return ABIArgInfo::getDirect();
return ABIArgInfo::getExpand();
}
return ABIArgInfo::getIndirect(Align, /ByVal=/false);
} else if (IsVectorCall) {
if (FreeSSERegs >= NumElts &&
(IsReturnType || Ty->isBuiltinType() || Ty->isVectorType())) {
FreeSSERegs -= NumElts;
return ABIArgInfo::getDirect();
} else if (IsReturnType) {
return ABIArgInfo::getExpand();
} else if (!Ty->isBuiltinType() && !Ty->isVectorType()) {
// HVAs are delayed and reclassified in the 2nd step.
return ABIArgInfo::getIndirect(Align, /ByVal=/false);
}
}
}
If we look at ”isReturnType” for IsVectorCall = true has ”ABIArgInfo::getExpand()” however, ”expand” is not a valid type of ABIArgInfo and will throw an error.
So this seems to be incorrect and should crash on vectorcall with HVA. Can someone confirm?
That was long enough ago that I don't really remember. At the time, I wrote tests to validate the behaviors I think (which would mean it didn't crash?), but I could buy that I did something wrong back then. Do we have an idea what the return-type ABIArgInfo should be? I'm sorry I cannot be more helpful here.
As far as I can tell it should simply do an indirect here:
Results of __vectorcall functions are returned by value in registers when possible. Results of integer type, including structs or unions of 4 bytes or less, are returned by value in EAX. Integer type structs or unions of 8 bytes or less are returned by value in EDX:EAX. Vector type results are returned by value in XMM0 or YMM0, depending on size. HVA results have each data element returned by value in registers XMM0:XMM3 or YMM0:YMM3, depending on element size. Other result types are returned by reference to memory allocated by the caller.
HVA results have each data element returned by value in registers XMM0:XMM3 or YMM0:YMM3, depending on element size. Other result types are returned by reference to memory allocated by the caller.
Mean we should be doing 'expand' if the type fits in 4 registers?
”Expand” is never valid for returns, as it simply splits an aggregate over multiple parameters. If it fits in 4 registers I assume it is a ”direct” as it is written.
I don’t particularly get the ‘expand is never valid for returns’, as I’m sure I’ve seen it before, but I’m also sure I don’t know the ABI code well enough to speak with authority.
There might be some other goofiness in that code as well. I wouldn’t expect the return-value to decrease the available number of registers, since returns re-use registers.
Either way, you’re likely right that that this code needs work. Patches welcome
Expand assumes parameters, since it basically takes the struct and drops each part of the struct into a different parameter (e.g. void foo(struct { int , int }) -> void foo(int, int) ). Which cannot work because you can at most have a single value return value in the function prototype. So the ”expand” is not even implemented for return values in the code.