Recently, Reid Kleckner found an ABI incompatibility between clang and GCC in the way vector parameters are passed on 32-bit x86.
(This is documented in PR21510.)
Specifically, GCC uses XMM0-XMM2 to pass the first 3 __m128 parameters, and the rest are passed on the stack. Clang passes an additional parameter by register, using XMM0-XMM3. The same applies to __m256 with YMM0-2 vs. YMM0-3. In theory, it would apply to __m512 as well, but currently clang doesn’t support passing __m512 in x86 mode at all. ICC has the same behavior as GCC, and it seems that MSVC in 32-bit mode only allows up to 3 vector parameters per function (when not using __vectorcall), and these 3 are passed in XMM0-XMM2, which is closer to the GCC behavior.
Unfortunately, it seems like there is no ABI specification to support either behavior as “correct”: while the x32 (“ILP32”) ABI explicitly specifies XMM0-XMM2, the latest version of the i386 psABI is too old to contain any useful information.
Still, XMM0-XMM2 looks like the common choice, and I think the current clang behavior should be considered a bug.
The problem is that, regardless of whether it’s a bug or not, this behavior has been in place for many years, and changing it would mean breaking ABI compatibility with older clang versions.
On the other hand, not changing it would mean continued ABI incompatibility with GCC.
(This only applies to _m128 and _m256. Making the _m512 behavior GCC-compatible should be painless).
Reid (and I hope I’m not misrepresenting him here) suggested leaving the behavior as-is on platforms where clang is the system compiler (Darwin and BSD) and changing it elsewhere. However, I’m afraid interpreting the calling convention differently (compatible / incompatible with GCC) on different platforms may be confusing to end-uses.
Any thoughts on this, especially from OS/libraries people, will be very appreciated.