BUG: complete misunderstanding of the MS-ABI

Objects compiled for the MS-ABI don't conform to it!

Data types beyond 64 bit MUST BE returned by the callee via the
hidden first argument allocated by the caller, NOT in XMM0!

Demo/proof: from this source

--- llvm-bug.c ---
#ifndef __clang__
typedef struct {
    unsigned __int64 low;
    unsigned __int64 high;
} __uint128_t;
#else
__attribute__((ms_abi))
#endif
__uint128_t __udivmodti4(__uint128_t dividend, __uint128_t divisor, __uint128_t *remainder) {
    if (remainder != 0)
        *remainder = divisor;
    return dividend;
}
--- EOF ---

clang -c -O1 generates the following INCOMPATIBLE and WRONG code:

__udivmodti4 proc public
        movaps xmm0, xmmword ptr [rcx]
        test r8, r8
        jz 0f
        movaps xmm1, xmmword ptr [rdx]
        movaps xmmword ptr [r8], xmm1
0: ret
__udivmodti4 endp

clang's misunderstanding of the MS-ABI can be clearly seen here:

- RCX holds the address of the return value, NOT the address
  of the dividend;

- RDX holds the address of the dividend, NOT the address of
  the divisor;

- R8 holds the address of the divisor, NOT the address of the
  remainder;

- R9 holds the address of the remainder;

- aggregate data types are NOT returned in XMM0, but via the
  hidden first argument addressed by RCX;

- the address of the hidden first argument is returned in RAX!

JFTR: a 128-bit integer data type is not supported by MSVC.
      clang is also rather confused here: why is the return
      value mapped to an XMM register, but not the arguments?

Microsoft's CL.EXE -c -Ox generates the following (of course)
CONFORMANT code:

__udivmodti4 proc public
; Line 10
        test r9, r9
        je SHORT $LN1@udivmodti4
; Line 11
        mov rax, QWORD PTR [r8]
        mov QWORD PTR [r9], rax
        mov rax, QWORD PTR [r8+8]
        mov QWORD PTR [r9+8], rax
$LN1@udivmodti4:
; Line 12
        mov rax, QWORD PTR [rdx]
        mov QWORD PTR [rcx], rax
        mov rax, QWORD PTR [rdx+8]
        mov QWORD PTR [rcx+8], rax
        mov rax, rcx
; Line 13
        ret 0
__udivmodti4 endp

NOT AMUSED
Stefan

The code that you have has a large #ifndef __clang__ block in it, and IMO that explains the ABI difference.

As you note, MSVC does not have native support for 128-bit integers, so there is no reason for Clang to attempt to be ABI compatible.

The __uint128_t arguments are passed indirectly because MSVC has a rule that requires arguments larger than 64 bits to be passed indirectly by address. I believe exceptions to that rule, such as vector arguments, are made on a case-by-case basis. No such rule exists for return values, so we get the usual i128 handling for x86 instead.

__uint128_t isn’t getting the usual x86 handling. I think the usual handling for an i128 return would be rdx:rax. Instead I believe it’s being coerced to v2i64 by this code in clang/lib/CodeGen/TargetInfo.cpp:

// Mingw64 GCC returns i128 in XMM0. Coerce to v2i64 to handle that.
// Clang matches them for compatibility.
return ABIArgInfo::getDirect(llvm::FixedVectorType::get(
    llvm::Type::getInt64Ty(getVMContext()), 2));

__uint128_t is a “builtin type” like int or short or char or float. It is not a “user-defined type”: the user did not define it, it’s part of the compiler. Since MSVC does not have such a builtin type, we are free to do whatever we want with it; the type can’t exist in any code compiled with MSVC, so we don’t need to interoperate.

If you used the same struct with clang as you did with MSVC, instead of using a compiler-defined type, we would be compatible. Names starting with two underscores are reserved for compilers and libraries, so user code shouldn’t be defining a struct with that name anyway. We work fine on code that compiles with MSVC, without detecting the compiler and emitting different code depending on it.

How about I just disable __uint128_t as a keyword when compiling for Windows?

What if the source file was this instead. Did we follow the MSVC ABI now?

#include <intrin.h>

#ifndef __clang__
typedef __m128i __uint128_t;
#else
__attribute__((ms_abi))
#endif
__uint128_t foo(__uint128_t x) {
    return x;
}

> Mingw64 GCC returns i128 in XMM0

I believe this changed many years ago (or was possibly never true). I forget the exact version where this might have changed in mingw-w64; I filed a bug some years ago based on my observation of GCC’s behavior (https://bugs.llvm.org/show_bug.cgi?id=16168), but then only fixed part of it (the relevant parts of LLVM, but not of Clang).