Hi,
For the record, I've spent a nontrivial amount of time on the ARM64 version of Wine, and back in the day started out by implementing the ms_abi attribute for aarch64 just to get the handling of printf like functions right - dealing with (to some extent) most of the same issues you're dealing with here.
(Also, as a side comment: the existing names "win64cc", CC_Win64 or "IsWin64", used in a number of places, are a bit misnamed in the current scope. In the original, x86-only context (where 32 and 64 bit code generation is mostly shared), where the C calling convention is similar on x86_32 and differences only arise on x86_64, naming it "Win64" probably is quite neat, but within AArch64 it's a bit redundant - and if a similar distinction would be needed on ARM (e.g. if an explicit windows calling convention would be needed), reusing the existing "win64cc" is even more out of place...)
In one of our other attempts to make all this mess easier to handle, we adapted the https://github.com/shinh/maloader project (that will be open source if all of this works) to load ARM64 MachO under Linux and run the final binary using qemu-user. This can be seen as a very light version of wine [1] for iOS.
[3] What I say here isn't entirely true, as darlinghq moved away from this "wine" model (which can be seen very basically as make a loader for the targeted architecture, create wrappers for system libraries and run all of this in userland). For those interested in more information, I recommend reading the article in http://blog.darlinghq.org/2017/02/the-mach-o-transition-darling-in-past-5.html
I would say this isn't entirely accurate regarding how wine works - maybe it was the case for other thinner win32 binary loaders that have existed though.
Wine never (at least not in the last 20 years afaik) just translated calls between the windows and host environment. Wine consists of a mostly full reimplementation of all the supported Windows APIs, and these only occasionally call down to the host libc and host's native APIs. It's true that Wine used to build its modules as native ELF (or MachO) binaries - but they weren't just plain ELF .so's; internally they contain most of the PE DLL data structures as well, so that they can run and interact with other modules using the normal DLL import/export mechanisms.
But lately this has been taken even further, and now most modules can be built as real DLLs as well - linking against wine's msvcrt/ucrt instead of the host libc, etc. For higher level components that only interact with other DLLs, this is mostly straightforward, but for lower level components that actually do need to call the native host environment, they have been split into a native ELF/MachO component (which links against whatever system libraries it needs to use), and the bulk of the code as either a real DLL or as a DLL wrapped in ELF/MachO. This requires having a suitable cross compiler available (but with clang being multi-targeting, that should be trivially available).
So that sounds very much like the same approach that Darling is taking, except that Darling doesn't maintain support for building the emulated components as ELF, only as native MachO. And Darling has the benefit of being able to build Apple's open sourced code, instead of having to reimplement it all based on the public interfaces.
In any case - even if the bulk of the code is built as the emulated platform's native binaries (DLL or MachO), I guess there's a need for interaction at some layer (even if the interface might be quite thin), so having support for something like this sounds sensible to me.
And being able to interact with code built for a different ABI on a per-function level also sounds very sensible to me. So I don't think this is a bad idea.
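As a rough illustration of what per-function ABI selection looks like with an attribute that already exists, here is a minimal sketch using ms_abi on x86-64 (supported by both GCC and Clang there). The names OTHER_ABI, other_abi_binop, other_abi_add and call_through are made up for this example, and on other targets the macro expands to nothing, so the code just uses the native convention. An "apple_abi" attribute would presumably be used the same way.

```c
/* Assumption: ms_abi is only used on non-Windows x86-64; elsewhere
   OTHER_ABI is empty and everything uses the native convention. */
#if defined(__x86_64__) && !defined(_WIN32)
#define OTHER_ABI __attribute__((ms_abi))
#else
#define OTHER_ABI
#endif

/* Pointer to a function that uses the non-default convention. */
typedef int (OTHER_ABI *other_abi_binop)(int, int);

/* A function that receives its arguments per the other ABI. */
OTHER_ABI int other_abi_add(int a, int b) {
    return a + b;
}

/* A native-ABI function; the compiler emits the right call sequence
   for fn based on the attribute carried by the pointer type. */
int call_through(other_abi_binop fn, int a, int b) {
    return fn(a, b);
}
```

Calling call_through(other_abi_add, 2, 3) from ordinary native code needs no hand-written thunk; the compiler does the argument shuffling at the call site.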
BTW, for running Windows code on Linux, one constant stumbling block has been the use of the x18 register. On Linux, this register is normally free to use by any function, but on Windows, it is supposed to remain constant (pointing at a thread specific data structure), with various workarounds being used to retain it.
For the Darwin case, x18 is reserved (so compiler generated code doesn't use it, similar to windows), but AFAIK nothing really uses it. Earlier, the Darwin kernel used to overwrite the x18 register to 1 on context switch, just to make sure that no code kept relying on it retaining its value, but this doesn't seem to be the case any longer. As no code actually uses it, it shouldn't be any problem for your use case.
The current implementation & questions
The current implementation introduces the CC_AArch64_Apple calling convention, to enforce the usage of Apple's CC when necessary. This has mainly been inspired by how CC_Win64 works.
There are I think at least these limitations:
* this supposes that the original targeted CC is Apple ARM64 AAPCS. In its current form,
there is no way to support for instance vector calls (see for instance
https://github.com/aguinet/llvm-project/commit/c4905ded3afb3182435df30e527955031cb0d098#diff-f124368bac3e5d7be20450aa83b166daR218)
I'm not familiar with the vector calling convention here - but if that's used, the function (on the C level) already has a suitable attribute specifying the non-standard calling convention? Wouldn't that end up lowered into the right thing here as well?
Or is it a case where there's a generic "vector" calling convention which turns into different things depending on whether targeting linux or darwin? In that case, you'd probably need to add a separate attribute and calling conventions, like apple_vector and sysv_vector (or whatever to call the default), to allow specifying the intent more exactly.
For windows on i386, there's actually at least 4 different calling conventions being used; cdecl (the default for C code), stdcall, fastcall and vectorcall. As those names aren't associated with anything else on other platforms, you can use e.g. __attribute__((fastcall)) on any platform.
My questions would be:
* the fact that we can't target Apple's vector calls ABI shows that having one
CC_AArch64Apple (as CC_Win64 exists) calling convention might not be the right
implementation of this "apple_abi" attribute. Does anyone have better suggestions?
It doesn't sound too bad to me, but as naming things is one of the hardest things, one could also think of other, less generic names (as the attribute "apple_abi" or whatever it is, doesn't per se imply any specific ABI, but just is the apple default C calling convention) - but "apple_c_default" also is ugly.
* For variadic functions (which are among the functions that have different ABIs), GCC and Clang have __builtin_ms_va_list. My understanding is that we should have the Apple equivalent, but I'm not sure I completely understand what's at stake here. Said differently, is this builtin used to make sure we use the va_list type of the Apple ABI, should the need arise to forward it to another function that uses the Apple ABI?
Exactly. In your example, you're implementing printf, so you're receiving variadic arguments on the stack, boiling them down to a (linux native) va_list and passing them to a linux native vprintf. If you'd be implementing and wrapping the darwin vprintf on the other hand, you'd need to declare it to be receiving a __builtin_apple_va_list.
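To make the role of such a builtin concrete, here is a small sketch using the existing x86-64 counterpart, __builtin_ms_va_list, together with __builtin_ms_va_start and __builtin_ms_va_end (all provided by GCC and Clang on x86-64). The FOREIGN_ABI macro, the foreign_va_* names and the two functions are made up for this example, and on other targets they fall back to the native va_list so that the code still builds. A hypothetical __builtin_apple_va_list would presumably be used along the same lines.

```c
/* Assumption: on non-Windows x86-64 the "foreign" ABI is ms_abi;
   on any other target this falls back to the native va_list. */
#if defined(__x86_64__) && !defined(_WIN32)
#define FOREIGN_ABI __attribute__((ms_abi))
typedef __builtin_ms_va_list foreign_va_list;
#define foreign_va_start(ap, last) __builtin_ms_va_start(ap, last)
#define foreign_va_end(ap) __builtin_ms_va_end(ap)
#else
#define FOREIGN_ABI
typedef __builtin_va_list foreign_va_list;
#define foreign_va_start(ap, last) __builtin_va_start(ap, last)
#define foreign_va_end(ap) __builtin_va_end(ap)
#endif

/* The "v" variant (think vprintf): it receives a va_list in the
   foreign ABI's own representation, not the host's. */
FOREIGN_ABI int foreign_vsum(int count, foreign_va_list ap) {
    int total = 0;
    for (int i = 0; i < count; i++)
        total += __builtin_va_arg(ap, int);
    return total;
}

/* The variadic entry point (think printf): it builds a foreign
   va_list and forwards it to another foreign-ABI function. */
FOREIGN_ABI int foreign_sum(int count, ...) {
    foreign_va_list ap;
    foreign_va_start(ap, count);
    int total = foreign_vsum(count, ap);
    foreign_va_end(ap);
    return total;
}
```

Native code can call foreign_sum(3, 10, 20, 12) directly. The point is that the va_list type belongs to the ABI of the function receiving it, which is why a wrapper for the darwin vprintf would need the Apple flavour of the type.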
Example with printf
For now, we manage to compile this simple example for iOS/arm64:
#include <stdio.h>
int main(int argc, char** argv)
{
    printf("number of args: %d, argv: %s, %s, %s\n", argc, argv[0], argv[1], argv[2]);
    return 0;
}
and run it under the combo maloader/qemu-user under Linux/x64, using this wrapper for printf:
#include <stdarg.h>
#include <stdio.h>
__attribute__((apple_abi)) int darwin_aarch64_printf(const char* format, ...)
{
    va_list args;
    va_start(args, format);
    const int ret = vprintf(format, args);
    va_end(args);
    return ret;
}
The fact that va_start/va_end works by using the Linux ABI from a function whose arguments use the Apple ABI seems completely magical to me, so if someone knows why this works I would also be interested!
I think this might be a borderline case that I wasn't entirely sure would work right, but apparently does. (Or maybe the code really is flexible enough to systematically handle such mixed cases?)
The calling convention attribute indicates how and where the variadic arguments are laid out on the stack, but these are then collected into a linux native va_list, which is passed to the linux native vprintf function that interprets them accordingly.
FWIW, if you want to experiment with how variadic functions and va_list behave on different platforms, you can try e.g. this test snippet:
void vararg(int a, ...);
void call_vararg(void) {
    vararg(7, 8, 9, 10.0, 11, 12.0, 13);
}
void other(__builtin_va_list ap);
void receive_vararg(int a, ...) {
    __builtin_va_list ap;
    __builtin_va_start(ap, a);
    other(ap);
    __builtin_va_end(ap);
}
int use_vararg(__builtin_va_list *ap) {
    return __builtin_va_arg(*ap, int);
}
Compiling this with e.g. "clang -target {aarch64-windows,aarch64-linux-gnu,arm64-apple-darwin} -S -O2 -o - test.c" lets you have a look at what they end up like. E.g. use_vararg is identical between darwin and windows, while call_vararg is kind of similar between linux and windows (except windows passes all variadic args in GPRs), and receive_vararg is pretty different between all of them.
Is this a terrible idea?
Building these "ABI wrappers" using an "apple_abi" attribute seemed a good idea at the beginning, but this already raises some concerns (see above), and I'd be willing to hear any arguments that show that this is actually a bad idea.
It's certainly more sustainable and durable to provide full, proper implementations of the target, like Darling and Wine do, but even then, being able to build a function taking arguments with a foreign calling convention does sound sensible and useful to me.
Depending on exactly where you draw the line between "emulated"/foreign executables and native host system, you might not have any variadic functions in the border interface layer, and then you might get away without such support in the compiler, but to me, it sounds like a useful thing to have in any case.
// Martin