Automatically backing up and restoring x18 around function calls on AArch64?

Hi,

When using Wine to run Windows ARM64 executables on Linux, there's one major ABI incompatibility between the two; Windows treats the x18 register as the reserved platform register, while it is free to be clobbered anywhere in code on Linux.

The Wine code sets up this register before passing control over to the Windows executable code, but whenever the Windows code calls a function implemented in Wine, this register ends up clobbered.

The initial solution is to compile Wine itself with the -ffixed-x18 flag, to treat the x18 register as reserved within Wine, so that no compiler generated code touches it. This works fine as long as Wine makes no calls to other libraries from the outside Linux distribution - but e.g. as soon as any glibc function is called, it can end up clobbered again.

The full, proper solution would of course be to rebuild one's Linux distribution from scratch with -ffixed-x18 on every single library. But this is of course pretty much impractical.

My current makeshift workaround for the matter is to reroute calls from native Windows code to Wine functions via a relay wrapper, which backs up and restores the register. This works in practice, but hasn't been accepted upstream, as it is not deemed a correct/complete solution. Also, if the emulated Windows API functions do callbacks, the register would need to be restored before the callbacks call Windows code.
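To make the relay idea concrete, here is a rough sketch (not the actual Wine patch). It models x18 with a thread-local variable so that it runs on any host; `model_x18`, `relay_call` and `wine_impl_that_clobbers` are hypothetical names. On real AArch64 the two accessors would be inline asm (`mov` from/to x18) in code built with -ffixed-x18.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Stand-in for the x18 register (assumption: lets the sketch run anywhere).
thread_local std::uintptr_t model_x18 = 0;

inline std::uintptr_t read_x18() { return model_x18; }
inline void write_x18(std::uintptr_t v) { model_x18 = v; }

// Hypothetical relay: back up x18 on entry from Windows code and put it
// back when the Wine-side implementation returns.
template <typename Fn, typename... Args>
auto relay_call(Fn&& impl, Args&&... args) {
    struct Restore {
        std::uintptr_t saved;
        ~Restore() { write_x18(saved); }  // runs after the result is computed
    } guard{read_x18()};
    return std::forward<Fn>(impl)(std::forward<Args>(args)...);
}

// Hypothetical Wine-side function that clobbers x18, as a glibc call might.
inline int wine_impl_that_clobbers(int a, int b) {
    write_x18(0xdeadbeef);
    return a + b;
}

// Small self-check showing the intended use.
inline void relay_demo() {
    write_x18(0x1234);  // value set up for the Windows code
    int sum = relay_call(wine_impl_that_clobbers, 2, 3);
    assert(sum == 5);
    assert(read_x18() == 0x1234);  // restored by the relay
}
```

Note that this only covers the return path; a callback invoked from inside `wine_impl_that_clobbers` would still see the clobbered value.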

Another idea, raised in https://bugs.winehq.org/show_bug.cgi?id=38780#c13, which discusses this matter, was whether it would be possible to make Clang/LLVM always back up and restore x18 on any call to any other function, to make sure that the register maintains the right value as long as running in Wine code. (One concern is that this could end up rather expensive though.)

I didn't really find any straightforward way of starting to implement this however. Does anyone here happen to have ideas about how one could either implement this, or solve it in another way?

// Martin

Hi Martin,

Doesn’t implementing “always back up and restore x18 on any call to any other function” boil down to having a PCS (procedure call standard) variation where x18 is caller saved?
I think that in the calling convention code (e.g. see AArch64CallingConvention.td) when a register is not marked as callee saved (CalleeSavedRegs), it is implicitly caller saved.
My guess, without looking further into why x18 is not saved/restored around function calls, is that it is marked as a Reserved Register when the -ffixed-x18 command line option is used. AFAIU, reserved registers are not taken into account during liveness analysis, and code to save/restore them will never be generated.
See e.g. AArch64Subtarget::AArch64Subtarget doing the following:

    if (AArch64::isX18ReservedByDefault(TT))
      ReserveXRegister.set(18);

Apart from how to implement this behaviour: are you sure that always saving/restoring x18 on function boundaries would be the correct thing to do?
Couldn’t some functions on the Windows/Wine side change the value of x18? In that case you’d want to see the new value of x18 after a function returns, not restore the value from before the function call?

Thanks,

Kristof

Doesn't implementing "always back up and restore x18 on any call to any other function" boil down to having a PCS (procedure call standard) variation where x18 is caller saved?

Not quite, that only requires the function to preserve it on return;
Martin wants it preserved on any call into a windows function too. And
possibly any call at all if he can't identify windows functions.

Couldn't some functions on the windows/WINE side change the value of X18? In that case you'd want to see the new value of X18 after a function returns, not restore the value from before the function call?

I believe it's some equivalent of TPIDR_EL0 (i.e. a thread control
block). I'd be slightly concerned but hopeful it could be expected to
remain constant.

Cheers.

Tim.

Doesn't implementing "always back up and restore x18 on any call to any other function" boil down to having a PCS (procedure call standard) variation where x18 is caller saved?

Not quite, that only requires the function to preserve it on return;
Martin wants it preserved on any call into a windows function too. And
possibly any call at all if he can't identify windows functions.

Yes, pretty much. There's a calling convention attribute that helps me identify such functions, but it doesn't help much in practice; the correct value of x18 needs to be brought along up until that point anyway.

Couldn't some functions on the windows/WINE side change the value of X18? In that case you'd want to see the new value of X18 after a function returns, not restore the value from before the function call?

I believe it's some equivalent of TPIDR_EL0 (i.e. a thread control
block). I'd be slightly concerned but hopeful it could be expected to
remain constant.

Yes, it's exactly that - it's constant within one thread.

// Martin

Hi Martin,

Another idea which was raised in
https://bugs.winehq.org/show_bug.cgi?id=38780#c13, which discusses this
matter, was if it would be possible to make Clang/LLVM always back up and
restore x18 on any call to any other function, to make sure that the
register maintains the right value as long as running in Wine code.

You're still pretty much screwed if you call qsort with a windows
callback (or any equivalent but more convoluted situation), aren't
you? I suppose you'd have to surround any top-level callback passed
into non-compliant code with an x18-thunk.
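A sketch of what such an x18-thunk around qsort could look like, again with x18 modelled by a thread-local variable so it runs on any host. All names here (`model_x18`, `proper_x18`, `x18_thunk`, `qsort_for_windows`, `compare_ints`) are made up for illustration; real code would reload the register with inline asm.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Stand-ins for x18 and for the per-thread copy of its proper value.
thread_local std::uintptr_t model_x18 = 0;
thread_local std::uintptr_t proper_x18 = 0;

using Compare = int (*)(const void*, const void*);

// qsort has no context argument, so the real callback is parked per thread.
thread_local Compare pending_compare = nullptr;

// The thunk handed to qsort: fix up x18, then enter the Windows callback.
inline int x18_thunk(const void* a, const void* b) {
    model_x18 = proper_x18;
    return pending_compare(a, b);
}

// Hypothetical wrapper used instead of calling qsort directly.
inline void qsort_for_windows(void* base, std::size_t count, std::size_t size,
                              Compare windows_compare) {
    Compare outer = pending_compare;  // allow nested use
    pending_compare = windows_compare;
    std::qsort(base, count, size, x18_thunk);
    pending_compare = outer;
}

// Demo "Windows" callback: records whether x18 was valid on entry, then
// zeroes the model to mimic qsort clobbering the register between calls.
thread_local bool saw_bad_x18 = false;
inline int compare_ints(const void* a, const void* b) {
    if (model_x18 != proper_x18) saw_bad_x18 = true;
    model_x18 = 0;
    int x = *static_cast<const int*>(a);
    int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);
}

inline void thunk_demo() {
    proper_x18 = 0x7000;
    int values[] = {3, 1, 2};
    qsort_for_windows(values, 3, sizeof values[0], compare_ints);
    assert(values[0] == 1 && values[1] == 2 && values[2] == 3);
    assert(!saw_bad_x18);  // every callback entry saw the proper value
}
```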

I didn't really find any straightforward way of starting to implement this
however. Does anyone here happen to have ideas about how one could either
implement this, or solve it in another way?

At a high level what you probably need to do is get x18 copied on
function entry and propagated into every call (or, better, every call
to a Windows function). Since it's all implicit the place to modify is
AArch64ISelLowering.cpp.

LowerFormalArguments could do a CopyFromReg to preserve the value on
function entry, and probably CopyToReg it into a new virtual register
so that other basic blocks can access it (via
MachineRegisterInfo::createVirtualRegister). You'd save the value of
this virtual register somewhere for later (in
AArch64MachineFunctionInfo probably).

LowerCall would then CopyToReg from that vreg and put it back into x18,
then mark the call as using that register. (using the RegsToPass
variable, or equivalent). Since you know which functions need x18, you
can be specific there.

LowerReturn probably also needs to "return" it in x18 too, though you
might handle that side by making it callee-saved instead.
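For illustration only, the effect of those three lowering steps can be written out by hand at the source level. This is not LLVM code and not a tested patch; x18 is modelled with a thread-local variable, and `host_library_call`, `windows_callback` and `wine_function` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

thread_local std::uintptr_t model_x18 = 0;  // stand-in for x18

inline std::uintptr_t read_x18() { return model_x18; }
inline void write_x18(std::uintptr_t v) { model_x18 = v; }

// Hypothetical host library call that is free to clobber x18.
inline void host_library_call() { write_x18(0); }

// Hypothetical Windows callback that relies on x18; returns what it saw.
inline std::uintptr_t windows_callback() { return read_x18(); }

inline std::uintptr_t wine_function() {
    // Function entry: copy x18 into a "virtual register"
    // (the CopyFromReg step described for LowerFormalArguments).
    const std::uintptr_t entry_x18 = read_x18();

    host_library_call();  // x18 may hold garbage after this

    // Before the call: put the saved value back (the LowerCall step).
    write_x18(entry_x18);
    const std::uintptr_t seen = windows_callback();

    host_library_call();

    // Before returning: hand x18 back intact (the LowerReturn step).
    write_x18(entry_x18);
    return seen;
}

inline void lowering_demo() {
    write_x18(0xabc);
    assert(wine_function() == 0xabc);  // the callback saw the right value
    assert(read_x18() == 0xabc);       // and so does the caller
}
```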

I've not tested any of this, BTW. And upstreaming it would likely be
controversial.

Cheers.

Tim.

You're still pretty much screwed if you call qsort with a windows
callback (or any equivalent but more convoluted situation), aren't
you? I suppose you'd have to surround any top-level callback passed
into non-compliant code with an x18-thunk.

Yes, pretty much. Although I believe Wine doesn't use many system functions of that sort.

Hmm, ok - that does indeed sound at least doable, at some level. Thanks for the insight and pointers!

I've not tested any of this, BTW. And upstreaming it would likely be
controversial.

Yes, I can imagine that.

I'm not necessarily running off to try to implement this right away; I'm trying to see what level of effort it'd require. For my own purposes, my current solution of just backing it up and restoring it on the Windows<->Wine boundary is sufficient, though. But that patch isn't making much headway upstream into Wine either, and it essentially means that Wine on arm64 (for running actual foreign Windows binaries) only works for those who carry these out-of-tree patches along. (But for others, the current state of affairs is enough for building code with winelib.)

Anyway, thanks for your time!

// Martin

I am reminded of the ms_abi / sysv_abi attribute annotations. Would it be possible to leverage those to insert the appropriate spills and fills? So, anything annotated ms_abi is presumably an entry point into the SysV side of things, so any explicitly ms_abi annotated function would add X18 as a CSR.

It's not a complete solution at least; that would take care of backing up and restoring, but it wouldn't enforce consistency while in SysV land. If the Wine function first does a glibc call, then goes on to call a user provided callback function, X18 is clobbered at this point. The suggested idea was to try to maintain X18 intact as far as possible while within Wine itself as well.

That said, having the compiler automatically back up and restore X18 in the functions marked ms_abi would probably be nicer than my current hack to enforce this in Wine by the use of relay hooks, with the same results. I could actually try to give that a shot.

// Martin

My current makeshift workaround for the matter is to reroute calls from
native Windows code to Wine functions via a relay wrapper, which backs up
and restores the register. This works in practice, but hasn’t been
accepted upstream, as it is not deemed a correct/complete solution. Also,
if the emulated Windows API functions do callbacks, the register would
need to be restored before the callbacks call windows code.

You don’t actually need to back up the value, since it doesn’t change, just copy it from another per-thread location back into x18, right?

Anyhow, this “workaround” seems like the correct solution, IMO. You have two slightly-incompatible ABIs, and using an adapter between the two seems entirely reasonable. As long as this adapter code is either written in asm, or compiled with -ffixed-x18, you can be sure that the x18 won’t be overwritten by the compiler inside the adapter function. Yes, it does need to be invoked both on the return path back to windows code, and on callouts to windows functions, but that doesn’t seem like it should be terribly tricky to arrange?

Having the compiler handle it for you might seem nice, but since this is a value which is not part of the Linux ABI at all, copying it yourself as needed really seems like the best plan. Otherwise, you’d have to teach the compiler where the secondary thread-local location it’s stashed is, and how to retrieve it from there. Which doesn’t seem like the compiler ought to be in the business of doing.
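A minimal sketch of that "copy it yourself" approach, assuming a per-thread stash that Wine fills in when it first sets the register up. x18 is modelled with a thread-local variable so the sketch runs anywhere; `stashed_x18`, `call_windows` and `return_to_windows` are hypothetical names, and in real code the adapter would be asm or built with -ffixed-x18.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

thread_local std::uintptr_t model_x18 = 0;    // stand-in for x18
thread_local std::uintptr_t stashed_x18 = 0;  // hypothetical per-thread copy

// Called once per thread when Wine sets the register up.
inline void set_platform_register(std::uintptr_t value) {
    stashed_x18 = value;
    model_x18 = value;
}

// Reload x18 from the stash; no backup is needed since the value is
// constant within a thread.
inline void reload_x18() { model_x18 = stashed_x18; }

// Hypothetical adapter for callouts into Windows code.
template <typename Fn, typename... Args>
auto call_windows(Fn&& fn, Args&&... args) {
    reload_x18();
    return std::forward<Fn>(fn)(std::forward<Args>(args)...);
}

// Hypothetical adapter for the return path back to Windows code.
template <typename T>
T return_to_windows(T result) {
    reload_x18();
    return result;
}

inline void adapter_demo() {
    set_platform_register(0x5000);
    model_x18 = 0;  // clobbered by host code
    std::uintptr_t seen = call_windows([] { return model_x18; });
    assert(seen == 0x5000);
}
```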

You don't actually need to back up the value, since it doesn't change, just
copy it from another per-thread location back into x18, right?

Yes, that's right. If done within Wine code, restoring the right value is of course the best option.

Anyhow, this "workaround" seems like the correct solution, IMO. You have two
slightly-incompatible ABIs, and using an adapter between the two seems
entirely reasonable. As long as this adapter code is either written in asm,
or compiled with -ffixed-x18, you can be sure that the x18 won't be
overwritten by the compiler inside the adapter function. Yes, it does need
to be invoked both on the return path back to windows code, and on callouts
to windows functions, but that doesn't seem like it should be terribly
tricky to arrange?

Handling things on entry/return to Wine code seems straightforward, but I haven't figured out if there's any mechanism for doing any special adaptation for callbacks (other than using compiler attributes for different calling conventions).

AFAIK Wine does support thunking for running Win16 things, but I haven't really figured out where it comes into play, and if that could be used for injecting code in callback calls.

Having the compiler handle it for you might seem nice, but since this is a
value which is not part of the Linux ABI at all, copying it yourself as
needed really seems like the best plan. Otherwise, you'd have to teach the
compiler where the secondary thread-local location it's stashed is, and how
to retrieve it from there. Which doesn't seem like the compiler ought to be
in the business of doing.

Yes, the compiler has no business doing things like that. But the various variants of backing up and restoring the register don't seem all that out of place either; the thing that Reid suggested is pretty neat, even if it doesn't handle the callback case.

// Martin

Actually - theoretically - isn't the compiler free to insert calls to e.g. memset/memcpy even if the original code didn't contain them? And if those functions come from outside of Wine, they could clobber x18. So in that case, the only fully safe way is to do the register setup and the handover to the other ABI's code in the same piece of assembly. In practice though it's probably not an issue.
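As an example of the kind of source where this can happen: neither function below mentions memset or memcpy, yet compilers commonly lower large zero-initialisations and struct copies to calls to them. Whether a given compiler actually does so depends on the target, object size and optimisation level; `Big`, `make_zeroed` and `copy_big` are made-up names.

```cpp
#include <cassert>
#include <cstddef>

struct Big {
    char bytes[4096];
};

// No memset in the source, but zero-initialising a large object is
// something compilers commonly turn into a memset call.
inline Big make_zeroed() {
    Big b = {};
    return b;
}

// Likewise, a plain struct assignment may be emitted as a memcpy call.
inline void copy_big(Big& dst, const Big& src) { dst = src; }

inline void implicit_calls_demo() {
    Big a = make_zeroed();
    assert(a.bytes[0] == 0 && a.bytes[4095] == 0);
    a.bytes[7] = 42;
    Big b = make_zeroed();
    copy_big(b, a);
    assert(b.bytes[7] == 42);
}
```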

// Martin

Thanks, I managed to implement this, and it looks pretty promising, and not very invasive actually.

For enabling it, my PoC added a new target feature +protect-x18, which invokes these codepaths. (This mechanism is what the driver level flag -ffixed-x18 boils down to.) Is that sensible, or should I go for a cl::opt<bool> like the existing ones in AArch64TargetMachine.cpp?

However, at -O0, global isel is used instead (or fast-isel if that's requested). To opt out from global isel for these kinds of functions, I'd need to return false in IRTranslator::translateCall (in CodeGen/GlobalISel/IRTranslator.cpp).

Is there any hook mechanism to the target specific code, where I could check the aarch64 specific feature and opt out from global isel? (If I add an unconditional "return false" there, falling back on SelectionDAG seems to work fine and I get the same behaviour I coded there.)

If fast isel is used and I try to return false from AArch64FastISel::fastLowerCall (or if I try to implement the register restoration there), I end up with a failed assertion like this:

../lib/CodeGen/TargetRegisterInfo.cpp:192: const llvm::TargetRegisterClass* llvm::TargetRegisterInfo::getMinimalPhysRegClass(unsigned int, llvm::MVT) const: Assertion `isPhysicalRegister(reg) && "reg must be a physical register"' failed.

Is there some part of the mechanism with virtual registers in the prologue that doesn't quite work with fast isel?

I'll try to complete this, and a few other alternative PoCs, and present for both Wine and LLVM to decide between the different ways forward.

// Martin