[RFC] Exposing ghccc calling convention as preserve_none to clang

brandtbucher · November 16, 2023, 7:00pm

So, what should the next step be if this is something we want to see in Clang 18? Open an issue?

boomanaiden154-1 · November 18, 2023, 9:26pm

Note that there are a couple additional architectures that support ghccc. These include AArch64 and RISCV. Not sure if there are any others. The LangRef still specifies that ghccc is x86/x86-64 only because (it seems like) everyone just didn’t bother to update the LangRef after doing the implementation for each architecture.

There are a couple of additional architectures that could probably also have ghccc support added easily, namely the ones listed at https://github.com/ghc/ghc/tree/master/rts/include/stg/MachRegs that don’t already have an existing implementation within LLVM.

dongAxis · December 12, 2023, 4:00am

Hi aeubanks,

We are also trying to build a high-performance interpreter and would like to use this feature, but I have a question about it.

You wrote: “Forgot to mention that this is limited to x86-64 for now. If other arches have support for ghccc and there is interest, we can extend this.”

To work around this, we use ghccc with the arm64 backend.
Unfortunately, we found that calls in such functions on arm64 cannot be optimized into tail calls, because of the following functions:

static bool canGuaranteeTCO(CallingConv::ID CC, bool GuaranteeTailCalls) {
  return (CC == CallingConv::Fast && GuaranteeTailCalls) ||
         CC == CallingConv::Tail || CC == CallingConv::SwiftTail;
}

// Return true if we might ever do TCO for calls with this calling convention.
static bool mayTailCallThisCC(CallingConv::ID CC) {
  switch (CC) {
  case CallingConv::C:
  case CallingConv::AArch64_SVE_VectorCall:
  case CallingConv::PreserveMost:
  case CallingConv::PreserveAll:
  case CallingConv::Swift:
  case CallingConv::SwiftTail:
  case CallingConv::Tail:
  case CallingConv::Fast:
    return true;
  default:
    return false;
  }
}

A ghccc function on arm64 cannot be optimized into a tail call simply because mayTailCallThisCC does not call canGuaranteeTCO, whereas the x86 version does.
Are there any other side effects on AArch64 to be aware of?

Thanks in advance
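
For readers following along, the difference described above can be modeled in a few lines. This is only a toy sketch, not LLVM source: the enum `CC` and all four function names are hypothetical stand-ins, and the "x86-style" pair is an assumption based on the description in this post (a default case that defers to the guarantee check, which in turn accepts GHC).

```cpp
// Toy model only: a local enum stands in for llvm::CallingConv::ID.
enum class CC { C, Fast, Tail, SwiftTail, GHC };

// Same shape as the AArch64 canGuaranteeTCO quoted above.
bool canGuaranteeTCOAArch64(CC cc, bool guaranteeTailCalls) {
  return (cc == CC::Fast && guaranteeTailCalls) || cc == CC::Tail ||
         cc == CC::SwiftTail;
}

// Same shape as the AArch64 mayTailCallThisCC quoted above: GHC is not
// listed, so it lands in the default case and is rejected.
bool mayTailCallAArch64(CC cc) {
  switch (cc) {
  case CC::C:
  case CC::Fast:
  case CC::Tail:
  case CC::SwiftTail:
    return true;
  default:
    return false;
  }
}

// ASSUMPTION: an x86-style guarantee check that accepts GHC.
bool canGuaranteeTCOX86Style(CC cc) {
  return cc == CC::Fast || cc == CC::GHC || cc == CC::Tail ||
         cc == CC::SwiftTail;
}

// ASSUMPTION: an x86-style predicate whose default case defers to the
// guarantee check instead of returning false.
bool mayTailCallX86Style(CC cc) {
  switch (cc) {
  case CC::C:
    return true;
  default:
    return canGuaranteeTCOX86Style(cc);
  }
}
```

In this model, deferring to the guarantee check is not sufficient on its own: the quoted AArch64 canGuaranteeTCO does not list GHC either, so GHC would have to be accepted in one of the two predicates.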

davidxl · December 12, 2023, 5:27am

FYI, @weiguozhi has an implementation of the preserve_none cc that showed a very nice speedup when applied to long chains of tail calls. The patch will be sent upstream for review after more testing on a large code base.

vberlier · December 21, 2023, 9:54am

@weiguozhi Is your fork publicly available? I’m also looking for something like the preserve_none cc, and I’d be really interested in checking it out.

weiguozhi · December 21, 2023, 9:10pm

@vberlier, you can try it from my branch: carrot-preserve-none in weiguozhi/llvm-project on GitHub.

vberlier · December 22, 2023, 1:39pm

Thanks! I’ll try it out

weiguozhi · January 8, 2024, 10:58pm

I have sent out the following patch for review.

github.com/llvm/llvm-project

New calling convention preserve_none

llvm:main ← weiguozhi:carrot-preserve-none

opened 09:46PM - 03 Jan 24 UTC

weiguozhi

+419 -3

The new experimental calling convention preserve_none is the opposite of the existing preserve_all. It tries to preserve as few general registers as possible, so all general registers are caller-saved. It can also use more general registers to pass arguments. This attribute doesn't affect floating-point registers, which still follow the C calling convention. Currently preserve_none is supported on X86-64 only. It changes the C calling convention in the following ways:

* RSP and RBP are the only preserved general registers; all other general registers are caller-saved.
* We can use [RDI, RSI, RDX, RCX, R8, R9, R11, R12, R13, R14, R15, RAX] to pass arguments.

It can improve the performance of hot tail-call chains, because many callee-saved registers' save/restore instructions can be removed if the tail functions use preserve_none. In my experiment with protocol buffers, the parsing functions improved by 3% to 10%.
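
A minimal sketch of the kind of tail-call chain this targets, assuming a Clang that provides the preserve_none attribute and `[[clang::musttail]]`. The opcode set and the names `dispatch`, `opInc`, `opDbl`, and `run` are hypothetical, chosen only for illustration. On compilers without these attributes, the macros expand to nothing and the chain becomes ordinary calls.

```cpp
#include <cstdint>

// Use preserve_none only where the compiler reports support for it.
#if defined(__clang__) && defined(__has_attribute)
#  if __has_attribute(preserve_none)
#    define PRESERVE_NONE __attribute__((preserve_none))
#  endif
#endif
#ifndef PRESERVE_NONE
#  define PRESERVE_NONE
#endif

// Force tail calls only on Clang versions that know the attribute.
#if defined(__clang__) && defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::musttail)
#    define MUSTTAIL [[clang::musttail]]
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

// Hypothetical three-instruction bytecode.
enum Op : std::uint8_t { Inc, Dbl, Halt };

PRESERVE_NONE std::int64_t dispatch(const Op *pc, std::int64_t acc);

// Each handler does its work and tail-calls back into the dispatcher.
// All links in the chain share the same convention and signature.
PRESERVE_NONE std::int64_t opInc(const Op *pc, std::int64_t acc) {
  MUSTTAIL return dispatch(pc + 1, acc + 1);
}

PRESERVE_NONE std::int64_t opDbl(const Op *pc, std::int64_t acc) {
  MUSTTAIL return dispatch(pc + 1, acc * 2);
}

PRESERVE_NONE std::int64_t dispatch(const Op *pc, std::int64_t acc) {
  switch (*pc) {
  case Inc:
    MUSTTAIL return opInc(pc, acc);
  case Dbl:
    MUSTTAIL return opDbl(pc, acc);
  case Halt:
    break;
  }
  return acc;
}

// Ordinary C-convention entry point; only the internal chain is preserve_none.
std::int64_t run(const Op *program) { return dispatch(program, 0); }
```

Running the program {Inc, Inc, Dbl, Inc, Halt} through `run` yields 5. Because no handler has to save or restore callee-saved registers for its caller, each link in the chain can be shorter, which is the effect the description above refers to.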

weiguozhi · April 22, 2024, 8:26pm

@brandtbucher proposed a new patch to change the parameter-passing register order of preserve_none, so that when a preserve_none function calls a normal function, fewer register move instructions are generated.

I tested this patch with the udp protobuf parsing microbenchmark. The results are very positive.

                                                      base           test
BM_Parse_Proto2<FileDescSV, InitBlock, Alias>    677.912MB/s       693.477MB/s        +
BM_Parse_Proto2<FileDesc, InitBlock, Copy>       727.151MB/s       752.567MB/s        +
BM_Parse_Proto2<FileDesc, UseArena, Copy>        710.576MB/s       740.293MB/s        +
BM_Parse_Proto2<FileDesc, NoArena, Copy>         391.355MB/s       390.845MB/s        =
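
To put numbers on the +/= column, the relative changes can be computed directly from the table above. This is a small sketch; `percentChange`, `Row`, `kRows`, and `printChanges` are hypothetical names, and the throughput values are copied from the table.

```cpp
#include <cstdio>

// Relative change from base to test throughput, in percent.
double percentChange(double base, double test) {
  return (test - base) / base * 100.0;
}

struct Row {
  const char *name;
  double base; // MB/s without the patch
  double test; // MB/s with the patch
};

// Values copied from the benchmark table above.
const Row kRows[] = {
    {"BM_Parse_Proto2<FileDescSV, InitBlock, Alias>", 677.912, 693.477},
    {"BM_Parse_Proto2<FileDesc, InitBlock, Copy>", 727.151, 752.567},
    {"BM_Parse_Proto2<FileDesc, UseArena, Copy>", 710.576, 740.293},
    {"BM_Parse_Proto2<FileDesc, NoArena, Copy>", 391.355, 390.845},
};

// Print one line per benchmark with its signed percentage change.
void printChanges() {
  for (const Row &r : kRows)
    std::printf("%-48s %+.2f%%\n", r.name, percentChange(r.base, r.test));
}
```

This works out to roughly +2.3%, +3.5%, +4.2%, and -0.1% for the four rows, respectively.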

davidxl · April 22, 2024, 10:29pm

The results look promising. Any objections to making the reverse-order parameter passing part of preserve_none?

aeubanks · April 22, 2024, 11:03pm

There was a comment in pull request #88333 in llvm/llvm-project (“Try to use non-volatile registers for `preserve_none` parameters” by brandtbucher) about wanting to make sure this gets into LLVM 19 before the ABI is stabilized. However, I believe fastcc does not have a stable ABI (?). Given that preserve_none is also supposed to produce fast code, I think we should explicitly say that preserve_none’s ABI is unstable and subject to change if we find improvements in the future.

davidxl · April 23, 2024, 1:43am

It is a good idea to give the feature some time to evolve and mature. For the time being, we can probably treat preserve_none as having no ABI stability guarantee, and revisit this later when it becomes stable.
