So, what should the next step be if this is something we want to see in Clang 18? Open an issue?
Note that a couple of additional architectures support `ghccc`, including AArch64 and RISC-V. I'm not sure whether there are any others. The LangRef still says that `ghccc` is x86/x86-64 only, apparently because nobody updated the LangRef after implementing the convention for each architecture.
It would probably also be easy to add `ghccc` support for a few more architectures, namely those listed at https://github.com/ghc/ghc/tree/master/rts/include/stg/MachRegs that don't already have an implementation in LLVM.
Hi aeubanks,
We are also building a high-performance interpreter and would like to use this feature, but I have a question about it.
> Forgot to mention that this is limited to x86-64 for now. If other arches have support for `ghccc` and there is interest, we can extend this.
We pass `ghccc` to the arm64 backend to solve this.
Unfortunately, we found that such functions cannot be optimized into tail calls on arm64 because of the following functions:
```cpp
static bool canGuaranteeTCO(CallingConv::ID CC, bool GuaranteeTailCalls) {
  return (CC == CallingConv::Fast && GuaranteeTailCalls) ||
         CC == CallingConv::Tail || CC == CallingConv::SwiftTail;
}

// Return true if we might ever do TCO for calls with this calling convention.
static bool mayTailCallThisCC(CallingConv::ID CC) {
  switch (CC) {
  case CallingConv::C:
  case CallingConv::AArch64_SVE_VectorCall:
  case CallingConv::PreserveMost:
  case CallingConv::PreserveAll:
  case CallingConv::Swift:
  case CallingConv::SwiftTail:
  case CallingConv::Tail:
  case CallingConv::Fast:
    return true;
  default:
    return false;
  }
}
```
A `ghccc` function on arm64 cannot be optimized into a tail call simply because `mayTailCallThisCC` does not fall back to `canGuaranteeTCO` (and neither function lists `CallingConv::GHC`), whereas the x86 backend does.
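To make the x86-style fallback concrete, here is a hypothetical sketch of the two AArch64 helpers with that change applied: `mayTailCallThisCC` falls back to `canGuaranteeTCO`, and `canGuaranteeTCO` also accepts `CallingConv::GHC`, as the x86 backend's version does. The `CallingConv` namespace is a trimmed-down stand-in so the snippet compiles on its own. This only models the decision logic; it is not a tested LLVM patch.

```cpp
#include <cassert>

// Hypothetical, trimmed-down stand-in for llvm::CallingConv so the sketch
// compiles on its own; the real definitions live in llvm/IR/CallingConv.h.
namespace CallingConv {
using ID = unsigned;
enum : ID {
  C,
  Fast,
  Cold,
  GHC,
  PreserveMost,
  PreserveAll,
  Swift,
  SwiftTail,
  Tail,
  AArch64_SVE_VectorCall
};
} // namespace CallingConv

// Same as the AArch64 version quoted above, plus CallingConv::GHC
// (hypothetical change; the x86 backend's equivalent already includes GHC).
static bool canGuaranteeTCO(CallingConv::ID CC, bool GuaranteeTailCalls) {
  return (CC == CallingConv::Fast && GuaranteeTailCalls) ||
         CC == CallingConv::GHC || CC == CallingConv::Tail ||
         CC == CallingConv::SwiftTail;
}

// Return true if we might ever do TCO for calls with this calling convention.
static bool mayTailCallThisCC(CallingConv::ID CC) {
  switch (CC) {
  case CallingConv::C:
  case CallingConv::AArch64_SVE_VectorCall:
  case CallingConv::PreserveMost:
  case CallingConv::PreserveAll:
  case CallingConv::Swift:
  case CallingConv::SwiftTail:
  case CallingConv::Tail:
  case CallingConv::Fast:
    return true;
  default:
    // x86-style fallback: a convention we can guarantee TCO for may also be
    // tail called. Fast is already handled above, so the flag is irrelevant.
    return canGuaranteeTCO(CC, /*GuaranteeTailCalls=*/false);
  }
}
```

With these definitions, `mayTailCallThisCC(CallingConv::GHC)` returns true, while a convention listed in neither function, such as the stand-in `CallingConv::Cold`, still returns false.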
Would allowing tail calls for `ghccc` have any other side effects on AArch64?
Thanks in advance!
FYI, @weiguozhi has an implementation of a preserve_none calling convention and has seen very nice speedups when it is applied to long chains of tail calls. The patch will be sent upstream for review after more testing on a large code base.
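For readers unfamiliar with the pattern, here is a minimal sketch of a tail-call-threaded interpreter where every opcode handler uses preserve_none and ends in a guaranteed tail call to the next handler. The usual rationale is that a preserve_none callee is not required to preserve general-purpose registers, so handlers that only tail call one another can skip saving and restoring them. The sketch assumes a Clang that accepts `__attribute__((preserve_none))` and `[[clang::musttail]]`; the macros fall back to the default convention and an ordinary call elsewhere. `Instr`, the `op_*` handlers, `run` and `example` are hypothetical names, not code from @weiguozhi's branch.

```cpp
#include <cassert>

// Use preserve_none when the compiler supports it (assumed: a recent Clang
// on a supported target); otherwise fall back to the default convention.
#if defined(__has_attribute)
#if __has_attribute(preserve_none)
#define PRESERVE_NONE __attribute__((preserve_none))
#endif
#endif
#ifndef PRESERVE_NONE
#define PRESERVE_NONE
#endif

// Guarantee the tail call when [[clang::musttail]] is available.
#if defined(__has_cpp_attribute)
#if __has_cpp_attribute(clang::musttail)
#define MUSTTAIL [[clang::musttail]]
#endif
#endif
#ifndef MUSTTAIL
#define MUSTTAIL
#endif

struct Instr;

// Hypothetical opcode handlers; they share one signature and one calling
// convention so that they can tail call each other.
PRESERVE_NONE long op_add(const Instr* ip, long acc);
PRESERVE_NONE long op_mul(const Instr* ip, long acc);
PRESERVE_NONE long op_halt(const Instr* ip, long acc);

// decltype keeps the calling convention as part of the pointer type.
using Handler = decltype(&op_add);

struct Instr {
  Handler handler;
  long operand;
};

PRESERVE_NONE long op_add(const Instr* ip, long acc) {
  const Instr* next = ip + 1;
  MUSTTAIL return next->handler(next, acc + ip->operand);
}

PRESERVE_NONE long op_mul(const Instr* ip, long acc) {
  const Instr* next = ip + 1;
  MUSTTAIL return next->handler(next, acc * ip->operand);
}

PRESERVE_NONE long op_halt(const Instr*, long acc) { return acc; }

// Entry point: an ordinary (default-convention) call into the handler chain.
long run(const Instr* program, long initial) {
  assert(program != nullptr);
  return program->handler(program, initial);
}

// Computes ((0 + 5) * 3) + 2.
long example() {
  const Instr program[] = {
      {op_add, 5}, {op_mul, 3}, {op_add, 2}, {op_halt, 0}};
  return run(program, 0);
}
```

`example()` evaluates `((0 + 5) * 3) + 2` and returns 17. Deriving `Handler` from `decltype(&op_add)` keeps the calling convention in the function-pointer type, so the instruction table and the handlers cannot silently disagree about it.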
@weiguozhi Is your fork publicly available? I'm also looking for something like the preserve_none calling convention, so I'd be really interested in checking it out.
@vberlier, you can try it from the carrot-preserve-none branch of my fork, weiguozhi/llvm-project on GitHub.
Thanks! I’ll try it out
I have sent out the following patch for review.
@brandtbucher proposed a new patch that changes the order of the parameter-passing registers for preserve_none, so that fewer register move instructions are generated when a preserve_none function calls a normal function.
I tested this patch with the upb protobuf parsing microbenchmark. The results are very positive.
| Benchmark | base (MB/s) | test (MB/s) | Change |
|---|---|---|---|
| `BM_Parse_Proto2<FileDescSV, InitBlock, Alias>` | 677.912 | 693.477 | + |
| `BM_Parse_Proto2<FileDesc, InitBlock, Copy>` | 727.151 | 752.567 | + |
| `BM_Parse_Proto2<FileDesc, UseArena, Copy>` | 710.576 | 740.293 | + |
| `BM_Parse_Proto2<FileDesc, NoArena, Copy>` | 391.355 | 390.845 | = |
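The benchmark table does not state the size of each change. A small helper (hypothetical, not part of the benchmark suite) computes the relative throughput change from those numbers, assuming `base` is the build without the patch and `test` is the build with it.

```cpp
#include <cassert>
#include <cstdio>

// One row of the benchmark table: throughput without (base) and with (test)
// the preserve_none register-order patch, in MB/s.
struct Row {
  const char* name;
  double base_mbps;
  double test_mbps;
};

// Relative throughput change in percent; positive means the patch is faster.
double percent_change(double base_mbps, double test_mbps) {
  assert(base_mbps > 0.0);
  return (test_mbps - base_mbps) / base_mbps * 100.0;
}

void print_changes() {
  const Row rows[] = {
      {"BM_Parse_Proto2<FileDescSV, InitBlock, Alias>", 677.912, 693.477},
      {"BM_Parse_Proto2<FileDesc, InitBlock, Copy>", 727.151, 752.567},
      {"BM_Parse_Proto2<FileDesc, UseArena, Copy>", 710.576, 740.293},
      {"BM_Parse_Proto2<FileDesc, NoArena, Copy>", 391.355, 390.845},
  };
  for (const Row& row : rows) {
    std::printf("%-48s %+6.2f%%\n", row.name,
                percent_change(row.base_mbps, row.test_mbps));
  }
}
```

For the four rows this gives roughly +2.30%, +3.50%, +4.18% and -0.13%: the three improved configurations gain about 2 to 4%, and the NoArena case is essentially unchanged.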
The results look promising. Any objections to making the reversed parameter-passing order part of preserve_none?
There was a comment in llvm/llvm-project PR #88333 ("Try to use non-volatile registers for `preserve_none` parameters", by brandtbucher) about wanting to make sure this gets into LLVM 19 before the ABI is stabilized. However, as far as I know, fastcc does not have a stable ABI. Given that preserve_none is also meant to produce fast code, I think we should explicitly state that preserve_none's ABI is unstable and subject to change if we find improvements in the future.
It is a good idea to give the feature some time to evolve and mature. We can treat preserve_none as having no ABI stability guarantee for the time being, and revisit this once it has stabilized.