Need for Allocatable Global Register Variables

Dear clang/llvm developers,

Thank you for providing this wonderful toolchain!

I’m the maintainer of gprolog (GNU Prolog) which includes a command-line compiler compiling a Prolog source to native code. The compilation is multi-pass:

  • Prolog is first compiled to its abstract machine (called the WAM).
  • The WAM code is then compiled to a Mini-Assembly (MA) language. MA has been designed for Prolog as a machine-independent assembly language. It offers instructions corresponding to WAM control instructions (to handle the non-deterministic control of Prolog) and instructions to call C runtime functions (in order to reduce the size of the resulting code, most non-control WAM instructions give rise to a call to a C function which actually performs the task).
  • The MA code is then mapped to the assembly code of the target machine (so there is a mapper MA->asm for each architecture).

gprolog is written in C (runtime functions implementing WAM instructions) and Prolog (bootstrapped compiler and Prolog libraries). I already use clang for the gprolog port to x86_64/darwin (intel/MacOS). We recently ported gprolog to arm64 (aarch64)/linux using gcc. I’m now working on extending this port to arm64/darwin (Apple arm M1/MacOS), thus using clang for the C code.

On most architectures, we can map some WAM registers to physical registers, which yields a significant performance increase. Depending on the ABI, the set of usable registers is more or less wide (we need callee-saved registers in order to call any function safely, e.g. calls to libc functions). This is achieved in the C runtime code with global register variables (GRV). The corresponding registers are reserved with the gcc option -ffixed-REG.
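To make the GRV idea concrete, here is a minimal sketch and not the actual gprolog source. The names WamWord, wam_top, wam_stack and the USE_GRV macro are hypothetical and only serve the illustration. With USE_GRV defined on arm64, the variable lives in x20 and every file is assumed to be compiled with -ffixed-x20; otherwise the sketch falls back to an ordinary global so it still builds anywhere.

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t WamWord;            /* hypothetical name for a WAM cell */

#if defined(USE_GRV) && defined(__GNUC__) && defined(__aarch64__)
/* Global register variable: the abstract machine register is pinned to x20.
   Assumption: all translation units are built with -ffixed-x20. */
register WamWord *wam_top __asm__("x20");
#else
/* Portable fallback: the abstract machine register lives in memory. */
static WamWord *wam_top;
#endif

enum { WAM_STACK_CELLS = 16 };
static WamWord wam_stack[WAM_STACK_CELLS];

/* A GRV cannot have an initializer, so it is set up at run time. */
void wam_init(void) { wam_top = wam_stack; }

/* Number of cells currently on the stack. */
long wam_depth(void) { return (long)(wam_top - wam_stack); }

/* Runtime functions read and update the register directly. */
void wam_push(WamWord w)
{
    assert(wam_depth() < WAM_STACK_CELLS);
    *wam_top++ = w;
}

WamWord wam_pop(void)
{
    assert(wam_depth() > 0);
    return *--wam_top;
}
```

The point of the register variant is that wam_push and wam_pop touch wam_top without any load or store of the pointer itself, and that the updated value stays visible to the caller after the function returns.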

Unfortunately, this does not work well with clang/llvm. Looking for information I saw the clang documentation mentioning that only non-allocatable registers can be used (e.g. sp). I also saw “[RFC] Allocatable Global Register Variables for ARM” which concerns exactly the same problem on ARM (maybe we could find a more general solution, not limited to ARM).

The current clang limitation on GRV is very restrictive, especially when “mapping” an abstract machine as needed for gprolog. It is a pity, in particular on arm64, which provides many usable registers. For instance, under arm64/linux with gcc I can map x19,…,x28. I tried to do the same with clang. There is no problem defining such a GRV (e.g. register long my_reg __asm("x20")) or compiling it with clang using -ffixed-x20. The compiler does not complain and the generated code is almost perfect: C code using my_reg correctly gives rise to arm64 code using x20. Unfortunately, the register, being callee-saved, is saved to the stack at function entry and restored at the end of the function (I checked this by generating the assembly file with -S). Thus a C runtime function modifying an abstract machine register has no effect in practice, because the restore at function exit overwrites the new value with the saved one. However, without this save/restore of x20 on the stack the code would be perfect (this is also reported in the above RFC).
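A small reproducer for the behavior described above could look like the following. This is only a sketch: runtime_bump, run_sequence, self_check and the USE_GRV macro are hypothetical names, and the comments about clang restate what I observed rather than a guaranteed result for every version. Without USE_GRV the variable is a plain global, so the file builds and passes on any machine.

```c
#include <assert.h>

#if defined(USE_GRV) && defined(__GNUC__) && defined(__aarch64__)
/* Assumed build: cc -O2 -ffixed-x20 -DUSE_GRV (add -S to inspect the assembly). */
register long my_reg __asm__("x20");
#else
static long my_reg;                  /* portable fallback, no register pinning */
#endif

#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* Stand-in for a C runtime function that modifies an abstract machine register.
   If the compiler saves x20 on entry and restores it on exit, this update is lost. */
NOINLINE void runtime_bump(long delta)
{
    my_reg += delta;
}

/* Stand-in for compiled Prolog code: set the register, call the runtime, read it back. */
long run_sequence(long start, long delta)
{
    long saved = my_reg;             /* x20 is callee-saved for our own caller */
    long result;

    my_reg = start;
    runtime_bump(delta);
    result = my_reg;                 /* start + delta if the update survived the call */
    my_reg = saved;
    return result;
}

/* Holds with the fallback and with gcc GRV; expected to trip when the callee
   restores x20 on exit, which is the clang behavior reported in this post. */
void self_check(void)
{
    assert(run_sequence(10, 5) == 15);
}
```

Comparing the assembly of runtime_bump produced by gcc and by clang (both with -S) is enough to see whether x20 is saved and restored around the function body.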

Would it be possible to improve things? After all, if gcc handles GRV correctly, this should be doable in clang too. Isn’t it possible to simply consider that a register REG reserved with -ffixed-REG should not be treated as callee-saved when generating the code (and thus prevent this register from being spilled to the stack)? If needed, we can imagine an additional clang option telling the back-end not to consider a given register as callee-saved (so it is the responsibility of the user to force this behavior, with all the limitations this could imply).

Extending clang GRV would open llvm to yet more applications. In the case of gprolog, we would like to go further and replace the MA stage with a 100% llvm back-end: the gprolog compiler would generate llvm code (via the WAM) instead of MA code (keeping the runtime library written in C). We could then remove all MA->asm mappers and rely on llvm to produce the native code.