Hi all,
I'm working on using LLVM as a back-end for an existing compiler (GHC
Haskell compiler)
Very cool!
and one of the problems I'm having is pinning a
global variable to an actual machine register. I've seen mixed
terminology for this feature/idea, so what I mean by this is that I
want to be able to put a global variable into a specified hardware
register.
Let's separate two things here: 1) GCC's implementation of this feature, and 2) the semantic/perf effect of doing it.
For 1) GCC implements this feature (with the example code you gave) by globally changing the allocatable register set for the backend and pinning the value to the specified physical register. This is really easy for GCC to do (yay, global variables for everyone, even the backend) and has the "right effect". However, this implementation is inappropriate in LLVM: if we wanted to take this approach, we'd have to encode the set of pinned physregs in the top-level module structure somewhere: this is not impossible, but it is kinda ugly.
2) is the more interesting part of this. Ignoring GCC's implementation, the semantic effect is that the calling convention of the functions in the translation unit is changed (so that the global is guaranteed to be in the specific physreg on entry to and exit from each function) and the global is guaranteed to be in the register in inline asms. Interestingly (to me at least :), there is no guarantee that this value be in the physreg at a random point in the function. There is no "defined" way to notice this, so the compiler can cheat and reuse the register if it wants to (e.g. by spilling the pinned value to the stack while the register holds a temporary). While you could notice this with a debugger, performance tool, etc, normal code should be fine.
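To make the "only guaranteed at the boundaries" point concrete, here is a rough sketch in plain C. It assumes an ordinary global R1 standing in for the pinned register, and the names sum_into_r1, callee and observed_at_call are made up for illustration; it is not how GCC or LLVM implement anything.

```c
#include <assert.h>

/* Assumption: an ordinary global stands in for the pinned register. */
int R1;

/* Hypothetical callee that records what it sees in R1 on entry. */
int observed_at_call;
void callee(void) { observed_at_call = R1; }

void sum_into_r1(int n)
{
    assert(n >= 0);
    int tmp = R1;          /* read the pinned value at function entry */
    for (int i = 1; i <= n; i++)
        tmp += i;          /* R1 is stale here, but nothing can observe it */
    R1 = tmp;              /* must be up to date before the call boundary */
    callee();
    tmp = R1;              /* the callee may have changed it, so reload */
    tmp *= 2;
    R1 = tmp;              /* and up to date again at function exit */
}
```

Between the boundaries the value lives wherever the compiler likes (here, in tmp); only the call and the function exit force it back into its home.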
This declaration should thus reserve that machine register
for exclusive use by this global variable. This is used in GHC since
it defines an abstract machine as part of its execution model, with
this abstract machine consisting of several virtual registers. Due to
the frequency with which the virtual registers are accessed, it is best
for performance that they be permanently assigned to a physical machine
register.
Right. Coming back to "why do this", you want it because it is good for performance: these values are accessed frequently enough that going to globals (particularly for PIC code) is too expensive.
A very simple example C program using this feature:
--------------------------
#include <stdio.h>

register int R1 __asm__ ("esi");

int main(void)
{
    R1 = 3;
    printf("register: %d\n", R1);
    R1 *= 2;
    printf("register: %d\n", R1);
    return 0;
}
--------------------------
llvm-gcc doesn't compile this program correctly, although according to
the llvm-gcc release notes this extension was first supported by llvm-
gcc in 1.9.
This program actually works for me if you build with -O, but it looks like it is an accident that it works :). The implementation in llvm-gcc could definitely be fixed in this case. However, the more interesting example wouldn't work: if printf were some other function that read ESI, nothing would guarantee that it sees the value main stored there.
If it were important to me to implement this, I'd implement this extension by adding a new custom calling convention to the X86 backend that always passed the first i32 value in ESI and always returned the first i32 value in ESI. Given that, you could lower the above code to something like this pseudo code:
{i32,i32} @main(i32 %in_esi) {
  %esi = alloca i32
  store in_esi -> esi
  store 3 -> esi
  esi1 = load esi
  {esi2, dead} = call @printf(esi1, "register: %d\n", esi1);
  store esi2 -> esi
  esi3 = load esi
  esi4 = esi3*2
  store esi4 -> esi
  esi5 = load esi
  {esi6, dead} = call @printf(esi5, "register: %d\n", esi5);
  store esi6 -> esi
  esi7 = load esi
  ret {esi7, 0}
}
Each of printf and main would be marked with the custom CC. After running mem2reg on this, you'd get:
{i32,i32} @main(i32 %in_esi) {
  {esi2, dead} = call @printf(3, "register: %d\n", 3);
  esi4 = esi2*2
  {esi6, dead} = call @printf(esi4, "register: %d\n", esi4);
  ret {esi6, 0}
}
When lowered at codegen time, the regalloc would trivially eliminate the copies into/out-of ESI and you'd get the code you desired.
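If it helps to see the same shape in C rather than pseudo IR, here is a small sketch of the idea: the pinned value is threaded through every function as an extra first argument and an extra return value. The struct pinned_result and the functions report and run are hypothetical names for illustration (report stands in for printf with the custom CC), and portable C cannot actually force the value into ESI; that part would be the backend's job.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical pair mirroring the {i32,i32} return in the pseudocode. */
struct pinned_result {
    int r1;     /* value of the pinned "register" on exit */
    int value;  /* the function's ordinary return value */
};

/* Stand-in for printf under the custom CC: R1 comes in as the first
   argument and goes back out as the first returned value. */
struct pinned_result report(int r1, const char *fmt)
{
    struct pinned_result out;
    assert(fmt != NULL);
    out.value = printf(fmt, r1);
    out.r1 = r1;            /* this callee leaves R1 unchanged */
    return out;
}

/* The body of main from the example, with R1 made explicit. */
struct pinned_result run(int r1_in)
{
    int r1 = r1_in;         /* store in_esi -> esi */
    struct pinned_result res;

    r1 = 3;
    res = report(r1, "register: %d\n");
    r1 = res.r1;            /* pick up whatever the callee left in R1 */
    r1 *= 2;
    res = report(r1, "register: %d\n");
    r1 = res.r1;

    res.r1 = r1;
    res.value = 0;          /* main's "return 0" */
    return res;
}
```

Calling run with any starting value prints "register: 3" and then "register: 6", and hands back 6 as the final R1 alongside the ordinary return value 0.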
No, I don't know of anyone planning to implement this, but it is conceptually quite simple.
-Chris