Pinning registers in LLVM

Hi all,

I'm working on using LLVM as a back-end for an existing compiler (GHC Haskell compiler) and one of the problems I'm having is pinning a global variable to an actual machine register. I've seen mixed terminology for this feature/idea, so what I mean by this is that I want to be able to put a global variable into a specified hardware register. This declaration should thus reserve that machine register for exclusive use by this global variable. This is used in GHC since it defines an abstract machine as part of its execution model, with this abstract machine consisting of several virtual registers. Because the virtual registers are accessed so frequently, it is best for performance that they be permanently assigned to a physical machine register.

GCC supports an extension to enable this feature, which is described here:

A very simple example C program using this feature:

Despite the fact that llvm-gcc accepts the given code, the extension
isn't properly supported; the current output is close to correct, but
not quite. The issue is that the register allocator isn't aware of
the global register, and can allocate other values into it. Patches
welcome, I guess, although off the top of my head, I'm not sure what
the best way to go about implementing it would be.

-Eli

Thanks for the response.

The issue is that the register allocator isn't aware of
the global register, and can allocate other values into it.

So in that case there is currently no way to pin a value to a register? I
won't be using llvm-gcc but LLVM directly; I was just trying
to figure out how LLVM supports this feature by studying the LLVM
bitcode it produces.

Patches welcome, I guess, although off the top of my head, I'm not sure what
the best way to go about implementing it would be.

Any idea of the difficulty this would involve (a rough guess of the time it
would take)? I need to decide whether it would be better to add support to
LLVM for this feature or to change GHC so that it isn't needed.

Cheers,
David

Hi all,

I'm working on using LLVM as a back-end for an existing compiler (GHC
Haskell compiler)

Very cool!

and one of the problems I'm having is pinning a
global variable to an actual machine register. I've seen mixed
terminology for this feature/idea, so what I mean by this is that I
want to be able to put a global variable into a specified hardware
register.

Let's separate two things here: 1) GCC's implementation of this feature, and 2) the semantic/perf effect of doing it.

For 1) GCC implements this feature (with the example code you gave) by globally changing the allocatable register set for the backend and pinning the value to the specified physical register. This is really easy for GCC to do (yay, global variables for everyone, even the backend) and has the "right effect". However, this implementation is inappropriate in LLVM: if we wanted to take this approach, we'd have to encode the set of pinned physregs in the top-level module structure somewhere: this is not impossible, but it is kinda ugly.

#2 is the more interesting part. Ignoring GCC's implementation, the semantic effect of this is that the calling convention of the functions in the translation unit is changed (so that the global is guaranteed to be in the specified physreg on entry to and exit from each function) and the global is guaranteed to be in the register in inline asms. Interestingly (to me at least :), there is no guarantee that this value be in the physreg at a random point in the function. There is no "defined" way to notice this, so the compiler can cheat and reuse the register if it wants to (e.g. by spilling the value to the stack). While you could notice this with a debugger, performance tool, etc., normal code should be fine.
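A toy C model can make the spilling point concrete. It is not what GCC or LLVM actually emit: the hypothetical `struct cpu` stands in for the register file, and `scale_r1` plays the part of a compiled function whose only contract is that R1 (ESI) is correct at entry and exit. In between, it parks R1 in a local (the "stack slot"), borrows the register as a scratch accumulator, and puts the updated value back before returning, so a caller that only looks at the register at call boundaries can't tell the difference.

```c
/* Toy model only: a hypothetical stand-in for the register file.
   Real compilers make this choice inside the register allocator. */
struct cpu {
    int esi; /* the physical register that R1 is pinned to */
};

/* Semantically this is "R1 *= 2; return the sum of data[0..n)".
   The register is only guaranteed to hold R1 at entry and exit. */
static int scale_r1(struct cpu *m, const int *data, int n)
{
    int spilled = m->esi;   /* spill R1 to a stack slot */

    m->esi = 0;             /* reuse the register as a scratch accumulator */
    for (int i = 0; i < n; i++)
        m->esi += data[i];
    int sum = m->esi;

    m->esi = spilled * 2;   /* R1 *= 2, back in its register before return */
    return sum;
}
```

Calling it with `m.esi == 3` and `data = {1, 2, 4}` returns 7 and leaves `m.esi == 6`, exactly what a version that never touched the register in between would do.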

This declaration should thus reserve that machine register
for exclusive use by this global variable. This is used in GHC since
it defines an abstract machine as part of its execution model, with
this abstract machine consisting of several virtual registers. Because
the virtual registers are accessed so frequently, it is best for
performance that they be permanently assigned to a physical machine
register.

Right. Coming back to "why do this", you want it because it is good for performance: these values are accessed frequently enough that going to globals (particularly for PIC code) is too expensive.

A very simple example C program using this feature:

--------------------------
#include <stdio.h>

register int R1 __asm__ ("esi");

int main(void)
{
  R1 = 3;
  printf("register: %d\n", R1);
  R1 *= 2;
  printf("register: %d\n", R1);
  return 0;
}
--------------------------

llvm-gcc doesn't compile this program correctly, although according to
the llvm-gcc release notes this extension was first supported by
llvm-gcc in 1.9.

This program actually works for me if you build with -O, but it looks like it works by accident :). The implementation in llvm-gcc could definitely be fixed for this case. However, a more interesting example wouldn't work: one where printf is replaced by some other function that reads ESI.

If it were important to me to implement this, I'd do it by adding a new custom calling convention to the X86 backend that always passed the first i32 value in ESI and always returned the first i32 value in ESI. Given that, you could lower the above code to something like this pseudocode:

{i32,i32} @main(i32 %in_esi) {
   %esi = alloca i32
   store in_esi -> esi

   store 3 -> esi

   esi1 = load esi
   {esi2, dead} = call @printf(esi1, "register: %d\n", esi1);
   store esi2 -> esi

   esi3 = load esi
   esi4 = esi3*2
   store esi4 -> esi

   esi5 = load esi
   {esi6, dead} = call @printf(esi5, "register: %d\n", esi5);
   store esi6 -> esi

   esi7 = load esi
   ret {esi7, 0}
}

Each of printf and main would be marked with the custom CC. After running mem2reg on this, you'd get:

{i32,i32} @main(i32 %in_esi) {
   {esi2, dead} = call @printf(3, "register: %d\n", 3);
   esi4 = esi2*2
   {esi6, dead} = call @printf(esi4, "register: %d\n", esi4);
   ret {esi6, 0}
}

When lowered at codegen time, the regalloc would trivially eliminate the copies into/out-of ESI and you'd get the code you desired.
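The dataflow of that lowering can be mimicked in plain C, with the caveat that C has no way to force a value into ESI, so this only shows how the pinned value is threaded through calls, not the register assignment itself. `struct esi_ret` and `cc_printf` are hypothetical stand-ins for the custom CC (pinned value in as the first argument, back out as the first field of the result); `main_before` follows the alloca-based pseudocode and `main_after` the post-mem2reg form.

```c
#include <stdio.h>

/* Hypothetical model of the custom CC's return: the pinned value comes
   back as the first member, the ordinary return value as the second. */
struct esi_ret {
    int esi;   /* value of the pinned register on return */
    int value; /* the function's ordinary return value */
};

/* Stand-in for printf compiled with the custom CC: it leaves the
   pinned value unchanged and hands it back to the caller. */
static struct esi_ret cc_printf(int esi, const char *fmt, int arg)
{
    struct esi_ret r = { esi, printf(fmt, arg) };
    return r;
}

/* Alloca form: the pinned value lives in a local that is reloaded
   before each call and stored back after it. */
static struct esi_ret main_before(int in_esi)
{
    int esi = in_esi;
    esi = 3;
    esi = cc_printf(esi, "register: %d\n", esi).esi;
    esi = esi * 2;
    esi = cc_printf(esi, "register: %d\n", esi).esi;
    struct esi_ret r = { esi, 0 };
    return r;
}

/* Post-mem2reg form: each value is defined once and flows straight
   into its uses; ESI only matters at call and return boundaries. */
static struct esi_ret main_after(int in_esi)
{
    (void)in_esi; /* overwritten before its first use, so unused */
    const int esi2 = cc_printf(3, "register: %d\n", 3).esi;
    const int esi4 = esi2 * 2;
    const int esi6 = cc_printf(esi4, "register: %d\n", esi4).esi;
    struct esi_ret r = { esi6, 0 };
    return r;
}
```

Both functions print "register: 3" and then "register: 6", and both return `{6, 0}`; the second form is what leaves the register allocator with nothing but trivial copies into and out of ESI.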

No, I don't know of anyone planning to implement this, but it is conceptually quite simple :)

-Chris

Interestingly (to me at least :), there is no guarantee that this value
be in the physreg at a random point in the function.

Yep, also interesting to me though :).

If it were important to me to implement this, I'd implement this
extension by adding a new custom calling convention to the X86 backend
that always passed the first i32 value in ESI and always returned the
first i32 value in ESI.

Yeah that was my line of thinking as well. Thanks for the detailed response.

~ David