Force register allocator to spill all variables of a basic block

Hi,

I need to force all variables of a basic block to spill, i.e., I can’t allow basic blocks to share registers. I would like to know where is the most appropriate approach to implement that policy in LLVM.

Looking at the LLVM source, it seems that the register allocator is the best choice because it controls the spilling, but I need to guarantee that this policy is not violated by post RA passes.

To illustrate the policy, let us suppose two BBs A and B:

BB A:

a = x + y

BB B:

b = a * x

In a pseudo machine code, I need to generate this:

BB A:

load x;

load y;

add a, x, y

store a

BB B:

load a;

load x;

mul b, a, x;

store b;

Any help is much appreciated!

Thanks,
Ronaldo

Hi,

I need to force all variables of a basic block to spill, i.e., I can't allow basic blocks to share registers. I would like to know where is the most appropriate approach to implement that policy in LLVM.

Looking at the LLVM source, it seems that the register allocator is the best choice because it controls the spilling, but I need to guarantee that this policy is not violated by post RA passes.

To illustrate the policy, let us suppose two BBs A and B:

BB A:
    a = x + y

BB B:
    b = a * x

In a pseudo machine code, I need to generate this:

BB A:
    load x;
    load y;
    add a, x, y
    store a

BB B:
    load a;
    load x;
    mul b, a, x;
    store b;

Any help is much appreciated!

My first thought is to add an IR pass that generates volatile loads and stores for live-in/out values. There may be more clever ways of doing this though.
-Andy

Hi Andy and list,

Just to give you a follow up, I was implementing a transformation pass just as you recommended at the time I sent the email to the list, but at that time I was worried that the code generator would change the order of the instructions created in the LLVM-IR. Indeed, that was the case.

Some instructions after ISel were added between the volatile store instructions (which is perfectly OK). To solve this, I created a Machine Function pass that runs right after ISel and just before RA, reordering the insts added by ISel. After that, the RA correctly fixes any register reference in the generated code.

The only problem with this approach is that some SSA and Late machine optimizations break the ordering I need. Thus, I am not able to use them yet. However, I am pretty happy with the results so far.

Thank you!

ronaldo

Hi Andy and list,

Just to give you a follow up, I was implementing a transformation pass just as you recommended at the time I sent the email to the list, but at that time I was worried that the code generator would change the order of the instructions created in the LLVM-IR. Indeed, that was the case.

Some instructions after ISel were added between the volatile store instructions (which is perfectly OK). To solve this, I created a Machine Function pass that runs right after ISel and just before RA, reordering the insts added by ISel. After that, the RA correctly fixes any register reference in the generated code.

Weird. Is selection DAG scheduler changing the order of the SD nodes? (-debug-only=pre-RA-sched). You can “turn it off” with -pre-RA-sched=source. It’s off by default on trunk for x86.

The only problem with this approach is that some SSA and Late machine optimizations break the ordering I need. Thus, I am not able to use them yet. However, I am pretty happy with the results so far.

If those loads/store are really volatile, I don’t think they will be reordered with any memory ops or calls. Non-memory instructions may be inserted between them though.

You were also right in your initial analysis that the register allocator is free to put things in registers across blocks, which could happen if the machine optimization passes moved non-memory operations across blocks. I think you either have to disable any machine code pass that does code motion, or force the register allocator to spill across blocks.

-Andy

The selection DAG is fine, my problem was much simpler (and I haven’t explained it in the email, sorry). As you mentioned, the volatile load/stores are not reordered, but non-memory instructions are being inserted between them. In my pass, I need that all store instructions are placed right before the terminator instruction in the BB without any non-store instruction between them. In the LLVM-IR pass that computes live-in/out values for each BB, I am placing these instructions as I wanted. But for instance:

volatile store x;

volatile store 99;

The code gen then produces (i’m generating code for mips):

store x;

$1 = add 0, 1

store $1

I just had to move the add instruction and place it before the first store.

I did disable the passes that do code motion, everything is working. In my first email my idea was to for the RA to spill across blocks, but your solution using the volatile load/stores works fine too.

ronaldo