register allocation

Hi,

My target has special requirements during register allocation - there is both a need to handle register pairing and to never spill a flag result reg-class (which might happen at -O0 for no obvious reason).

Since neither of these issues seems to be supported, I have tried to pre-allocate these registers in the preRA pass. This has resulted in “using undefined physical register” verification failure with the fast regalloc, as this regalloc does not scan the code properly for used pregs, and thus marks a use-physreg-op as ‘Kill’ when it is in fact used in the following instruction (where it was marked with ‘Kill’).

It seems that the other register allocators are fine with pre-allocation, as they scan for preg live ranges, right?

I wonder, is there a way to know wich regalloc is loaded by the passmanager in the preRA-pass?

(In GCC, there is a way to specify register pairing on the primitive level, by adding predicates for operand-combinations, eg “base1, base2, base3” for addr-reg, and “offs1-2, offs3-4, offs5-6” for the index register. Is there a neat way of doing this with Tablegen? For each instruction with two paired register operands, these combinations must be the only legal ones, but do I have to write several instruction definitions to express this?)

To do away with pre-allocation, LLVM would have to be extended with an RegClass/register-attribute ‘spillable’, and a register allocator would have to implement register pairing. That would be PBQP, right? What’s your standpoint on these extensions? I would be glad to work with you on this, when I have the time.

Jonas Paulsson
Ericsson

LLVM would have to be extended with an RegClass/register-attribute ‘spillable’

What exactly are you proposing? Why can’t you do what the PowerPC and Hexagon targets do?

Spill-free register allocation sounds great, why not do it for all register classes?

, and a register allocator would have to implement register pairing. That would be PBQP, right?

If by pairing you mean GCC’s multiple-alternatives, then PBQP should be able to support that.

/jakob

LLVM would have to be extended with an RegClass/register-attribute ‘spillable’

What exactly are you proposing? Why can’t you do what the PowerPC and Hexagon targets do?

Yes, I can move a CR to a GPR and save it to the stack, but due to a very irregular register file this is about 10 times more expensive than saving/restoring an ordinary register. These registers should basically never
have to be spilled, at least not on -O3.

Spill-free register allocation sounds great, why not do it for all register classes?

Well, in this architecture, each GPR has its own CR, and I have made a super-register that includes the GPR and CR when the flag is needed. I need to write code like:

GPR_CR = cmp GPR, 0 // GPR_CR def
<other code, possibly using just GPR’s without setting the CR>
br #, GPR_CR // GPR_CR use

Basically, I would like to make sure that the GPR_CR reg is not spilled before another GPR-only reg is spilled, as it would be idiotic. As the super-register is wider than the GPR I do not see why this happened, but at -O0
this happened for some reason or other.

I tried a method with simply handing out pregs as suitable in preRA. I would like to confirm with you if possible that this should work with PBQP and greedy/basic, which scans the LiveIntervals for pregs and remembers it.
What’s more, setting the GPR_CR class to ‘not-spillable’ would probably do the trick here as we basically do not want to do this, and I would not have to pre-allocate. But there is probably a better way, or?

, and a register allocator would have to implement register pairing. That would be PBQP, right?

If by pairing you mean GCC’s multiple-alternatives, then PBQP should be able to support that.

/jakob

I take it then that it is not possible to write operand-combinations as in GCC in LLVM so as to handle register pairing on a high level?

The PBQP algorithm as such supports register pairing per the article, but it was not implemented in RegAllocPBQP.cpp as far as I can see.

For my needs, I would like to say that whenever two registers are used in the same instruction, they must follow any register-pairing rule defined for any such occurence. Possibly, one could add a Constraint per instruction def
as well to indicate the use of the register pairing rule, and to allow instances where it does not apply.

PBQP extension (suggestion)

What exactly are you proposing? Why can’t you do what the PowerPC and Hexagon targets do?

Yes, I can move a CR to a GPR and save it to the stack, but due to a very irregular register file this is about 10 times more expensive than saving/restoring an ordinary register. These registers should basically never
have to be spilled, at least not on -O3.

And the register allocator is trying very hard not to do that, but sometimes it just isn’t possible.

You have the choice between aborting compilation or inserting expensive spill code when that happens.

Basically, I would like to make sure that the GPR_CR reg is not spilled before another GPR-only reg is spilled, as it would be idiotic. As the super-register is wider than the GPR I do not see why this happened, but at -O0
this happened for some reason or other.

I sounds like you disabled optimizations, and now optimizations are missing. The fast register allocator tries very hard to be fast. That includes producing suboptimal code. Sometimes it is faster to spill than to run an expensive interference computation.

What’s more, setting the GPR_CR class to ‘not-spillable’ would probably do the trick here as we basically do not want to do this, and I would not have to pre-allocate. But there is probably a better way, or?

I am sorry, I simply don’t understand what you are asking for. You might as well suggest that we all don’t pay taxes. It sounds tempting, but the plan needs some details.

If the fast register allocator is not working for you, you can use the greedy register allocator in -O0 builds by passing -regalloc=greedy. It’s not that much slower.

I take it then that it is not possible to write operand-combinations as in GCC in LLVM so as to handle register pairing on a high level?

That’s right.

The PBQP algorithm as such supports register pairing per the article, but it was not implemented in RegAllocPBQP.cpp as far as I can see.

The PBQP allocator allows targets to specify complicated constraints that the normal constraint model doesn’t support. Constraints are modeled as matrices, so any specific type of constraint wouldn’t be mentioned in the source file.

PBQP extension (suggestion)

Tablegen:

def regPair : registerPair<AddrReg0, OffsReg0>,
~or~
def regPair: registerPairing<AddrReg0, [OffsReg0, OffsReg1, OffsReg2]>;
~or~
??

I am not sure how much easier that would be than the current approach. I’ll leave that up to Lang.

Hi Jonas,

As Jakob mentioned, PBQP is designed to provide support constraints that aren’t supported in the current model, without requiring any modifications to the PBQP allocator itself. Constraints are expressed to the PBQP allocator via matrices, as described in http://lists.cs.uiuc.edu/pipermail/llvmdev/2010-September/034781.html .

If you decide to try the PBQP route and need pointers to further material, or help with the API, please just let me know.

  • Lang.

Hi Jakob, Jonas et al,

Jakob wrote:
[...]

Jonas wrote:
[...]
> What's more, setting the GPR_CR class to 'not-spillable' would probably do the trick here as we
> basically do not want to do this, and I would not have to pre-allocate. But there is probably a
> better way, or?

I am sorry, I simply don't understand what you are asking for. You might as well suggest that we all
don't pay taxes. It sounds tempting, but the plan needs some details.

If the fast register allocator is not working for you, you can use the greedy register allocator in
-O0 builds by passing -regalloc=greedy. It's not that much slower.

[...]

I believe that Jonas wants to have the following:
- A register class specification such that you get the same effect as 'glue code', but without
  having to introducing custom nodes and lowering to represent this glue.

I think that 'glue' is today the only solution if you do not want to produce spill code.

In a backend on which I have been working, we have a compare instruction that sets a flag (true or false), and a conditional branch instruction, that jumps if the flag is true.

Initially, I tried to specify this by providing a lowering of the compare instruction :

(CCReg only contains one register)

def CMPEQ : Instr<
  (outs CCReg:$dst),
  (ins IntRegs:$lhs, IntRegs:$rhs),
  "cmpeq $lhs, $src",
  [(cmpeq IntRegs:$dst, IntRegs:$src)>;

def BCC : Instr<
  (outs),
  (ins CCReg:$cc, brtarget:$addr),
  "bcc $addr",
  [(brcond CCReg:$cc, bb:$addr)]

;

This worked, but for more complex programs, llvm tries to generate code to move CC and to spill CC.
I was able to get rid of the 'moves' by setting the copycost to -1.
For getting rid of the spills, I was forced to introduce custom nodes, that pass the CC register
through 'glue'.

Having a property to register classes that identifies a 'glue-like' behavior would make sense to me.

Greetings,

Jeroen Dobbelaere

Hi,
I am a LLVM newbie, I have read some tutorials of LLVM, and play with some demos.
I am developing a Hadoop native runtime, it has C++ APIs and libraries, what I want to do is to compile Hive’s logical query plan directly to LLVM IR or translate Hive’s physical query plan to LLVM IR, then run on the Hadoop native runtime. As far as I know, Google’s tenzing does similar things, and a few research papers mention this technique, but they don’t give details.
Does translate physical query plan directly to LLVM IR reasonable, or better using some part of clang library?
I need some advice to go on, like where can I find similar projects or examples, or which part of code to start to read?

Thanks,

Binglin Chang

I think setting CopyCost = -1 more or less does what you want, but I don't really understand what that field is for.

It is only ever used by the scheduler. Register allocation completely ignores it. Judging by how it is used, it should probably be a property of registers, not register classes.

/jakob

Well, I actually glue my condition-registers already, but as far as I know this only affects the scheduling from the DAG to the MachineFunction. It still happened that a register allocator at -O0 would spill this register for no reason at all in between these instructions.

The CopyCost attribute would help me, if the register allocators would factor it in with the spillWeight calculation. I suppose this would be fairly simple to implement, given that all the register allocators except -fast use the CalcSpillWeights class, where it could be included. Would this make sense to you as a patch? Maybe CopyCost and SpillCost should be held separate, I don't know, but in this case a new attribute could be introduced.

Jonas

Jonas, I doubt this would help you. Spill weights are only ever compared to other spill weights from the same register class. If you raise the spill weight of all your predicate registers by the same factor, it's not going to change anything.

Spill weights are used to determine what to spill. Not whether to spill or not.

/jakob