Register design decision for backend

Hello everybody,

This is my first email to the list, and I hope to write more as I get more involved in LLVM. I’m currently writing a backend for an 8-bit microcontroller, and I have reached a point where I need to make a design decision in order to continue development.
Some background information: the microcontroller only has 8-bit registers, but it has some special instructions that work with register pairs (adjacent registers), such as data movement or memory operations. I would say that 90% of the instruction set works with 8-bit regs (arithmetic, logical, compares, etc.). So what I did was define all 8-bit regs inside one regclass of type i8 and then define all reg pairs inside another regclass of type i16, marking the pairs as subregs of the 8-bit regs this way:

(stripped-down version of my code)

// 8 bit regs
def R0 : Register<"r0">, DwarfRegNum<[0]>;
def R1 : Register<"r1">, DwarfRegNum<[1]>;

// reg pairs
def R1R0 : RegisterWithSubRegs<"r0", [R0, R1]>, DwarfRegNum<[0]>;

def GPR8 : RegisterClass<"TEST", [i8], 8, [R0, R1]>;
def WDREGS : RegisterClass<"TEST", [i16], 16, [R1R0]>
{
let SubRegClassList = [GPR8, GPR8];
}

This way I could work with register pairs easily, for example storing i16 data inside the WDREGS class or i32 data inside two WDREGS registers. I thought everything was going fine until I tried to do a 16-bit addition. The addition instruction only works with 8-bit regs (GPR8 class), so take this example code:

short foo(short a, short b)
{
    return a + b;
}

Arg a arrives in register pair R1:R0 and arg b in R3:R2, both of class WDREGS. I expected LLVM to be able to split the pairs into GPR8 registers using the subreg definitions and match the pattern of the GPR8 add instruction, producing the following asm code:

add r0, r2
addc r1, r3 ; add with carry
ret

However, I noticed this doesn’t work. As a test, I removed the WDREGS class and passed everything in GPR8 regs; this way LLVM was able to expand the i16 add into the code above. I first thought of making add16 a pseudo instruction and expanding it manually, but I think this is a bad solution because LLVM would lose a lot of information about something as basic as an addition. I would also have to do the same for wider data types and for the rest of the arithmetic and logical instructions, so nearly everything would be handled in a custom way, leaving LLVM unable to optimize it. I still need register pairs for the 16-bit instructions, so I can’t remove the WDREGS class. Also, 16-bit and larger data has to be aligned to even registers so the 16-bit instructions can manipulate it.
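As an illustration of what the add/addc expansion has to compute, here is a minimal C++ model of a 16-bit addition carried out on two 8-bit halves. It is only a sketch and not LLVM code: RegPair, add8, addc8, add16 and the conversion helpers are hypothetical names made up for this example, and the carry flag is modelled as a plain bool.

```cpp
#include <cassert>
#include <cstdint>

using std::uint8_t;
using std::uint16_t;

// Hypothetical model of a register pair hi:lo, for example R1:R0.
struct RegPair {
    uint8_t lo;
    uint8_t hi;
};

// Models "add": 8-bit addition that writes the carry flag.
inline uint8_t add8(uint8_t a, uint8_t b, bool &carry) {
    unsigned sum = unsigned(a) + unsigned(b);
    carry = sum > 0xFF;
    return uint8_t(sum & 0xFF);
}

// Models "addc": 8-bit addition that also consumes the incoming carry.
inline uint8_t addc8(uint8_t a, uint8_t b, bool &carry) {
    unsigned sum = unsigned(a) + unsigned(b) + (carry ? 1u : 0u);
    carry = sum > 0xFF;
    return uint8_t(sum & 0xFF);
}

// 16-bit add expressed as add (low halves) followed by addc (high halves).
inline RegPair add16(RegPair a, RegPair b) {
    bool carry = false;
    RegPair r;
    r.lo = add8(a.lo, b.lo, carry);
    r.hi = addc8(a.hi, b.hi, carry);
    return r;
}

inline uint16_t toU16(RegPair p) {
    return uint16_t((unsigned(p.hi) << 8) | p.lo);
}

inline RegPair fromU16(uint16_t v) {
    return RegPair{uint8_t(v & 0xFF), uint8_t(v >> 8)};
}

// Usage: the carry out of the low byte must reach the high byte.
inline void exampleAdd16() {
    assert(toU16(add16(fromU16(0x00FF), fromU16(0x0001))) == 0x0100);
}
```

The same chaining of the carry extends to i32 or wider values: one add for the lowest byte, then one addc per remaining byte.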

So after all this explanation, my question is: how should I proceed? Is there a way to tell LLVM that the reg pairs can be split safely, or can you think of a better solution to handle this? I’ve looked at other backends but haven’t seen this situation come up.
As a side note, I’m working with the 2.7 code base.

Thanks for your attention.

Hi, I don’t know if anyone else has responded to your question, but I am currently developing a register allocator. Thank you for bringing up the fact that sub-register classes may be larger than their super-register. If this remains the case, I for one will write a transform for my allocator which will make the 16-bit register the super-register with the 8-bit registers as the subs. At least for my allocator, this will simplify writing the algorithm without changing the mechanics.

I wonder, how many other targets have sub-registers larger than super-registers?

Thanks,
Jeff Kunkel

Hello Jeff, you’re the first one to reply to my question :)
I got a bit confused when you said that the subregister class is larger than the superregister class. As far as I understand, what I tried to do with my code is define a register pair composed of two 8-bit registers, the way I described in my previous message. So R1R0 in WDREGS is directly mapped onto R0 and R1 of GPR8, meaning that 2 GPR8 = 1 WDREG. So R1R0 is the superreg of R1 and R0, isn’t it?

In case I wasn’t clear enough, here is a simple example using the well-known x86 arch.
AX is composed of AL and AH (AL and AH are subregs of AX). Now assume x86 can only add 8-bit regs: if an i16 number is stored in AX, it should get split into AL and AH so that the 8-bit addition pattern is matched.

If this is what you meant from the start, sorry for the noise, since that wasn’t clear to me. Adding that transform to your reg allocator would be great, since I can’t continue writing the backend until this issue is resolved. I think this case should be handled by all register allocators implemented in LLVM, so maybe it can be factored out to run before any specific allocation algorithm.
I have noticed that some changes concerning subreg classes and indices have been made in all the backends’ RegisterInfo.td files since v2.7. Do these changes fix this problem, or did they make no functional difference?

Thanks.

2010/8/31 Jeff Kunkel <jdkunk3@gmail.com>

> In case I wasn't clear enough, here is a simple example using the well-known x86 arch.
> AX is composed of AL and AH (AL and AH are subregs of AX). Now assume x86 can only add 8-bit regs: if an i16 number is stored in AX, it should get split into AL and AH so that the 8-bit addition pattern is matched.
>
> If this is what you meant from the start, sorry for the noise, since that wasn't clear to me. Adding that transform to your reg allocator would be great, since I can't continue writing the backend until this issue is resolved. I think this case should be handled by all register allocators implemented in LLVM, so maybe it can be factored out to run before any specific allocation algorithm.

The LLVM target descriptions don't model sub registers well enough to do this. The target-independent code generator only knows that EAX has sub registers AX, AH, and AL. It does not know the positions of the sub registers in the super registers, or that EAX has bits that are not covered by sub registers.

The transformations you are looking for should be done on the selection DAG. Look at how the other targets are using setOperationAction() in their TargetLowering.cpp files.

> I have noticed that some changes concerning subreg classes and indices have been made in all the backends' RegisterInfo.td files since v2.7. Do these changes fix this problem, or did they make no functional difference?

No, those changes were to simplify the specification of complicated register banks, like the ARM NEON registers.

/jakob

Thanks for the reply, Jakob. Good to know that my assumption that LLVM would split regs into smaller subregs was too optimistic. It would be nice if LLVM could handle this case, basically by trying to split regs and checking whether patterns match the split regs before giving an error.

About the transformation you mentioned in the selection DAG phase: that was my initial question, how to do it. I don’t know if you meant customizing each arithmetic and logical operation with setOperationAction(), or if there is a way to just split regs and let LLVM handle the rest, so if you could expand a bit more on how to do it that would be great :)
I haven’t seen any other backend doing this because they all have instructions that are able to work with their widest regs. My case is different because I’m working with register pairs, and most machine instructions can only work with the halves of a pair.

Thanks.

First of all, note that if you don't tell TargetLowering about your i16 register class with addRegisterClass(), it synthesizes i16 and i32 operations just fine. The register splitting you are talking about is already there.

If you do add your i16 register class, TargetLowering is going to assume that it supports normal operations like add and sub. You must tell it that they are not available for i16 using setOperationAction().

Do some experiments with trivial functions, and see what happens.

/jakob

Indeed, if you unregister the i16 regs by removing the call to addRegisterClass(), it works as expected. But that brings some other problems:

You cannot pass i16/i32/i64 arguments to functions or return data using the register pairs (at the moment I’ve only implemented LowerReturn and LowerFormalArgs for testing trivial functions).
Passing arguments in i8 regs works as expected, BUT LLVM won’t match potential 16-bit instruction patterns because the data is held in 8-bit regs, making the code twice as big. For example, one 16-bit instruction copies between reg pairs, so without it you get two 8-bit moves instead of one 16-bit move. I guess I could fold this manually, but I would have to do it for every other 16-bit instruction, so I find this a bit inefficient. From these results I conclude that once you remove the addRegisterClass() call, the i16 reg class becomes useless.

On the other hand, there is the original problem again:
If I receive input args inside register pairs, LLVM can’t expand the i16 add into an add/adde combination.

I could work only with 8-bit regs and somehow fold patterns into a 16-bit instruction, repeating this for every 16-bit instruction, but I want to know if there’s a better solution since I find this too hacky and artificial.

Thanks for your patience.

(PS: sorry for the dup, I sent the reply to your personal account instead of to the list.)

Hello Jakob,
As mentioned in my previous email, I’m unable to work with register pairs and let LLVM split i16 data into two i8 regs. If you know a way of doing it without heavy customization for each operation, please let me know.

As an alternative I could work only with i8 regs so that LLVM is able to split types; this is the only way I’ve found so far. Doing this would imply the following changes:

  1. Store i16 or wider data in odd:even reg pairs (r5:r4 and not r4:r3). Could this be achieved with a register allocation hint?

  2. Convert all reg-to-reg moves that cover a register pair into one 16-bit move, thus folding two 8-bit moves into one 16-bit move. I’m a bit uncertain whether this is possible when working only with 8-bit regs. I thought that maybe this could be done with some “Function Pass” before register allocation? One important thing to note is that the two moves are not guaranteed to come in a row; they could come out of order or have other instructions in between, which is why I prefer doing it before reg allocation. Since reg-to-reg moves are handled differently from the rest of the instructions, is this manageable?

  3. Load/store instructions only work with register pairs. I think this would have to be handled in the selection DAG?
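Points 1 and 2 can be sketched outside LLVM as a toy peephole over a list of moves. This is a simplified model and not an LLVM pass: Op, Inst, isPairLow and foldPairMoves are hypothetical names invented for the example, registers are plain integers, and only two adjacent 8-bit moves (in either order) are folded. Folding moves that are separated by other instructions would additionally need a check that nothing in between reads or writes the registers involved, which this sketch does not attempt.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction model: an 8-bit move, a 16-bit pair move,
// or anything else. For Mov16, dst and src name the low (even) register.
enum class Op { Mov8, Mov16, Other };

struct Inst {
    Op op;
    int dst;
    int src;
};

// A valid pair starts at an even register: r1:r0, r3:r2, r5:r4, ...
inline bool isPairLow(int reg) { return reg % 2 == 0; }

// Fold two adjacent 8-bit moves into one 16-bit move when both the
// destination and the source form an aligned odd:even pair.
inline std::vector<Inst> foldPairMoves(const std::vector<Inst> &in) {
    std::vector<Inst> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (i + 1 < in.size() && in[i].op == Op::Mov8 &&
            in[i + 1].op == Op::Mov8) {
            const Inst &a = in[i];
            const Inst &b = in[i + 1];
            // Accept the two halves in either order.
            const Inst &lo = a.dst < b.dst ? a : b;
            const Inst &hi = a.dst < b.dst ? b : a;
            if (isPairLow(lo.dst) && isPairLow(lo.src) &&
                hi.dst == lo.dst + 1 && hi.src == lo.src + 1) {
                out.push_back(Inst{Op::Mov16, lo.dst, lo.src});
                ++i; // the second move has been consumed
                continue;
            }
        }
        out.push_back(in[i]);
    }
    return out;
}

// Usage: "mov r2, r0; mov r3, r1" becomes one pair move r3:r2 <- r1:r0.
inline void exampleFold() {
    std::vector<Inst> prog = {{Op::Mov8, 2, 0}, {Op::Mov8, 3, 1}};
    std::vector<Inst> folded = foldPairMoves(prog);
    assert(folded.size() == 1 && folded[0].op == Op::Mov16);
}
```

Because both pairs are required to start at an even register, the destination of the first move can never be the source of the second, so replacing the two sequential moves with a single pair move does not change the result. A pair such as r4:r3 is rejected by isPairLow and left as two 8-bit moves.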

Thanks for the help.