Sub-Register Allocation


I’m trying to get a better understanding of sub-registers. I’m seeing the code generator make an odd decision, and I was hoping someone could point me in the right direction for explaining it.

The architecture is 68000, which has 8-, 16-, and 32-bit views of all of its data registers. To zero-extend, you can load the larger view with zero and then copy the value into the smaller view.
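A quick way to see why this works is to model the aliasing views in a few lines. This is a toy sketch, not LLVM code. It relies on one property of the 68000: a .b or .w write to a data register leaves the upper bits untouched. The class and function names are made up for illustration.

```python
# Toy model (illustration only, not LLVM code): one 68000 data register whose
# byte (.b), word (.w) and long (.l) views alias its low 8, 16 and 32 bits.
class DataReg:
    def __init__(self, value=0xDEADBEEF):  # arbitrary stale contents
        self.value = value

    def write_l(self, v):  # move.l / moveq: replaces all 32 bits
        self.value = v & 0xFFFFFFFF

    def write_w(self, v):  # move.w: replaces bits 0-15, keeps bits 16-31
        self.value = (self.value & 0xFFFF0000) | (v & 0xFFFF)

    def write_b(self, v):  # move.b: replaces bits 0-7, keeps bits 8-31
        self.value = (self.value & 0xFFFFFF00) | (v & 0xFF)

    def read_w(self):
        return self.value & 0xFFFF


def zext8_to_16(reg, byte):
    """Zero-extend a byte: clear the word view, then write the byte view."""
    reg.write_w(0)     # move.w #0, d0
    reg.write_b(byte)  # move.b <src>, d0
    return reg.read_w()


d0 = DataReg()
print(hex(zext8_to_16(d0, 0xF3)), hex(d0.value))  # 0xf3 0xdead00f3

d1 = DataReg()
d1.write_l(0)         # moveq #0, d1 (clears all 32 bits)
d1.write_b(0xF3)      # move.b <src>, d1
print(hex(d1.value))  # 0xf3
```

Bits 16-31 of d0 stay stale in the first case. That is harmless for an i16 result, because only the word view is read.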

I’m working on this LLVM function:

define i16 @zext_i8_to_i16_simple(i8 %x) {
  %1 = zext i8 %x to i16
  ret i16 %1
}

I have a pattern where I load the 16-bit portion of the register with 0 and then copy in the 8-bit portion:

def : Pat<(i16 (zextloadi8 addr:$src)),
(INSERT_SUBREG (MOV16id 0), (MOV8md addr:$src), sub_byte)>;

which produces working but odd assembly:

zext_i8_to_i16_simple PROC ; @zext_i8_to_i16_simple
; BB#0:
move.b 4(a7), d1
move.w #0, d0
move.b d1, d0

Notice the extraneous use of d1, as

move.w #0, d0
move.b 4(a7), d0

would work just as well.

If, however, I load the 32-bit portion of the register with 0, truncate it to 16 bits, and then copy in the 8-bit portion, I get what I expect.

def : Pat<(i16 (zextloadi8 addr:$src)),
(MOV8md addr:$src),

zext_i8_to_i16_simple PROC ; @zext_i8_to_i16_simple
; BB#0:
moveq #0, d0
move.b 4(a7), d0

I’m building on the LLVM 3.2 release, if it matters. I’m mostly looking for pointers on where to look to understand what’s going on, and/or additional documentation on how sub-registers work inside the instruction selector and register allocator.

Thank you,
– Kenneth Waters

P.S. If it helps, my register definitions look like this:

multiclass M68kDataReg<bits<3> num, string defn, string n> {
  def B : M68kReg<num, n>;
  def W : M68kRegWithSubregs<num, n, [!cast<Register>(defn # "B")]> {
    let SubRegIndices = [sub_byte];
  }
  def L : M68kRegWithSubregs<num, n, [!cast<Register>(defn # "W")]> {
    let SubRegIndices = [sub_word];
  }
}

defm D0 : M68kDataReg<0, "D0", "d0">;
defm D1 : M68kDataReg<1, "D1", "d1">;

LLVM’s register coalescer and allocator don’t try to reschedule instructions, which seems to be required here.


I think you're right. Looking at the instruction schedules before register
allocation shows that the load is scheduled before the zero move in one
case but not the other.
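To make the ordering problem concrete, here is a toy simulation. It is plain Python, not LLVM's coalescer or allocator. It runs the three steps (load the byte, zero the word, copy the byte in) in each order and checks whether the loaded byte could have been placed directly in d0. The names run and needs_second_reg and the step labels are hypothetical.

```python
# Toy simulation (not LLVM's allocator): execute the zero-extension steps in a
# given order, with the loaded byte assigned to load_reg, and check the result.
MASK = {"b": 0xFF, "w": 0xFFFF}


def write(regs, name, size, val):
    """Write the .b or .w view of a register, preserving the upper bits."""
    m = MASK[size]
    regs[name] = (regs[name] & ~m & 0xFFFFFFFF) | (val & m)


def run(order, load_reg, mem_byte=0xF3):
    regs = {"d0": 0xDEADBEEF, "d1": 0xCAFEBABE}  # stale contents
    for op in order:
        if op == "load":        # move.b 4(a7), load_reg
            write(regs, load_reg, "b", mem_byte)
        elif op == "zero":      # move.w #0, d0
            write(regs, "d0", "w", 0)
        elif op == "insert" and load_reg != "d0":  # move.b load_reg, d0
            write(regs, "d0", "b", regs[load_reg])
        # "insert" with load_reg == "d0" is a no-op: the byte is already there
    return regs["d0"] & 0xFFFF


def needs_second_reg(order, mem_byte=0xF3):
    """True if loading straight into d0 gives the wrong result.

    Use a nonzero mem_byte, otherwise a clobbered byte is indistinguishable.
    """
    return run(order, "d0", mem_byte) != mem_byte


print(needs_second_reg(["load", "zero", "insert"]))  # True: move.w #0 clobbers the byte
print(needs_second_reg(["zero", "load", "insert"]))  # False: one register suffices
```

When the load comes first, the byte has to survive the later move.w #0 somewhere else, hence d1. When the zero comes first, the load can target d0 directly. Since the allocator doesn't reorder, it is stuck with whatever order the scheduler chose.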

Is there an easy way I can trick the scheduler into putting these in the
right order? Perhaps by adding a scheduling dependency between the move
and the load?

Thank you,
-- Kenneth Waters

You can try hacking the SelectionDAG scheduler to “AddGlue” between the constant move and the load. See ScheduleDAGSDNodes.cpp.
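As a rough illustration of why an extra dependency fixes the order, here is a toy greedy list scheduler. It is not SelectionDAG's scheduler. The priorities are a made-up heuristic that prefers issuing the load early. The extra edge only approximates glue as a plain ordering constraint; real glue also forces the glued nodes to be scheduled together as a unit.

```python
def list_schedule(deps, priority):
    """Greedy list scheduler: repeatedly emit the ready node with the highest
    priority. deps maps each node to the set of nodes it must come after."""
    remaining = {n: set(d) for n, d in deps.items()}  # copy; don't mutate input
    order = []
    while remaining:
        ready = [n for n, d in remaining.items() if not d]
        if not ready:
            raise ValueError("dependency cycle")
        pick = max(ready, key=lambda n: priority[n])
        order.append(pick)
        del remaining[pick]
        for d in remaining.values():
            d.discard(pick)
    return order


# Hypothetical priorities: this toy heuristic prefers issuing the load early.
PRIORITY = {"load": 2, "zero": 1, "insert": 0}
deps = {"load": set(), "zero": set(), "insert": {"load", "zero"}}
print(list_schedule(deps, PRIORITY))   # ['load', 'zero', 'insert']

# An extra ordering edge (standing in for glue) pins the zero move before the load.
glued = dict(deps, load={"zero"})
print(list_schedule(glued, PRIORITY))  # ['zero', 'load', 'insert']
```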

You can also try using an ISel pseudo-instruction marked “usesCustomInserter = 1”, which your target then expands by hand in EmitInstrWithCustomInserter.

Maybe someone else has a better idea…

FYI: I’ve been hoping to add a copy-removal feature to the MachineScheduler pass, which is currently disabled. It could clean up situations like this. However, in this case ISel should really just emit things in the right order.