HW loads wider than int

I am trying to prototype a back end for a new processor. It has a 64-bit datapath, so all registers are 64 bits and load instructions always extend to 64 bits. But the type 'int' is 32 bits, and arithmetic instructions have variants that operate on only the lower 32 bits of each register.

So for a basic 'a = b + c' example, we get
  %0 = load i32, i32* @b, align 4, !tbaa !1
  %1 = load i32, i32* @c, align 4, !tbaa !1
  %add = add nsw i32 %1, %0
  store i32 %add, i32* @a, align 4, !tbaa !1

And we'd want to generate
ldw %r0,@b ; load b (32 bits) from memory with sign extension to 64 bits
ldw %r1,@c ; load c (32 bits) from memory with sign extension to 64 bits
addw %r2,%r0,%r1 ; add lower 32 bits of r0 and r1
stw @a,%r2 ; store lower 32 bits of r2 to a
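
To make the intended semantics concrete, here is a minimal standalone C++ sketch that models the three instructions on 64-bit register values. The function names simply mirror the mnemonics above; the helper `sext32`, the driver `add_via_wide_regs`, and the choice that addw sign-extends its 32-bit result into the upper half are assumptions for illustration, not something stated about the hardware.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend a 32-bit value to 64 bits using only unsigned arithmetic.
inline std::uint64_t sext32(std::uint32_t v) {
    return (static_cast<std::uint64_t>(v) ^ 0x80000000ull) - 0x80000000ull;
}

// ldw: load 32 bits from memory, sign-extended to fill the 64-bit register.
inline std::uint64_t ldw(const std::uint32_t &mem) { return sext32(mem); }

// addw: add only the lower 32 bits of each source register.
// ASSUMPTION: the 32-bit sum is sign-extended into the destination.
inline std::uint64_t addw(std::uint64_t ra, std::uint64_t rb) {
    std::uint32_t sum = static_cast<std::uint32_t>(ra) + static_cast<std::uint32_t>(rb);
    return sext32(sum);
}

// stw: store the lower 32 bits of the register to memory.
inline void stw(std::uint32_t &mem, std::uint64_t r) {
    mem = static_cast<std::uint32_t>(r);
}

// The 'a = b + c' sequence from the listing above.
inline std::uint32_t add_via_wide_regs(std::uint32_t b, std::uint32_t c) {
    std::uint64_t r0 = ldw(b);
    std::uint64_t r1 = ldw(c);
    std::uint64_t r2 = addw(r0, r1);
    std::uint32_t a = 0;
    stw(a, r2);
    return a;
}

// Small self-check: the stored value depends only on the low 32 bits.
inline void check_add_example() {
    assert(add_via_wide_regs(5u, 7u) == 12u);
    assert(static_cast<std::uint32_t>(addw(0xDEADBEEF00000005ull, 7ull)) == 12u);
}
```

In this model the upper 32 bits of r0 and r1 never influence what stw writes, which is why an i32 load could in principle be satisfied by the extending hardware load.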

If I define the ldw instruction faithfully according to the HW, that is, extending to 64 bits, it won't match the load i32. Does that mean I will need to define both 32 and 64 bit versions (via a multiclass perhaps)? Or would I just define the true (64-bit) version and use a Pattern to map 32-bit loads to the true instruction? Or is there something that would be done in lowering? I tried this lowering action:
   setOperationAction(ISD::LOAD, MVT::i32, Promote);
but got an assertion failure: "Can only promote loads to same size type"

Please forgive the elementary level of the question; we are just getting started and finding ISel a bit of a tough nut to crack.

-Alan Davis

So you want to make int 64 bit? Or you want int to stay 32 bit, long to be 64 bit but sign extend int 32 bit to 64?

Instead of
   setOperationAction(ISD::LOAD, MVT::i32, Promote);
try
   setOperationAction(ISD::LOAD, MVT::i32, Expand);

Does any register class have i32 as a legal type?


Hi Alan,

ARM64 is like this. I suggest having a look at that backend (lib/Target/AArch64) and how it deals with implicit zeroing of the upper bits of the X registers.


ARM64 has a separate name for the registers as 32-bit values though
(W0-W30 rather than X0-X30). I could easily see DAG ISel throwing a
fit without that.

First thing I'd try would be adding the 64-bit registers as a valid
class for i32 ("addRegisterClass(MVT::i32, GPR64)"). If that works,
you're good to go; if not, it should be possible to add fake 32-bit
registers and just print them the same as the 64-bit ones at the end.
The nastiest hackery would be in the AsmParser, which may or may not
be important.



So you want to make int 64 bit? Or you want int to stay 32 bit, long to be 64 bit but sign extend int 32 bit to 64?

Int is 32, long is 64. Assuming 'int *p' the same HW load instruction covers int = *p and long = *p.

I can have separate names for the 32-bit subregisters, but then I think I would need two forms of the load, one to target each. I was hoping the selector would be smart enough to use wider instruction forms for narrower types when the upper bits are unneeded.

ARM64 is like this. I suggest having a look at that backend (lib/Target/AArch64) and how it deals with implicit zeroing of the upper bits of the X registers.

I did look a bit at AArch64. The difference is that AArch64 explicitly has 32-bit (W) and 64-bit (X) forms of ldr, and the .td appropriately models them as 2 instructions: ldrw and ldrx. In our case there is one ldw instruction, that loads 32 bits and sign/zero extends it. It would be used for either a 32-bit load, with the upper bits remaining unused, or a 32-to-64 sextload. If possible I'd like to model that with one instruction in the .td.

I will spend more time looking at AArch64 though, as I don't yet fully understand all the subtleties of extending, promoting, and so on.


I think the AArch64 situation is actually a lot closer than you think.
There are 3 relevant variants:

  * load 32-bits, zero extend into 64-bit register (AArch64 calls this
a plain LDRW since it's indistinguishable in effect from that).
  * load 32-bits, sign extend to 64 (LDRSW on AArch64).
  * load 64-bits (LDRX on AArch64).

The first one is what gets used for an i32 load, though if you wanted
to be perverse you could probably use any of them since the difference
is unobservable on 32-bits.
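
The claim that the extension mode is unobservable in 32 bits can be checked with a small standalone C++ sketch. The function names here (`load32_zext`, `load32_sext`, `low32`) are hypothetical stand-ins for the zero-extending and sign-extending loads, not names from any backend.

```cpp
#include <cassert>
#include <cstdint>

// Zero-extending 32-bit load: upper half of the register becomes 0.
constexpr std::uint64_t load32_zext(std::uint32_t mem) {
    return static_cast<std::uint64_t>(mem);
}

// Sign-extending 32-bit load: upper half copies bit 31.
constexpr std::uint64_t load32_sext(std::uint32_t mem) {
    return (static_cast<std::uint64_t>(mem) ^ 0x80000000ull) - 0x80000000ull;
}

// What a 32-bit consumer of the register actually sees.
constexpr std::uint32_t low32(std::uint64_t reg) {
    return static_cast<std::uint32_t>(reg);
}

// The full registers differ for a value with bit 31 set...
static_assert(load32_zext(0x80000001u) != load32_sext(0x80000001u),
              "extension mode is visible in 64 bits");
// ...but the low halves are identical.
static_assert(low32(load32_zext(0x80000001u)) == low32(load32_sext(0x80000001u)),
              "extension mode is invisible in 32 bits");

// Runtime version of the same check for an arbitrary memory value.
inline void check_low_bits_agree(std::uint32_t mem) {
    assert(low32(load32_zext(mem)) == mem);
    assert(low32(load32_sext(mem)) == mem);
}
```

Either load therefore yields the same i32 value; the choice only matters once something reads the full 64-bit register.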

I assume you've got similar, which would just make it a matter of
naming the instructions. If you've actually combined LDRW and LDRSW
into a single instruction then presumably there's some second operand
that specifies whether to sign/zero extend, and you can set that
correctly when writing patterns.

The separate registers are definitely a bigger problem though. Mips
might actually be a better model there, since from my very hazy memory
it spells the 32 & 64-bit versions the same too. It also seems to have
separate 64-bit LLVM registers (e.g. V0 and V0_64).



This sounds quite similar to MIPS. MIPS64 supersets MIPS32 with additional rules
on how to extend 32 bit results for a 64 bit register.

For the MIPS64 backend in LLVM, we consider i64 and i32 legal types. This requires our
register classes to be 64 bit with 32 bit sub registers as Tom N. briefly described. I.E.
we have V0_64 which is the 64 bit register for returning a result, and V0, the
corresponding 32-bit subregister.

For the example you've given--which is familiar for MIPS64--by defining i32 as a legal
type and supplying tablegen instruction definitions that describe an i32 add which
takes 32 bit register operands and returns a 32 bit result, LLVM can pattern match
the addition for the 32 bit case. Load and store instructions for a 32 bit value would
also be required.

For 64 bit operations, you'd want to define a register class that is 64 bit and supply
the corresponding instruction definitions. For your 32 bit sign extending load, you'd
want to define a load which has a dag pattern along the lines of:
(set GPR64Opnd:$rt, (sextloadi32 addr:$addr)) which maps to the same instruction
as your 32 bit load for a 32 bit register.
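
As a rough illustration of "two definitions, one hardware instruction", the following standalone C++ toy models a selection table in which a plain i32 load and a sign-extending i32-to-i64 load both resolve to the same mnemonic but different register classes. This is only a sketch of the idea: it is not LLVM's data structure or API, and every name in it (`LoadPattern`, `selectLoad`, `GPR32`, `GPR64`) is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class LoadKind { Plain, SignExtend };  // load vs. sextload of a 32-bit value
enum class ValueType { i32, i64 };          // type of the loaded result

// One entry per "instruction definition" in the toy model.
struct LoadPattern {
    LoadKind kind;
    ValueType resultType;
    std::string regClass;  // hypothetical register class name
    std::string mnemonic;  // what the hardware actually executes
};

// Two definitions that describe the same hardware instruction.
inline const std::vector<LoadPattern> &loadPatterns() {
    static const std::vector<LoadPattern> table = {
        {LoadKind::Plain,      ValueType::i32, "GPR32", "ldw"},
        {LoadKind::SignExtend, ValueType::i64, "GPR64", "ldw"},
    };
    return table;
}

// Return the first matching definition, or nullptr if nothing matches.
inline const LoadPattern *selectLoad(LoadKind kind, ValueType resultType) {
    for (const LoadPattern &p : loadPatterns())
        if (p.kind == kind && p.resultType == resultType)
            return &p;
    return nullptr;
}

// Self-check: both lookups hit the same mnemonic via different entries.
inline void check_same_instruction() {
    const LoadPattern *narrow = selectLoad(LoadKind::Plain, ValueType::i32);
    const LoadPattern *wide = selectLoad(LoadKind::SignExtend, ValueType::i64);
    assert(narrow && wide && narrow != wide);
    assert(narrow->mnemonic == wide->mnemonic);
}
```

The point of the toy is that the duplication lives in the description (pattern and register class), not in the emitted instruction.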

In short:
* Although your target is 64 bit, describe it as having 64 bit registers which have 32 bit
sub registers.
* Describe operations which operate on the lower 32 bits as only operating on the
32 bit sub registers.
* Multiple instruction definitions will be required in select cases, e.g. MIPS has LW and
LW64--both describe the same instruction, but have different dag patterns and register