New backend help request.

Hi all,

I'm trying to continue an existing m68k backend for LLVM. I'm completely new
to LLVM development, so I've been muddling my way through mostly by trial and
error, using existing back ends for reference. Right now I'm trying to
implement support for function calls. I am compiling this C code --

typedef unsigned int uint32_t;
typedef char int8_t;

uint32_t foo(uint32_t x, int8_t y)
{
    return (x | (1 << 17)) + y;
}

int main()
{
  uint32_t b = foo(10,12);
}

And getting the following error from LLVM --

# Machine code for function main: Post SSA
Frame Objects:
  fi#0: size=4, align=4, at location [SP]
Function Live Outs: %D0L

BB#0: derived from LLVM BB %entry
        ADJCALLSTACKDOWN 6, %A7<imp-def>, %A7<imp-use>
        %vreg2<def> = COPY %A7; DR32:%vreg2
        %vreg1<def> = COPY %vreg2; AR:%vreg1 DR32:%vreg2
        MOVElim %vreg1<kill>, 0, 10; mem:ST4[<unknown>](align=2) AR:%vreg1
        %vreg2<def,tied1> = ADDlqd %vreg2<tied0>, 4, %CCR<imp-def,dead>; DR32:%vreg2
        %vreg3<def> = COPY %vreg2; AR:%vreg3 DR32:%vreg2
        MOVEbim %vreg3<kill>, 0, 12; mem:ST1[<unknown>] AR:%vreg3
        CALL <ga:@foo>, %A7<imp-use>, ...
        ADJCALLSTACKUP 6, 0, %A7<imp-def>, %A7<imp-use>
        %vreg4<def> = COPY %D0L; DR32:%vreg4
        MOVEldm <fi#0>, 0, %vreg4<kill>; mem:ST4[%b] DR32:%vreg4
        %vreg5<def> = MOVEQ 0; DR32:%vreg5
        %D0L<def> = COPY %vreg5<kill>; DR32:%vreg5
        RTS %D0L<imp-use,kill>

# End machine code for function main.

*** Bad machine code: Using an undefined physical register ***
- function: main
- basic block: BB#0 entry (0xa19eac)
- instruction: %vreg4<def> = COPY %D0L; DR32:%vreg4
- operand 1: %D0L
LLVM ERROR: Found 1 machine code errors.
Stack dump:
0. Program arguments: D:\RetroMods\Jaguar\Compiler\llvm-build\bin\Debug\llc.exe -mtriple m68k-apple-mac test.ll
1. Running pass 'Function Pass Manager' on module 'test.ll'.
2. Running pass 'Greedy Register Allocator' on function '@main'

To me this looks like the result from the call (D0) is not being marked as
defined anywhere, so LLVM doesn't know the register holds a value when it
tries to copy it into the virtual register. I'm using the MSP430 target as a
reference, and I'm doing everything that target does, so I'm not sure what
I'm missing.

Any tips would be most appreciated.

Function Live Outs: %D0L

Live outs? How old are your sources?

See this:

commit e6dc59891fc53d65b3f6d19772d26e23e0cc1cac
Author: Jakob Stoklund Olesen <stoklund@2pi.dk>

     Remove liveout lists from MachineRegisterInfo.

     All targets are now adding return value registers as implicit uses on
     return instructions, and there is no longer a need for the live out
     lists.

     git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174417 91177308-0d3

-Krzysztof

The CALL instruction should have an implicit def of the register used to return a value.
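
In TableGen that usually means listing the call-clobbered registers
(including the return-value register) as implicit defs on the call
instruction. A rough sketch only -- MyInst and my_call are stand-ins for
whatever instruction class and call SDNode your backend defines, and the
register set is a guess, following the MSP430 style:

// The return-value register (and anything else the calling convention
// clobbers) must be an implicit def here; otherwise the verifier sees
// the COPY from %D0L reading an undefined register.
let isCall = 1,
    Defs = [D0L, CCR],   // plus whatever else your convention clobbers
    Uses = [A7] in       // stack pointer
def CALL : MyInst<(outs), (ins i32imm:$dst),
                  "jsr\t$dst",
                  [(my_call imm:$dst)]>;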

-Krzysztof

Quite old -- I'm working with 3.2 at the moment, as that's what the code I'm
basing my work on is built with. I plan to add some functionality and then
update to 3.6 when I understand more.

I'm trying to figure out how to map more complex CISC instructions now. For
example on the 68000, you have things like --

add.w (a0)+,(a1)+

So that equates to:

temp1 = load a0
add 2, a0
temp2 = load a1
temp1 = add temp1, temp2
store temp1, a1
add 2, a1

How do I express that in a form for LLVM?

I see things like pre_store and post_store, but I can't find anything in the
way of documentation about them. And there doesn't appear to be a matching
pre_load and post_load pair or anything like that...

Thanks!

The simple answer is: not very easily. I’d be inclined to treat instructions like these as optimisations and ignore them (aside from integrated assembler support) until the rest of the back end is working. Once that’s working, take a look at how the ARM back end uses ldm and stm instructions - basically pattern matching on the MachineInstrs after code generation to fold them together.

Your other alternative in this case is to model this as an instruction that does a gather load of a two-element vector and then two extract elements. You might be able to get the vectoriser to generate these sequences and then match them, but I suspect that you’ll then have to define a load of pseudos for vector ops (type legalisation happens before instruction selection, so it’s difficult to have types that are only valid for a few complex IR / DAG sequences).

David

The simple answer is: not very easily. I’d be inclined to treat instructions like these as optimisations and ignore them (aside from integrated assembler support) until the rest of the back end is working. Once that’s working, take a look at how the ARM back end uses ldm and stm instructions - basically pattern matching on the MachineInstrs after code generation to fold them together.

If you want to match the expanded pattern and merge it into an add.w,
then you can use TableGen pseudo-instruction patterns. If the pattern
is not simple enough, or generally comes in a random sequence, or
needs additional checks (for example, "store t1, a1" must come
*before* "add 2,a1" but "add 2,a0" can come at any time after "load
a0"), you can do what ARM's LoadStoreOptimizer does and fold it after
instruction selection.

Your other alternative in this case is to model this as an instruction that does a gather load of a two-element vector and then two extract elements. You might be able to get the vectoriser to generate these sequences and then match them, but I suspect that you’ll then have to define a load of pseudos for vector ops (type legalisation happens before instruction selection, so it’s difficult to have types that are only valid for a few complex IR / DAG sequences).

Yeah, you'll end up with a huge and complex list of pseudos that could
bite you in the rear if you're not careful.

--renato

What about things like pre_store and post_store, though? If there were a pre_load and post_load, that would largely solve the problem. Of course there is a wealth of addressing modes on the 68k, but I think they should be able to be dealt with like this?

We do have write-back addressing modes in ARM, but IIRC that's
encoded in the instruction, not as a pattern. I.e. it won't generate
(or collapse) additional write-back instructions; it just chooses the
form that does the write-back.
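
So for something like a post-increment move you'd model the updated address
register as an extra result of the instruction itself, and select it from
C++ rather than from a plain pattern. A very rough sketch -- names are
invented, the encoding and the ISelLowering/ISelDAGToDAG side are omitted:

// "move.l dn, (an)+": the written-back address register is a second
// result, tied to the input register. The POST_INC store node only
// shows up if the target marks it legal (setIndexedStoreAction) and
// implements getPostIndexedAddressParts in its TargetLowering.
def MOVEpostinc : MyInst<(outs AR:$wb), (ins DR32:$Dn, AR:$An),
                         "move.l\t$Dn, ($An)+", []> {
  let Constraints = "$An = $wb";
  let mayStore = 1;
}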

--renato

Hmm, I'm getting nowhere pretty fast. It seems the 68000, with its CISC nature, is quite complex to implement for a novice.

I can see how to implement simple stuff, like --

move dn, dn
move dn, (an)

As that just turns into stores, sets, etc. But how would you represent things like indexed access?

move dn, (an,dn)
move dn, offset(an)

Can I really only define very simple operations in the main instruction info, with the rest coming from optimisation passes? It sounds like that may not be very efficient.

As that just turns into stores, sets, etc. But how would you represent things like indexed access?

move dn, (an,dn)
move dn, offset(an)

These are fairly easy (the difficulty with the pre/post indexed modes
is that they tend to define two values or otherwise not be tree-like
in DAG form).

For these you just put in a more complicated pattern representing the
address calculation that happens. For example, with "(an, dn)" the
address is "an+dn", so you'd write your pattern as: "(store i32:$Dn,
(add i32:$An, i32:$Dm))". For the "Dn.W" variants you'll probably want
to throw a trunc in there.

And for the offset ones you'll want to define an "offsetimm" (or
whatever your preferred name is) instance of ImmLeaf with i32 type
(because addresses are 32 bits) that only accepts signed 16-bit
values that can actually be encoded:

def offsetimm : ImmLeaf<i32, [{ return Imm >= -32768 && Imm <= 32767; }]>;
[...]
(store i32:$Dn, (add i32:$An, offsetimm:$imm))
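
Hooked up to actual instruction definitions, the two cases would look
vaguely like this (MyInst and the mnemonic spellings are made up -- use
whatever your backend already has):

// "move.l dn, (an,dn)": address is An + Dm.
def MOVErix : MyInst<(outs), (ins DR32:$Dn, AR:$An, DR32:$Dm),
                     "move.l\t$Dn, ($An,$Dm)",
                     [(store i32:$Dn, (add i32:$An, i32:$Dm))]>;

// "move.l dn, offset(an)": address is An plus a signed 16-bit displacement.
def MOVErd  : MyInst<(outs), (ins DR32:$Dn, AR:$An, i32imm:$imm),
                     "move.l\t$Dn, $imm($An)",
                     [(store i32:$Dn, (add i32:$An, offsetimm:$imm))]>;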

The biggest wrinkle is that you probably want to defer choosing
whether to use An or Dn as the offset as long as possible. You could
either fix that up later (optimisation!) or try to define these
instructions to accept *both* Dn and An and then encode it properly.
I'm afraid I don't know enough about the 68k instruction set to be
sure which is better.

Cheers.

Tim.

As I said before, there’s a big difference between generating working m68k code and generating code that makes optimal use of the instruction set. I’d aim to get the former working first - use fairly naive mappings to instructions - and then look at peephole optimisations once you have some concrete cases of suboptimal code. It’s quite easy to have SelectionDAG emit instructions that are correct and then clean them up when you’re in the SSA machine instruction form.

David

That's very helpful, thank you!

I've seen immediate offsets handled through an Operand definition. How on earth does it know to put the offset into simm16 below, along with the address register? It seems to be a more implicit way of doing the immediate-add pattern you suggested, but I have no idea how this actually works.

def simm16 : Operand<i32>;

def mem : Operand<i32> {
  let MIOperandInfo = (ops AR, simm16);
  let PrintMethod = "printMemOperand";
}

I prefer the explicit pattern method suggested, as I can understand what's going on, but I would really like to know how the Operand stuff actually works.
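
From poking around the MSP430 backend, my best guess is that the Operand only
describes what the final MachineInstr operand looks like (two slots, and how
to print them), and that the actual splitting of the address into a register
plus a simm16 is done by a ComplexPattern whose SelectAddr routine lives in
the target's ISelDAGToDAG. Something like this, I think (just my reading, not
tested; MyInst standing in for whatever instruction class the backend has):

// SelectAddr() in the target's ISelDAGToDAG.cpp decomposes the address
// into two SDValues (base register, displacement), and those two values
// are what end up in the (ops AR, simm16) slots of "mem".
def addr : ComplexPattern<iPTR, 2, "SelectAddr", [], []>;

// Patterns then just say "addr:$dst" wherever a memory operand goes:
def MOVErm : MyInst<(outs), (ins DR32:$Dn, mem:$dst),
                    "move.l\t$Dn, $dst",
                    [(store i32:$Dn, addr:$dst)]>;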

Thanks!