Passing return values on the stack & storing arbitrary sized integers

Hi everybody,

Thanks to the kind help of the LLVM community, I was able to bring my
TriCore back-end a huge step forward. However, I am not done yet: I am
still missing the following features, and maybe you could help me
again:

1. Passing return values on the stack

The calling conventions can be described in TableGen so that registers
are used first, with a fallback to the stack if these do not suffice.
However, this is not enough: the lowering of calls and returns has to
reflect this, too. Currently, most targets do not support this either
(there is an assertion: assert(VA.isRegLoc() && "Can only return in
registers!")).

How important is this feature? Is it safe to ignore it? Is there a
guide on how to implement such hybrid passing of return values (partly
in registers, partly on the stack)? Currently, the TriCore back-end is
not able to compile functions returning e.g. {i128,i1}.

2. Storing arbitrary sized integers

The testcase "test/CodeGen/Generic/APIntLoadStore.ll" checks
loading/storing e.g. i33 integers from/into a global variable. The
questions are the same as for feature 1: How important is this
feature? Is it safe to ignore it? Is there a guide on how to implement
this?

When looking for some kind of guide, I am not looking for step-by-step
instructions for implementing these features, but a well-documented
back-end would be really helpful here. Although it is the most complete
target, the X86 target is IMO much too complex for that purpose.

BTW: are the testcases in test/CodeGen/Generic supposed to be
"must-haves" for all LLVM-targets or can/should I ignore some of them?

Ciao, Fabian

1. Passing return values on the stack

Describing the calling conventions in tablegen so that first registers
are used and to fall back to the stack if these do not suffice.
However, this is not enough and lowering calls and returns have to
reflect this, too. Currently, most targets also do not support this
(there is assertion: assert(VA.isRegLoc() && "Can only return in
registers!")).

How important is this feature? Is it safe to ignore it? Is there some
guide how to implement such a hybrid passing of return values (partly
in registers, partly on the stack)? Currently, the TriCore back-end is
not able to compile functions returning e.g. {i128,i1}.

This isn't very important; you won't run into it compiling C code.

2. Storing arbitrary sized integers

The testcase "test/CodeGen/Generic/APIntLoadStore.ll" checks for
loading/storing e.g. i33 integers from/into global variable. The
questions are the same as regarding feature 1: How important is this
feature? Is it safe to ignore it? Is there some guide how to implement
this?

If you're using the LLVM CodeGen infrastructure and have everything
else implemented correctly, this should be taken care of for you. We
have infrastructure generally referred to as "legalization" that will
transform this into something sane for your target automatically. I
would suggest not ignoring this because the optimizers will
occasionally generate unusual loads and stores.
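
As a rough illustration of what a legalized i33 store can boil down to, here is a minimal sketch in plain C++ (not LLVM code). It assumes a little-endian layout and a 5-byte store size for i33, and it models the store as one 32-bit piece plus one 8-bit piece. The helper names store_i33 and load_i33 are hypothetical, and the sequence LLVM actually emits for a given target may differ.

```cpp
#include <cassert>
#include <cstdint>

// Conceptual model only: store the low 33 bits of `value` into 5 bytes,
// little-endian, as a 32-bit piece followed by an 8-bit piece.
void store_i33(std::uint8_t *mem, std::uint64_t value) {
  assert(mem != nullptr);
  const std::uint64_t masked = value & ((1ULL << 33) - 1);  // keep 33 bits
  const std::uint32_t low = static_cast<std::uint32_t>(masked);
  for (int i = 0; i < 4; ++i)                               // 32-bit part
    mem[i] = static_cast<std::uint8_t>(low >> (8 * i));
  mem[4] = static_cast<std::uint8_t>(masked >> 32);         // bit 32, zero-extended to a byte
}

// Inverse operation: reassemble the 33-bit value from the 5 bytes.
std::uint64_t load_i33(const std::uint8_t *mem) {
  assert(mem != nullptr);
  std::uint64_t value = 0;
  for (int i = 0; i < 4; ++i)
    value |= static_cast<std::uint64_t>(mem[i]) << (8 * i);
  value |= static_cast<std::uint64_t>(mem[4] & 1u) << 32;
  return value;
}
```

The point of the sketch is only that an odd-sized integer can be handled with the ordinary 32-bit and 8-bit memory operations a 32-bit target already has.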

When looking for some kind of guide, I am not looking for step-by-step
instructions to implement these issues, but a well documented back-end
would be really helpful, here. While being the most complete target,
the X86 target IMO is much too complex for that purpose.

BTW: are the testcases in test/CodeGen/Generic supposed to be
"must-haves" for all LLVM-targets or can/should I ignore some of them?

How important they are probably depends on your target... it's really
a grab-bag of unrelated tests which hasn't been updated in a long
time. If any of them crash, though, it's probably worth looking into.

-Eli

Hi Eli,

thank you for the information.

This isn't very important; you won't run into it compiling C code.

OK, fine :slight_smile:

If you're using the LLVM CodeGen infrastructure and have everything
else implemented correctly, this should be taken care of for you. We
have infrastructure generally referred to as "legalization" that will
transform this into something sane for your target automatically. I
would suggest not ignoring this because the optimizers will
occasionally generate unusual loads and stores.

Hm, my problem is that the TriCore does not really support i64, only
paired 32-bit registers, but I need such a register class because some
instructions require it. So the legalizer thinks i64 instructions are
legal, and integer types above i32 are not legalized automatically.
For most operations I used setOperationAction, setLoadExtAction,
... and now I have to handle loads/stores for i33. Maybe you can point
me to where I should look inside LLVM to see how to do that.

Ciao, Fabian

I'm not entirely sure why, but this seems to be a very frequent
mistake: don't mark i64 legal unless you actually have i64 registers.
Lying to the legalizer creates extra work for your target, and you're
using codepaths which aren't well tested. There are better ways to
model a pair of i32 registers; if you have some case you're having
trouble modeling, please ask.

-Eli

Hi Eli,

I'm not entirely sure why, but this seems to be a very frequent
mistake: don't mark i64 legal unless you actually have i64 registers.
Lying to the legalizer creates extra work for your target, and you're
using codepaths which aren't well tested. There are better ways to
model a pair of i32 registers; if you have some case you're having
trouble modeling, please ask.

Well, I did not implement the back-end originally; I am currently only
adapting it to LLVM 3.1, so I don't know whether it would have been
possible to model the TriCore ISA in a better way. The TriCore supports
register pairs of two adjacent (even, odd) registers forming one 64-bit
register. A few operations of the TriCore exploit these register pairs:
multiplication, for instance, places its result in one of those
register pairs, it is possible to load/store such register pairs
directly from/to memory, and the calling conventions take them into
account, too. Everything else has to be done using 32-bit registers
and instructions.
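
To make the register-pair idea concrete, here is a small self-contained C++ model (a sketch, not back-end code). It assumes that the even register of a pair holds the low 32 bits and the odd register the high 32 bits; the type DataRegs and the helpers write_pair, read_pair and mul_to_pair are hypothetical names used only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy register file: sixteen 32-bit data registers D0..D15.
struct DataRegs {
  std::array<std::uint32_t, 16> d{};
};

// Write a 64-bit value to the pair starting at the even index `even`.
// Assumption: the even register holds the low word, the odd one the high word.
bool write_pair(DataRegs &regs, unsigned even, std::uint64_t value) {
  if (even % 2 != 0 || even > 14) return false;  // only (even, odd) pairs exist
  regs.d[even] = static_cast<std::uint32_t>(value);
  regs.d[even + 1] = static_cast<std::uint32_t>(value >> 32);
  return true;
}

// Read the 64-bit value held in the pair starting at `even`.
std::uint64_t read_pair(const DataRegs &regs, unsigned even) {
  assert(even % 2 == 0 && even <= 14);
  return (static_cast<std::uint64_t>(regs.d[even + 1]) << 32) | regs.d[even];
}

// 32x32 -> 64 multiplication of D[a] and D[b], result placed in a pair.
bool mul_to_pair(DataRegs &regs, unsigned even, unsigned a, unsigned b) {
  const std::uint64_t product =
      static_cast<std::uint64_t>(regs.d.at(a)) * regs.d.at(b);
  return write_pair(regs, even, product);
}
```

In this model a 64-bit value never exists as a single register; it is always two ordinary 32-bit registers that happen to be adjacent.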

Originally, these register pairs were only present in the
TableGen descriptions and were only used to define the
multiplication instruction. Furthermore, the calling conventions were
tweaked to be compatible with those specified by the TriCore EABI
(64-bit arguments have to be passed in an appropriate register pair).
Everything worked quite well. When moving to LLVM 3.1, however, the
generic code generation framework complained, when selecting the
multiplication instruction, that it knew nothing about those register
pairs that were only described in TableGen. So I found no other
solution than to add these register pairs as i64 registers.

If there is a more convenient solution for this setup, I would be
really glad to learn about it :slight_smile:

Ciao, Fabian

Not sure exactly what's going wrong, but if the Tablegen selection
code doesn't work correctly for a few unusual instructions, you can
write out the code by hand in your target's implementation of
SelectionDAGISel::Select.

-Eli

OK, I rechecked my problem and I hope I can now describe it more precisely:

1. The TriCore supports register pairs forming 64-bit registers. These
64-bit registers are defined like this and form the "ER" register
class:

def E0 : TriCoreRegWithSubregs<0, "e0", [D0, D1]>, DwarfRegNum<[32]>;
...
def E14 : TriCoreRegWithSubregs<14, "e14", [D14, D15]>, DwarfRegNum<[39]>;

2. The TriCore has some instructions that make use of an arbitrary
register pair, for example integer division (consisting of a single
DVINIT and several DVSTEPs):

def DVINIT_Urr : RrInstr<0x4b, 0x0a, (outs ER:$c), (ins DR:$a,
DR:$b), "dvinit.u\t$c, $a, $b", >;
def DVSTEP_Urrr : RrrInstr<0x6b, 0x0e, (outs ER:$c), (ins ER:$d,
DR:$b), "dvstep.u\t$c, $d, $b", >;

def : Pat<(sdiv DR:$a, DR:$b),
              (EXTRACT_SUBREG
                (DVSTEPrrr
                  (DVSTEPrrr
                    (DVSTEPrrr
                      (DVSTEPrrr (DVINITrr DR:$a, DR:$b), DR:$b),
                      DR:$b),
                    DR:$b),
                  DR:$b),
                sub_even)>;

3. These are selected in a simple testcase:

define i32 @div(i32 %a, i32 %b) nounwind readnone {
entry:
  %div = sdiv i32 %a, %b ; <i32> [#uses=1]
  ret i32 %div
}

4. Instruction Scheduling calls GetCostForDef (in
ScheduleDAGRRList.cpp) when hitting the EXTRACT_SUBREG-Node introduced
by the Pattern above.

5. GetCostForDef crashes here, as getRepRegClassFor(VT /* == MVT::i64
*/) returns NULL:

RegClass = TLI->getRepRegClassFor(VT)->getID();

For LLVM versions before 3.1 this did not happen (I don't know why).
Solving this problem at the TableGen level means (I guess) not using
an explicitly modelled "ER" register class at all, right? But how can
I describe that, for example, DVSTEP could use any of those register
pairs as input and output? I have no clue how to express this
constraint in TableGen without that ER register class, or when manually
lowering ISD::SDIV during ISelLowering.
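
For readers unfamiliar with step-wise division, the DVINIT/DVSTEP sequence from item 2 can be pictured with the following C++ model. This is only a sketch under assumptions: it is an unsigned restoring division in which each step produces 8 quotient bits, with the quotient accumulating in the even half of the pair and the remainder in the odd half. The exact semantics of the real instructions (including sign handling) are defined by the TriCore ISA and are not reproduced here; div_init, div_step and udiv32 are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Model of a register pair: `even` starts as the dividend and ends up as the
// quotient, `odd` holds the running remainder.
struct RegPair {
  std::uint32_t even;
  std::uint32_t odd;
};

// Analogue of the init step: dividend in the even half, remainder cleared.
RegPair div_init(std::uint32_t dividend) { return {dividend, 0u}; }

// Analogue of one step: eight iterations of shift-and-subtract.
RegPair div_step(RegPair p, std::uint32_t divisor) {
  for (int i = 0; i < 8; ++i) {
    // Shift the next dividend bit into the remainder (64-bit to avoid overflow).
    std::uint64_t rem = (static_cast<std::uint64_t>(p.odd) << 1) | (p.even >> 31);
    p.even <<= 1;
    if (rem >= divisor) {  // subtract succeeds: record a 1 quotient bit
      rem -= divisor;
      p.even |= 1u;
    }
    p.odd = static_cast<std::uint32_t>(rem);
  }
  return p;
}

// One init plus four steps yields all 32 quotient bits, mirroring the shape of
// the pattern above; the result is taken from the even half.
std::uint32_t udiv32(std::uint32_t a, std::uint32_t b) {
  assert(b != 0);
  RegPair p = div_init(a);
  for (int step = 0; step < 4; ++step) p = div_step(p, b);
  return p.even;
}
```

The model shows why every intermediate value of the sequence has to live in a register pair even though both inputs and the final result are 32 bits wide.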

Ciao, Fabian

This isn't really my area of expertise, but I think you're messing up
your RegisterClass definition. Look at how ARM defines DTriple.

-Eli

This isn't really my area of expertise, but I think you're messing up
your RegisterClass definition. Look at how ARM defines DTriple.

DTriple is untyped :slight_smile: because we do not have any value type which
defines 3xi64.
However, the paired register needs to have a type.

Fabian, what are the definitions of ER and DR register classes?

Hi Anton,

here are the definitions of these register classes:

// Data register class
def DR : RegisterClass<"TriCore", [i32], 32,
                       (add D0, D1, D2, D3, D4, D5, D6, D7,
                            D8, D9, D10, D11, D12, D13, D14, D15)>;

// Extended-size data register class
def ER : RegisterClass<"TriCore", [i64], 32,
                       (add E0, E2, E4, E6, E8, E10, E12, E14)> {
  let SubRegClasses = [(DR sub_even, sub_odd)];
}

And the DX and EX registers are defined this way:

def D0 : TriCoreReg<0, "d0">, DwarfRegNum<[0]>;
...
def D15 : TriCoreReg<15, "d15">, DwarfRegNum<[15]>;

def E0 : TriCoreRegWithSubregs<0, "e0", [D0, D1]>, DwarfRegNum<[32]>;
def E2 : TriCoreRegWithSubregs<2, "e2", [D2, D3]>, DwarfRegNum<[33]>;
...
def E14 : TriCoreRegWithSubregs<14, "e14", [D14, D15]>, DwarfRegNum<[39]>;

Ciao, Fabian

Fabian,

The regclasses look fine... So, you need to figure out why
getRepRegClassFor() returns NULL in this case.
Side note: you can autogenerate register names :slight_smile:

The regclasses look fine... So, you need to figure out why
getRepRegClassFor() returns NULL in this case.

Well, that's rather easy :slight_smile: The register class is not registered in
the constructor of TriCoreTargetLowering. Maybe, some background is
missing here:

- I added the ER register class for MVT::i64 and I had to take care of
quite a lot of stuff as the TriCore does not really support 64-bit
operations (it just offers these register pairs but almost no
operations working on them).

- Eli mentioned that it is quite a common mistake to register such
register classes although the processor does not support many
operations on the corresponding value types. It causes much more work in
the specific back-end, as type legalization no longer takes care of
such nodes. It definitely would be much easier if I just had to take
care of the few special TriCore instructions that involve ER
registers (like multiplication/division).

- However, the segfault caused by the NULL pointer returned by
getRepRegClassFor() is the reason why I added this register class and
used a lot of setOperationAction, setLoadExtAction, ... calls or lowered
some things manually.
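
The failure mode described in these points can be reproduced with a toy model in C++. This is a sketch only: ToyLowering, ValueType, RegClass and costClassId are hypothetical stand-ins, not the LLVM classes, and the model merely assumes that the lookup table is filled by explicit registration and returns a null pointer for any value type that was never registered.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

enum class ValueType : std::size_t { i32, i64, Count };

struct RegClass {
  int id;
  const char *name;
};

// Toy stand-in for a target lowering object: value types map to a
// representative register class only after explicit registration.
class ToyLowering {
 public:
  void addRegisterClass(ValueType vt, const RegClass *rc) {
    assert(rc != nullptr);
    rep_[index(vt)] = rc;
  }
  // Returns nullptr for a value type that was never registered.
  const RegClass *getRepRegClassFor(ValueType vt) const {
    return rep_[index(vt)];
  }

 private:
  static std::size_t index(ValueType vt) { return static_cast<std::size_t>(vt); }
  std::array<const RegClass *, static_cast<std::size_t>(ValueType::Count)> rep_{};
};

// Defensive variant of the lookup: -1 instead of dereferencing a null pointer.
int costClassId(const ToyLowering &tli, ValueType vt) {
  const RegClass *rc = tli.getRepRegClassFor(vt);
  return rc ? rc->id : -1;
}
```

In the model, a node of type i64 can only be costed once a class has been registered for i64, which matches the observation that the crash disappears after registering ER.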

Side note: you can autogenerate register names :slight_smile:

Nice, saves typing. Is there an example I can have a look at? Most
targets I examined seem to do this explicitly.

Ciao, Fabian

Hi Fabian, Anton,

That's *a lot* of work, as your processor does not support any operation other than mul/div in 64 bit.

Did you try to create a pseudo div/mul instruction and expand it after the isel pass?
Or you could even go further down the pipeline and expand it just before RA with a custom pass.
I'm not sure whether that hook is called again at the RA pass or later on.
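
The idea of expanding a pseudo instruction after selection can be sketched with a toy instruction list in C++. This is not LLVM's machine-instruction API; Opcode, Instr and expandPseudos are hypothetical names, and the expansion into one init and four step instructions simply mirrors the division pattern shown earlier in the thread.

```cpp
#include <string>
#include <vector>

// Toy instruction representation; PseudoDiv has no hardware encoding.
enum class Opcode { PseudoDiv, DvInit, DvStep, Other };

struct Instr {
  Opcode op;
  std::string text;
};

// Replace every pseudo division by the real sequence: one init, four steps.
// All other instructions are copied through unchanged.
std::vector<Instr> expandPseudos(const std::vector<Instr> &in) {
  std::vector<Instr> out;
  for (const Instr &mi : in) {
    if (mi.op != Opcode::PseudoDiv) {
      out.push_back(mi);
      continue;
    }
    out.push_back({Opcode::DvInit, "dvinit"});
    for (int i = 0; i < 4; ++i) out.push_back({Opcode::DvStep, "dvstep"});
  }
  return out;
}
```

With this shape, instruction selection only ever sees a 32-bit division; the register-pair operands appear when the pseudo is expanded.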

Ivan

OK, that might be a solution; I did not try this yet. Is
EmitInstrWithCustomInserter the right point to start? When emitting a
bunch of target-specific instructions there, I will still need some
virtual registers of the ER register class. Is this possible
without adding the register class?

Ciao, Fabian

Hi Fabian, Anton,

OK, that might be a solution; I did not try this yet. Is
EmitInstrWithCustomInserter the right point to start?

Yes.

When emitting a
bunch of target-specific instructions there, I will still need some
virtual registers of the ER register class. Is this possible
without adding the register class?

Yes, as long as you have the regclass defined in the .td file. MachineRegisterInfo doesn't seem to use TargetLowering information.

Ivan