Load/Store issues: tablegen/customization?

I've been running into two issues with load/store handling:

(1) Tablegen doesn't seem to handle the two predicates that get attached to my instructions. The first is the predicate from TargetSelectionDAG.td that identifies a load node as, say, extloadi8. The second is my own predicate, which identifies the load as being in a particular address space (I need different instructions for different address spaces).

In the tablegen-generated code, only one predicate is tested. Has anybody seen anything similar (i.e. is this a known issue)? Or do people have multiple predicates working just fine?

I've been directly matching "ld" and been testing in a world that only does 32 bit aligned loads and things have "just worked", but I've got my sights set higher now (load of bytes, store of 4 x byte vectors, etc).

(2) The HW I'm targeting does not have byte/short load/store; the finest granularity is aligned 32 bit load/store (like the original Alpha architecture). In addition, I want to optimize certain vector operations (128 bit load/store) and some are getting converted from 32 bit to multiple 8 bit operations (e.g. store a 4 element vector of chars becomes 4 one byte stores), but I don't want to provide full vector register support (at this time).

Any hints on a good approach to dealing with the 32 bit aligned load/store limitations and mixing and matching "native" load/store support of vector types with LLVM generated expansions of vector operations?
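To make the vector case concrete, here is a minimal standalone C++ sketch of the outcome I am after: a 4 x i8 vector written with one aligned 32-bit store rather than four byte stores. This is a plain model of the memory effect, not LLVM code; `packV4I8` and `storeV4I8` are hypothetical names, and the little-endian lane order is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Pack a 4 x i8 vector into one 32-bit word. Element 0 goes in the low
// byte; this little-endian lane order is an assumption of the sketch.
uint32_t packV4I8(const std::array<uint8_t, 4>& v) {
    return uint32_t(v[0]) | (uint32_t(v[1]) << 8) |
           (uint32_t(v[2]) << 16) | (uint32_t(v[3]) << 24);
}

// Write the whole vector with a single aligned 32-bit store instead of
// four separate byte stores. `words` models word-addressed memory and
// `addr` is a byte address.
void storeV4I8(std::vector<uint32_t>& words, uint32_t addr,
               const std::array<uint8_t, 4>& v) {
    assert(addr % 4 == 0 && "vector store must be 32-bit aligned");
    words.at(addr / 4) = packV4I8(v);
}
```

With a 4-byte-aligned address, the single word store leaves memory in the same state as the four byte stores would, which is why the expansion into byte operations is pure overhead on this hardware.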

Thanks,

Dan

I've been running into two issues with load/store handling:

(1) is that tablegen doesn't seem to handle the two predicates that
get attached to my instructions. The first is the predicate in
TargetSelectionDAG.td, identifying a load node as, say, extloadi8. The
second is my identification of the load as having a particular address
space (need different instructions for different address spaces).

In the tablegen generated code, only one predicate is tested. Anybody
seen anything similar to this (i.e. is this a known issue)? Do people
have experience with multiple predicates working just fine?

I've been directly matching "ld" and been testing in a world that only
does 32 bit aligned loads and things have "just worked", but I've got
my sights set higher now (load of bytes, store of 4 x byte vectors,
etc).

What version of LLVM are you using? I recently made some tablegen
changes to allow nodes to have multiple predicates. See r57565 for
an example of a change made possible by being able to have multiple
predicates on a node, and it sounds similar to what you describe.
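
As a rough illustration of what "multiple predicates on a node" means for a matcher, here is a small standalone C++ model. It is not tablegen output and does not use LLVM's classes; `LoadNode`, `isExtLoadI8`, `isGlobalAddrSpace`, and `matchesAll` are hypothetical names, and address space 1 is an arbitrary choice. The point is only that a pattern should match when every attached predicate holds.

```cpp
#include <functional>
#include <vector>

// Toy stand-in for a load node; the fields are assumptions for illustration.
struct LoadNode {
    bool isExtLoad;      // is this an extending load?
    unsigned memBits;    // width of the value in memory
    unsigned addrSpace;  // address space of the pointer
};

using Predicate = std::function<bool(const LoadNode&)>;

// Predicate 1: the kind of check a fragment like extloadi8 expresses.
bool isExtLoadI8(const LoadNode& n) { return n.isExtLoad && n.memBits == 8; }

// Predicate 2: a target-specific address-space check.
bool isGlobalAddrSpace(const LoadNode& n) { return n.addrSpace == 1; }

// A pattern matches only if every attached predicate accepts the node.
bool matchesAll(const LoadNode& n, const std::vector<Predicate>& preds) {
    for (const Predicate& p : preds)
        if (!p(n)) return false;
    return true;
}
```

If only the first predicate were tested, an i8 extending load from address space 0 would be accepted as well, which is the symptom described above.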

(2) The HW I'm targeting does not have byte/short load/store; the
finest granularity is aligned 32 bit load/store (like the original
Alpha architecture). In addition, I want to optimize certain vector
operations (128 bit load/store) and some are getting converted from 32
bit to multiple 8 bit operations (e.g. store a 4 element vector of
chars becomes 4 one byte stores), but I don't want to provide full
vector register support (at this time).

Any hints on a good approach to dealing with the 32 bit aligned load/
store limitations and mixing and matching "native" load/store support
of vector types with LLVM generated expansions of vector operations?

For the 32-bit load/store question, I think you should mark
loads and stores of i8 and i16 as Custom, which will allow
you to write target-specific code to handle them. For the
vector question, it sounds like a target-specific DAGCombine
may be a possible solution.
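
For reference, the arithmetic that such custom i8 handling would need to produce can be sketched in standalone C++. This models the memory behaviour only and is not LLVM lowering code; `WordMemory`, `loadByteZext`, and `storeByte` are hypothetical names, and little-endian byte numbering within a word is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy memory that, like the hardware described, only supports aligned
// 32-bit accesses. Addresses are byte addresses.
struct WordMemory {
    std::vector<uint32_t> words;

    uint32_t loadWord(uint32_t addr) const {
        assert(addr % 4 == 0 && "only aligned 32-bit loads exist");
        return words.at(addr / 4);
    }
    void storeWord(uint32_t addr, uint32_t value) {
        assert(addr % 4 == 0 && "only aligned 32-bit stores exist");
        words.at(addr / 4) = value;
    }
};

// Zero-extending i8 load: load the containing word, shift, then mask.
uint32_t loadByteZext(const WordMemory& mem, uint32_t addr) {
    uint32_t word = mem.loadWord(addr & ~3u);
    unsigned shift = (addr & 3u) * 8;  // bit offset of the byte in the word
    return (word >> shift) & 0xFFu;
}

// i8 store: read-modify-write of the containing word.
void storeByte(WordMemory& mem, uint32_t addr, uint8_t value) {
    uint32_t base = addr & ~3u;
    unsigned shift = (addr & 3u) * 8;
    uint32_t word = mem.loadWord(base);
    word = (word & ~(0xFFu << shift)) | (uint32_t(value) << shift);
    mem.storeWord(base, word);
}
```

An i16 access would follow the same shape with a 0xFFFF mask, provided the halfword does not straddle a word boundary.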

Dan

What version of LLVM are you using? I recently made some tablegen
changes to allow nodes to have multiple predicates. See r57565 for
an example of a change made possible by being able to have multiple
predicates on a node, and it sounds similar to what you describe.

I retried this fairly recently, but it’s worth trying again.

(2) The HW I’m targeting does not have byte/short load/store; the
finest granularity is aligned 32 bit load/store (like the original
Alpha architecture). In addition, I want to optimize certain vector
operations (128 bit load/store) and some are getting converted from 32
bit to multiple 8 bit operations (e.g. store a 4 element vector of
chars becomes 4 one byte stores), but I don’t want to provide full
vector register support (at this time).

Any hints on a good approach to dealing with the 32 bit aligned load/
store limitations and mixing and matching “native” load/store support
of vector types with LLVM generated expansions of vector operations?

For the 32-bit load/store question, I think you should mark
loads and stores of i8 and i16 as Custom, which will allow
you to write target-specific code to handle them. For the
vector question, it sounds like a target-specific DAGCombine
may be a possible solution.

For custom load/stores, is there an existing sample I could use as a template?

I have been looking at the SPU code. It seems to generate a custom ISD node, LDRESULT, for which it then custom-selects code.

I have not been able to successfully create a custom load node which I can then use a pattern match select on.

During selection, it tries to get info on the mem operand, and that information doesn’t seem to exist (I create my original LDRESULT equivalent with a simple getNode, which I suspect doesn’t set up all the mem operand type information?).

In particular, I keep hitting an assert in getSizeInBits. The match calls getMemoryVT().getSizeInBits(), which then goes to getVectorElementType().getSizeInBits(). The getMemoryVT() result seems to be where the garbage is introduced (the V value is way out of range), which is why I’m guessing it’s not being set up.
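
To spell out the failure mode I am guessing at, here is a toy C++ model. It is not LLVM's class hierarchy; `Node`, `MemNode`, `makePlainNode`, `makeMemNode`, and `memoryBitsOrZero` are all hypothetical names. A node built through the generic path carries no memory type, so a query for that type has nothing valid to read unless the node was created as a memory node in the first place.

```cpp
#include <memory>

// Generic node: knows its opcode, nothing about memory.
struct Node {
    explicit Node(unsigned opc) : opcode(opc) {}
    virtual ~Node() = default;
    unsigned opcode;
};

// Memory node: additionally records the width of the memory access.
struct MemNode : Node {
    MemNode(unsigned opc, unsigned bits) : Node(opc), memoryBits(bits) {}
    unsigned memoryBits;
};

// Analogue of building a node through the generic path: no memory info.
std::unique_ptr<Node> makePlainNode(unsigned opc) {
    return std::make_unique<Node>(opc);
}

// Analogue of building a node through a memory-aware path.
std::unique_ptr<Node> makeMemNode(unsigned opc, unsigned bits) {
    return std::make_unique<MemNode>(opc, bits);
}

// Checked query: returns 0 when the node carries no memory information,
// rather than reading a field that was never set up.
unsigned memoryBitsOrZero(const Node& n) {
    const MemNode* m = dynamic_cast<const MemNode*>(&n);
    return m ? m->memoryBits : 0;
}
```

In this model an unchecked downcast of the plain node would read whatever happens to sit past the end of the object, which would look a lot like the out-of-range value I am seeing. Whether that is what actually happens inside the selector is an assumption on my part.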

Once I get through this, I’ll look into the combining question…

Thanks,

Dan