Q about instruction pattern matching

Hi,
I'm trying to describe the patterns for the m68k instructions ADD and
ADDA when used with a data register operand for the source. Basically,
ADD operates on anything but address registers and immediates, and
ADDA works on address registers only so I'm going to need both
instructions in my instruction set.

These are the two problematic definitions; by themselves they produce
the intented effect. AR is the address register class, DR32 is the
data register class (no overlap):

// 32-bit add DR->DR
def ADD_32_dx_dx : I<(outs DR32:$dst), (ins DR32:$src1, DR32:$src2),
"add.l $src2, $dst", [(set DR32:$dst, (add DR32:$src2, DR32:$src1))]>;

// 32-bit add DR->AR
def ADDA_32_dx : I<(outs AR:$dst), (ins AR:$src1, DR32:$src2), "adda.l
$src2, $dst", [(set AR:$dst, (add AR:$src1, DR32:$src2))]>;

Tablegen tells me that "Pattern '(add:i32 DR32:i32:$src2,
DR32:i32:$src1)' is impossible to select", but I can't figure out why.
Are register classes not taken into account in the pattern matching or
am I doing something else wrong?

Thanks,
Andreas

ISel patterns are matched against DAGs before register allocation. So you are correct that ISel patterns can’t take into account register classes. The register classes in the patterns are a selection constraint for the register allocator if that instruction is chosen, not a constraint on choosing that pattern.

Here’s a possible way to proceed for now:
Define ADDA_32_dx without a pattern. Then in C++ pattern code for matching address operands select add operations on addresses to ADDA_32_dx. This way you know that the add your selecting is going to be used as an address and you can safely tell the register allocator to put the value in an address register.

Unfortunately dealing with heterogeneous register files like this is kind of annoying.

Thanks, I guess that makes sense. I might as well go ahead and create
a general-purpose register class but then require that some operations
only target the data registers. For instance, multiplication is going
to require data registers while indirect addressing requires address
registers.

Could something like the Uses and Defs constraints apply to whole
register classes as well? For instance, I assume the register
allocated is going to guarantee that EAX:EDX is not allocated around
an x86 integer division, and that the inputs end up in the right
registers. If those constraints "ripple up" through the function,
couldn't that be augmented to solve this problem as well so that
arithmetic would automatically select the data registers?

Thanks,
// A

ISel patterns are matched against DAGs before register allocation. So you
are correct that ISel patterns can’t take into account register classes. The
register classes in the patterns are a selection constraint for the register
allocator if that instruction is chosen, not a constraint on choosing that
pattern.

Here’s a possible way to proceed for now:
Define ADDA_32_dx without a pattern. Then in C++ pattern code for matching
address operands select add operations on addresses to ADDA_32_dx. This way
you know that the add your selecting is going to be used as an address and
you can safely tell the register allocator to put the value in an address
register.

Thanks, I guess that makes sense. I might as well go ahead and create
a general-purpose register class but then require that some operations
only target the data registers. For instance, multiplication is going
to require data registers while indirect addressing requires address
registers.

I’d start with the most straight forward implementation that generates correct code. The register allocator currently doesn’t handle these sorts of register allocation constraints in a sophisticated way, so I’d not worry about generating the most efficient code for now.

Could something like the Uses and Defs constraints apply to whole
register classes as well?

I don’t believe so. It’s simply a list of physical registers AFAIK.

For instance, I assume the register
allocated is going to guarantee that EAX:EDX is not allocated around
an x86 integer division, and that the inputs end up in the right
registers. If those constraints “ripple up” through the function,
couldn’t that be augmented to solve this problem as well so that
arithmetic would automatically select the data registers?

Protecting EAX:EDX is relatively straight forward, it’s having the inputs end up in the right registers that’s tricky.

I’m not an expert on the x86 back end, but my understanding of the register allocator is that it doesn’t handle these sorts of cascades of constraints. The constraints are completely specified by the instruction that’s selected, this means that a move instruction might need to be inserted to move a value between register classes or to specific registers. Though it’s less than globally optimal, adding this sort of affinity propagation would likely add another complex dimension to determining optimal register allocation.

Some options down the line would be to make more sophisticated patterns that select to instructions that operate on the right register classes from the start, or to have a custom machine instruction pass similar to x86’s operand folding which could swap instructions (say from ADD_32_dx_dx to ADDA_32_dx) and perform the kind of “ripple up” that you described, eliminating unnecessary register to register copies along the way.

ISel patterns are matched against DAGs before register allocation. So you
are correct that ISel patterns can’t take into account register classes. The
register classes in the patterns are a selection constraint for the register
allocator if that instruction is chosen, not a constraint on choosing that
pattern.

Here’s a possible way to proceed for now:
Define ADDA_32_dx without a pattern. Then in C++ pattern code for matching
address operands select add operations on addresses to ADDA_32_dx. This way
you know that the add your selecting is going to be used as an address and
you can safely tell the register allocator to put the value in an address
register.

This may work. But the mechanism is designed for load / store folding. What’s described here means folding the address computation to a separate instruction. It’s not clear how well this fits.

I am going to suggest something shocking. :slight_smile: Since you will end up writing a bunch of target specific code anyway, you might a well write a target specific pass that change generic instructions into data register variant ones when necessary.

Evan

Hi Evan,
wouldn't this generate fairly terrible code if each address register
use has to be preceded by instructions to make an address register
hold the right value?

I'm a bit confused here, wouldn't this run after the register
allocator, or is there a way to have the register allocator run after
I've gone through and "target-legalized" all instructions?

Thanks,
// A

I am going to suggest something shocking. :slight_smile: Since you will end up writing a
bunch of target specific code anyway, you might a well write a target
specific pass that change generic instructions into data register variant
ones when necessary.

Hi Evan,
wouldn't this generate fairly terrible code if each address register
use has to be preceded by instructions to make an address register
hold the right value?

No. I would suggest doing this as a instruction selection post pass. It would operate on DAGs so you still get the benefit of SDNode CSE, etc. Scheduling and register allocation happen later.

Let me clarify. Write "generic" instructions, i.e. those that use / def DR32, with patterns. So right after isel, all the DAG nodes will be of the dx variant, e.g. ADD_32_dx_dx. Also write AR instruction variants such as ADDA_32_dx. These do not have patterns so they aren't used during selection. Add a post pass to replace load / store operands by replacing them with identical nodes except for the "correct" opcodes.

I think this mechanism will work. There is probably a cleaner solution. But I am not seeing it right now.

Evan

Hi Evan,
as per your suggestion on IRC I've tried creating a "shadow group" of
registers in my data register class that aren't allocatable and map to
the address registers; this allows me to deal with everything as data
registers. Integer and FP code without loads and stores work fine with
this scheme, and simple memory operations using global addresses work
nicely as well.

There are a couple of gotchas I keep running into though;

1) My ABI requires the two first pointer arguments to go into address
registers A0, A1. I can't seem to express this in the cc td
description, because there are no MVTs for pointers. Is there a way
around that? This leads to pointer arguments entering functions in
data registers because they are simply i32's.

If I lower the calling convention stuff manually rather than relying
on the generated CC analyzer, how do I determine that a formal
argument I'm about to receive or pack together is supposed to be used
as a pointer so I can stick it in an address register?

2) How do I implement getPointerRegClass() in my TargetInstrInfo
subtype? Returning the proper thing (the address register class) makes
the instruction scheduler very unhappy and it asserts complaining that
the register classes don't agree (which they don't, obviously, because
everything has been selected as data register instructions).

Returning the data register class generates a bunch of illegal moves,
such as move.l 8(d0), d1. Here, d0 has to be an address register. The
root cause of this is instructions that get emitted with the
M_LOOK_UP_PTR_REG_CLASS flag, because I have a ptr_rc in my address
mode selection patterns for loads and stores.

Also, I'm uncertain as to how this custom DR->AR op replacer pass fits
into this whole soup and where it is supposed to happen, as I'm still
pretty clueless with llvm internals. :slight_smile:

Thanks,
Andreas

Hi Evan,
wouldn't this generate fairly terrible code if each address register
use has to be preceded by instructions to make an address register
hold the right value?

No. I would suggest doing this as a instruction selection post pass.
It would operate on DAGs so you still get the benefit of SDNode CSE,
etc. Scheduling and register allocation happen later.

Let me clarify. Write "generic" instructions, i.e. those that use /
def DR32, with patterns. So right after isel, all the DAG nodes will
be of the dx variant, e.g. ADD_32_dx_dx. Also write AR instruction
variants such as ADDA_32_dx. These do not have patterns so they
aren't used during selection. Add a post pass to replace load / store
operands by replacing them with identical nodes except for the
"correct" opcodes.

I think this mechanism will work. There is probably a cleaner
solution. But I am not seeing it right now.

Hi Evan,
as per your suggestion on IRC I've tried creating a "shadow group" of
registers in my data register class that aren't allocatable and map to
the address registers; this allows me to deal with everything as data
registers. Integer and FP code without loads and stores work fine with
this scheme, and simple memory operations using global addresses work
nicely as well.

Very nice.

There are a couple of gotchas I keep running into though;

1) My ABI requires the two first pointer arguments to go into address
registers A0, A1. I can't seem to express this in the cc td
description, because there are no MVTs for pointers. Is there a way
around that? This leads to pointer arguments entering functions in
data registers because they are simply i32's.

If I lower the calling convention stuff manually rather than relying
on the generated CC analyzer, how do I determine that a formal
argument I'm about to receive or pack together is supposed to be used
as a pointer so I can stick it in an address register?

I'd suggest custom lowering for now. Again use data registers during lowering and fix them during the isel post pass if the parameters are used as pointers. Do you think that would work?

2) How do I implement getPointerRegClass() in my TargetInstrInfo
subtype? Returning the proper thing (the address register class) makes
the instruction scheduler very unhappy and it asserts complaining that
the register classes don't agree (which they don't, obviously, because
everything has been selected as data register instructions).

Returning the data register class generates a bunch of illegal moves,
such as move.l 8(d0), d1. Here, d0 has to be an address register. The
root cause of this is instructions that get emitted with the
M_LOOK_UP_PTR_REG_CLASS flag, because I have a ptr_rc in my address
mode selection patterns for loads and stores.

I think it should return DR register class. My theory is if all the nodes are already fixed right after selection, scheduling and allocation should just work.

Also, I'm uncertain as to how this custom DR->AR op replacer pass fits
into this whole soup and where it is supposed to happen, as I'm still
pretty clueless with llvm internals. :slight_smile:

It should happen right after selection and before scheduling (i.e. in InstructionSelectionBasicBlock before it calls ScheduleAndEmitDAG). I would suggest you focusing on getting this down before you start worrying about the downstream passes.

Evan

I guess in a regparm ABI like the one I'm targeting, LLVM will produce
"live ins" for the registers that are used in the ABI in the entry
basic block of all functions? I've seen "Live ins: D0" appear for an
int (int)-type signature, so that seems right.

I'd like to arrange for e.g. a void foo(char*p) signature so see a
"live in A0", but can I custom lower to that immediately given that
I'm going to do a tapdance with the instruction selections later on?
If not, how do I distinguish these arguments from the data argument
later on?

Thanks,
Andreas

If I lower the calling convention stuff manually rather than relying
on the generated CC analyzer, how do I determine that a formal
argument I'm about to receive or pack together is supposed to be used
as a pointer so I can stick it in an address register?

I'd suggest custom lowering for now. Again use data registers during
lowering and fix them during the isel post pass if the parameters are
used as pointers. Do you think that would work?

I guess in a regparm ABI like the one I'm targeting, LLVM will produce
"live ins" for the registers that are used in the ABI in the entry
basic block of all functions? I've seen "Live ins: D0" appear for an
int (int)-type signature, so that seems right.

Right.

I'd like to arrange for e.g. a void foo(char*p) signature so see a
"live in A0", but can I custom lower to that immediately given that
I'm going to do a tapdance with the instruction selections later on?
If not, how do I distinguish these arguments from the data argument
later on?

Hrm. I suspect you'll have to add parameter attribute (see ParameterAttributes.h) that indicates it's used as an address. In SelectionDAGIsel.cpp:TargetLowering::LowerAguments(), you have access to the function argument type. Anyone sees an alternative solution?

Evan

For the uses you can always emit D->A register copies before the uses, and then optimize this. However, I don’t see any way around this for lowering arguments to a call.