Multi-Instruction Patterns

Are there any examples of using tablegen to generate multiple machine
instructions from a single pattern? Or do these cases always have to be
manually expanded?

                                            -Dave

PPC has a bunch of examples, for example:

// Arbitrary immediate support. Implement in terms of LIS/ORI.
def : Pat<(i32 imm:$imm),
           (ORI (LIS (HI16 imm:$imm)), (LO16 imm:$imm))>;

// ADD an arbitrary immediate.
def : Pat<(add GPRC:$in, imm:$imm),
           (ADDIS (ADDI GPRC:$in, (LO16 imm:$imm)), (HA16 imm:$imm))>;
// OR an arbitrary immediate.
def : Pat<(or GPRC:$in, imm:$imm),
           (ORIS (ORI GPRC:$in, (LO16 imm:$imm)), (HI16 imm:$imm))>;
// XOR an arbitrary immediate.
def : Pat<(xor GPRC:$in, imm:$imm),
           (XORIS (XORI GPRC:$in, (LO16 imm:$imm)), (HI16 imm:$imm))>;

-Chris

Chris Lattner wrote:

Are there any examples of using tablegen to generate multiple machine
instructions from a single pattern? Or do these cases always have to be
manually expanded?

PPC has a bunch of examples, for example:

// Arbitrary immediate support. Implement in terms of LIS/ORI.
def : Pat<(i32 imm:$imm),
           (ORI (LIS (HI16 imm:$imm)), (LO16 imm:$imm))>;

Yep, I actually found some x86 ones buried in the .td files. :slight_smile:

So now I have a couple of other questions.

I wrote a pattern that looks something like the above in form, but how
do I tell the selection DAG to prefer my pattern over another that
already exists. I can't easily just disable that other pattern because
it generates Machine Instruction opcode enums that are assumed to be
available in other parts of the x86 codegen.

So given two patterns that match the same thing, what's the tiebreaker?
I thought it was order in the .td file but that doesn't appear to be the
case. I put my pattern first and it isn't selected. I change the other pattern slightly so it won't match anything and then my pattern gets
used (so I know my pattern is valid).

Also, I really wanted to express this pattern as transforming from one
DAG to another, not down to machine instructions. I saw this in
x86InstSSE.td:

// FIXME: may not be able to eliminate this movss with coalescing the src and
// dest register classes are different. We really want to write this pattern
// like this:
// def : Pat<(f32 (vector_extract (v4f32 VR128:$src), (iPTR 0))),
// (f32 FR32:$src)>;

(this is actually a very useful and important pattern, I wish it was
available!)

I had actually written my pattern in a similar style before I found
this. When I tried to build, tblgen complained about the pattern being
of an unknown type (didn't match Instruction or SDNodeXForm).

I'm assuming there's some missing tblgen support here and that's why
this pattern is commented out. Is that right? Are there plans to add
tblgen support for these kinds of patterns?

                                 -Dave

Chris Lattner wrote:

Are there any examples of using tablegen to generate multiple machine
instructions from a single pattern? Or do these cases always have
to be
manually expanded?

PPC has a bunch of examples, for example:

// Arbitrary immediate support. Implement in terms of LIS/ORI.
def : Pat<(i32 imm:$imm),
          (ORI (LIS (HI16 imm:$imm)), (LO16 imm:$imm))>;

Yep, I actually found some x86 ones buried in the .td files. :slight_smile:

So now I have a couple of other questions.

I wrote a pattern that looks something like the above in form, but how
do I tell the selection DAG to prefer my pattern over another that
already exists. I can't easily just disable that other pattern because
it generates Machine Instruction opcode enums that are assumed to be
available in other parts of the x86 codegen.

Try AddedComplexity = n to increase "goodness" of the pattern. It's a bit of a hack.

So given two patterns that match the same thing, what's the tiebreaker?
I thought it was order in the .td file but that doesn't appear to be the
case. I put my pattern first and it isn't selected. I change the other
pattern slightly so it won't match anything and then my pattern gets
used (so I know my pattern is valid).

Also, I really wanted to express this pattern as transforming from one
DAG to another, not down to machine instructions. I saw this in
x86InstSSE.td:

// FIXME: may not be able to eliminate this movss with coalescing the
src and
// dest register classes are different. We really want to write this pattern
// like this:
// def : Pat<(f32 (vector_extract (v4f32 VR128:$src), (iPTR 0))),
// (f32 FR32:$src)>;

(this is actually a very useful and important pattern, I wish it was
available!)

Right. It would be nice to be able to eliminate the unnecessary movss. It hasn't shown up on my radar so I haven't really thought out the right way to model this. I can see a couple of options:

1. Treat these instructions as cross register class copies. The src and dst classes are different (VR128 and FR32) but "compatible".
2. Model it as extract_subreg which coalescer can eliminate.

#2 is conceptually correct. The problem is 128 bit XMM0 is the same register as 32 bit (or 64 bit) XMM0. So it's not possible to define the super-register / sub-register relationship.

That leaves us with #1. I have added support to coalesce cross-class copies (see CrossClassJoin in SimpleRegisterCoalescing.cpp). Unfortunately, it breaks a few tests and I haven't had the time to look into them. If that's done, we just need to add the concept of "compatible register classes" and mark MOVPS2SSrr as a copy and *it should just work*.

Evan

> I wrote a pattern that looks something like the above in form, but how
> do I tell the selection DAG to prefer my pattern over another that
> already exists. I can't easily just disable that other pattern
> because
> it generates Machine Instruction opcode enums that are assumed to be
> available in other parts of the x86 codegen.

Try AddedComplexity = n to increase "goodness" of the pattern. It's a
bit of a hack.

Ah, AddedComplexity == goodness. I thought it == badness so I'd tried to add
it to the pattern I wanted to disable. Thanks for the pointer.

1. Treat these instructions as cross register class copies. The src
and dst classes are different (VR128 and FR32) but "compatible".

This seems reasonable.

2. Model it as extract_subreg which coalescer can eliminate.

#2 is conceptually correct. The problem is 128 bit XMM0 is the same
register as 32 bit (or 64 bit) XMM0. So it's not possible to define
the super-register / sub-register relationship.

I'm not seeing how this is "conceptually correct." It's a vector extract, not
a subregister. It's just that we want to reuse the same register.

Perhaps the answer is to add vector extract support to the coalescer, in
the same way you added subregister support. I don't understand the nitty
gritty of that, though.

That leaves us with #1. I have added support to coalesce cross-class
copies (see CrossClassJoin in SimpleRegisterCoalescing.cpp).

Yep.

Unfortunately, it breaks a few tests and I haven't had the time to
look into them. If that's done, we just need to add the concept of
"compatible register classes" and mark MOVPS2SSrr as a copy and *it
should just work*.

What about getting tblgen support for the pattern in the .td file? That would
be another way to tackle this and would open up a whole bunch of other
opportunities. Instcombine could be entirely expressed as a set of tblgen
patterns, for example, which is desireable from a maintenance perspective
and well as new development. It's much easier to write patterns than to go
through all of the manual examination that currently exists in instcombine.

                                                         -Dave

1. Treat these instructions as cross register class copies. The src
and dst classes are different (VR128 and FR32) but "compatible".

This seems reasonable.

2. Model it as extract_subreg which coalescer can eliminate.

#2 is conceptually correct. The problem is 128 bit XMM0 is the same
register as 32 bit (or 64 bit) XMM0. So it's not possible to define
the super-register / sub-register relationship.

I'm not seeing how this is "conceptually correct." It's a vector extract, not
a subregister. It's just that we want to reuse the same register.

It is though. Sub-register is a machine specific concept. It means vector_extract can be modeled as subreg_extract on this machine. Nothing is wrong with thatt.

Perhaps the answer is to add vector extract support to the coalescer, in
the same way you added subregister support. I don't understand the nitty
gritty of that, though.

I don't think that's a good idea. Conceptually vector_extract is very different from a move.

That leaves us with #1. I have added support to coalesce cross-class
copies (see CrossClassJoin in SimpleRegisterCoalescing.cpp).

Yep.

As Dan pointed out, #2 is also a workable solution.

Unfortunately, it breaks a few tests and I haven't had the time to
look into them. If that's done, we just need to add the concept of
"compatible register classes" and mark MOVPS2SSrr as a copy and *it
should just work*.

What about getting tblgen support for the pattern in the .td file? That would
be another way to tackle this and would open up a whole bunch of other
opportunities. Instcombine could be entirely expressed as a set of tblgen
patterns, for example, which is desireable from a maintenance perspective
and well as new development. It's much easier to write patterns than to go
through all of the manual examination that currently exists in instcombine.

I don't think that would work. We still have to model the value as being produced by an instruction. So either we have to select it to a target specific instruction or a target independent instruction (i.e. extract_subreg).

After thinking about this some more, I think #2 is a better solution. Adding XMM0_32 etc. teaches codegen that only lower 32-bits of the registers are used. Perhaps this can open up additional optimization opportunities. On the other hand, adding these registers means introducing more aliasing which has compile time implication.

Evan

> I'm not seeing how this is "conceptually correct." It's a vector
> extract, not
> a subregister. It's just that we want to reuse the same register.

It is though. Sub-register is a machine specific concept. It means
vector_extract can be modeled as subreg_extract on this machine.
Nothing is wrong with thatt.

I didn't mean to imply anything was "wrong." It just strikes me as kind of
strange, in a mind-warping kind of way. :slight_smile:

> Perhaps the answer is to add vector extract support to the
> coalescer, in
> the same way you added subregister support. I don't understand the
> nitty
> gritty of that, though.

I don't think that's a good idea. Conceptually vector_extract is very
different from a move.

Ok, I agree.

>> That leaves us with #1. I have added support to coalesce cross-class
>> copies (see CrossClassJoin in SimpleRegisterCoalescing.cpp).
>
> Yep.

As Dan pointed out, #2 is also a workable solution.

Yes, I like Dan's proposal.

> What about getting tblgen support for the pattern in the .td file?
> That would
> be another way to tackle this and would open up a whole bunch of other
> opportunities. Instcombine could be entirely expressed as a set of
> tblgen
> patterns, for example, which is desireable from a maintenance
> perspective
> and well as new development. It's much easier to write patterns
> than to go
> through all of the manual examination that currently exists in
> instcombine.

I don't think that would work. We still have to model the value as
being produced by an instruction. So either we have to select it to a
target specific instruction or a target independent instruction (i.e.
extract_subreg).

That's right. The pattern doesn't know if the rest of the vector is going to
be used elsewhere so we need dataflow information and that implies it needs to
be done in the coalescer (or some other transformation pass).

After thinking about this some more, I think #2 is a better solution.
Adding XMM0_32 etc. teaches codegen that only lower 32-bits of the
registers are used. Perhaps this can open up additional optimization
opportunities. On the other hand, adding these registers means
introducing more aliasing which has compile time implication.

What about XMM0_64? What about things like AVX which applies the GPR aliasing
scheme to vectors? I think this is the right way to go but we need to do
things in a comprehensive way so we can expand as needed.

                                                   -Dave

As Dan pointed out, #2 is also a workable solution.

Yes, I like Dan's proposal.

Hmmm... Perhaps Dan should implement this. :slight_smile:

What about getting tblgen support for the pattern in the .td file?
That would
be another way to tackle this and would open up a whole bunch of other
opportunities. Instcombine could be entirely expressed as a set of
tblgen
patterns, for example, which is desireable from a maintenance
perspective
and well as new development. It's much easier to write patterns
than to go
through all of the manual examination that currently exists in
instcombine.

I don't think that would work. We still have to model the value as
being produced by an instruction. So either we have to select it to a
target specific instruction or a target independent instruction (i.e.
extract_subreg).

That's right. The pattern doesn't know if the rest of the vector is going to
be used elsewhere so we need dataflow information and that implies it needs to
be done in the coalescer (or some other transformation pass).

After thinking about this some more, I think #2 is a better solution.
Adding XMM0_32 etc. teaches codegen that only lower 32-bits of the
registers are used. Perhaps this can open up additional optimization
opportunities. On the other hand, adding these registers means
introducing more aliasing which has compile time implication.

What about XMM0_64? What about things like AVX which applies the GPR aliasing
scheme to vectors? I think this is the right way to go but we need to do
things in a comprehensive way so we can expand as needed.

They will be handled the same way. Please file a bug. We'll get to it.

Thanks,

Evan