Long-Term ISel Design

All,

As I've done more integrating of AVX work upstream and more tuning here,
I've run across several things which are clunky in the current isel
design. A couple examples I can remember offhand:

1. We have special target-specific operators for certain shuffles in X86,
   such as X86unpckl. I don't completely understand why but Bruno
   indicated it was to address inefficiecies. One of those is the need
   to check masks multiple times (once at legalize and again at isel).

2. Sometimes DAGs are legal in some contexts but not others and it is a
   pain to deal with. A good example is VBROADCAST, where a <0,0,0,0>
   shuffle is natively supported if the source vector is in memory.
   Otherwise it's not legal and manual lowering is required. In this
   case the legality check is doing the DAG match by hand, replicating
   what TableGen-produced code already does.

These two examples are related: we're duplicating functionality manually
that's already available automatically.

As I've been thinking about this, it strikes me that we could get rid of
the target-specific operators and a lot of other manual checks if we
just had another isel phase. Let's say we structured things this way:

                  legalize
                     >
                     V
        manual lowering (X86ISelLowering)
                     >
                     V
         manual isel (X86ISelDAGToDAG)
                     >
                     V
    table-driven isel (.td files/X86GenDAGISel)
                     >
                     V
      manual isel (some to-be-design piece)

The idea is that we keep the existing manual pieces where they are to
clean things up for TableGen-based isel and/or handle special cases.
Maybe we consider getting rid of some in the future but that's a
separate questions.

The way things are now, if table-driven isel fails the codegen aborts.
In the above scheme we get one last chance to do manual lowering before
we give up. This helps the shuffle mask case by turning this:

                             legalize
                                >
                                V
               check shuffle mask legality (X86ISelLowering)
                                >
                                V
       check shuffle mask legality (table-driven isel predicates)

To this:

                             legalize
                                >
                                V
              X86ISelLowering (no mask legality checks)
                                >
                                V
       check shuffle mask legality (table-driven isel predicates)
                                >
                                V
                 lower remaining shuffles manually

The advantage is that in the final stage we already know the shuffle
mask isn't implementable manually so there's no need to check for
legality. We simply need to implement whatever X86ISelLowering would
have done in those case previously.

This also helps example 2. In the memory-operand case we will match to
a VBROADCASTSS/D. If we don't match we'll fall through to manual
lowering and we'll implement the reg-reg broadcast via some other
combination of shuffles. So we more gracefully handle situations where
sometimes things are legal and sometimes they aren't depending on the
context.

Perhaps I'm repeating something that's already been discussed.

Thoughts?

                          -Dave

David Greene <dag@cray.com> writes:

The advantage is that in the final stage we already know the shuffle
mask isn't implementable manually so there's no need to check for
legality.

s/manually/natively/

                        -Dave

All,

As I've done more integrating of AVX work upstream and more tuning here,
I've run across several things which are clunky in the current isel
design. A couple examples I can remember offhand:

1. We have special target-specific operators for certain shuffles in X86,
  such as X86unpckl. I don't completely understand why but Bruno
  indicated it was to address inefficiecies. One of those is the need
  to check masks multiple times (once at legalize and again at isel).

It also eliminates a lot of fragility. Before doing this, X86 legalize would have to be very careful to specifically form shuffles that it knew isel would turn into (e.g.) unpck operations. Now instead of forming specific carefully constructed shuffle masks (not making sure other code doesn't violate them) it can just directly form the X86ISD node.

2. Sometimes DAGs are legal in some contexts but not others and it is a
  pain to deal with. A good example is VBROADCAST, where a <0,0,0,0>
  shuffle is natively supported if the source vector is in memory.
  Otherwise it's not legal and manual lowering is required. In this
  case the legality check is doing the DAG match by hand, replicating
  what TableGen-produced code already does.

Yes, this isn't good. Instead, the shuffle should be legalized to something that takes a pointer (memory operand). That means that X86 isel would form the *fully legal* X86ISD node, and nothing would be able to break it and it could never fail to match.

These two examples are related: we're duplicating functionality manually
that's already available automatically.

Not sure what you mean by this.

As I've been thinking about this, it strikes me that we could get rid of
the target-specific operators and a lot of other manual checks if we
just had another isel phase. Let's say we structured things this way:

                 legalize
                    >
                    V
       manual lowering (X86ISelLowering)
                    >
                    V
        manual isel (X86ISelDAGToDAG)
                    >
                    V
   table-driven isel (.td files/X86GenDAGISel)
                    >
                    V
     manual isel (some to-be-design piece)

The idea is that we keep the existing manual pieces where they are to
clean things up for TableGen-based isel and/or handle special cases.
Maybe we consider getting rid of some in the future but that's a
separate questions.

I'm not sure what you mean here. Are you suggesting that these be completely separate passes over the dag? Why do "manual isel" and "table driven isel" as separate passes? If they are interlaced, then how is this different than what we already have?

The way things are now, if table-driven isel fails the codegen aborts.
In the above scheme we get one last chance to do manual lowering before
we give up. This helps the shuffle mask case by turning this:

                            legalize
                               >
                               V
              check shuffle mask legality (X86ISelLowering)
                               >
                               V
      check shuffle mask legality (table-driven isel predicates)

You're saying that we do this on a node-by-node basis? The reason that codegen aborts on unselectable operations is that they are invalid and should not be formed. Your example of vbroadcast is a great one: the X86ISD node for it *should not take a vector register input*. If it does, then the X86ISD node is incorrectly defined.

-Chris

Chris Lattner <clattner@apple.com> writes:

1. We have special target-specific operators for certain shuffles in X86,
  such as X86unpckl.

It also eliminates a lot of fragility. Before doing this, X86
legalize would have to be very careful to specifically form shuffles
that it knew isel would turn into (e.g.) unpck operations. Now
instead of forming specific carefully constructed shuffle masks (not
making sure other code doesn't violate them) it can just directly form
the X86ISD node.

Right. What I've presented would reverse this. Rather than making
Legalize have to know about what table-driven isel can and cannot do,
have table-driven isel run first, see what it can do and then leave
the rest for manual selection.

We would still keep the existing pre-table-driven-isel passes so we'd
still have a chance to do some cleanup before the main table-driven
isel.

Obviously a lot of details have to be worked out.

2. Sometimes DAGs are legal in some contexts but not others and it is a
  pain to deal with. A good example is VBROADCAST, where a <0,0,0,0>
  shuffle is natively supported if the source vector is in memory.
  Otherwise it's not legal and manual lowering is required. In this
  case the legality check is doing the DAG match by hand, replicating
  what TableGen-produced code already does.

Yes, this isn't good. Instead, the shuffle should be legalized to
something that takes a pointer (memory operand). That means that X86
isel would form the *fully legal* X86ISD node, and nothing would be
able to break it and it could never fail to match.

Well, it dopesn't _have_to_ form an X86ISD node. I don't do that now.
But it's fragile in the sense that no one else should mess with that
piece of the DAG.

But the real point is that in forming the X86ISD node currently, I'm
doing exaclty what the tblgen-generated code already does. If the
shuffle doesn't take a memory operand, then I have to lower it to
something else. Where I do that (before or after table-driven isel)
doesn't matter. I do the same work either way. But by doing it after I
avoid writing duplicate DAG matching code in the case where the operand
is in memory.

These two examples are related: we're duplicating functionality manually
that's already available automatically.

Not sure what you mean by this.

I mean that in legalize/lowering we're massaging the DAG to get it into
a state where tabel-driven isel can match it. There is a lot of code
like this:

if (shuffle_is_MOVL)
  do_nothing_and_return

It's duplicating exactly the checks that the table-driven isel does
later. In the VBROADCASTSS/D case, it's doing an entire DAG match
to check whether it's implementable with VBROADCASTSS/D.

Why not just let table-driven isel run first and take care of these
checks just once? If something doesn't match, we then know it needs
manual lowering and selection.

                 legalize
                    >
                    V
       manual lowering (X86ISelLowering)
                    >
                    V
        manual isel (X86ISelDAGToDAG)
                    >
                    V
   table-driven isel (.td files/X86GenDAGISel)
                    >
                    V
     manual isel (some to-be-design piece)

I'm not sure what you mean here. Are you suggesting that these be
completely separate passes over the dag? Why do "manual isel" and
"table driven isel" as separate passes? If they are interlaced, then
how is this different than what we already have?

No, not as separate passes. Right now we have code like this in
X86ISelDAGToDAG:

X86DAGToDAGISel::Select(SDNode *Node) {
  ...
  switch (Opcode) {
    ...do a bunch of manual selection...
  }
  // If we get here we didn't select manually.
  SelectCode(); // Select via table-driven isel, abort if no match.
}

What I'm proposing is that we do this:

X86DAGToDAGISel::Select(SDNode *Node) {
  ...
  switch (Opcode) {
    ...do a bunch of manual selection, less than before...
  }

  // If we get here we didn't select manually.

  result = SelectCode(); // Select via table-driven isel.

  if (result_is_good()) {
    return;
  }

  switch (Opcode) {
    ...do a bunch of manual selection, some that used to be above and in
       legalize/lowering...
  }

  cannot_select_abort();
}

You're saying that we do this on a node-by-node basis?

You mean on an SDNode basis? I don't think so. As I said, details have
to be worked out but I imagine we'd send the selection DAG to the
tblgen-generated code as we do now and any "leftover" bits would have
to be processed afterward by the manual selector.

Now, there is a phase ordering issue in that some of the
legalize/lowering code massages the tree so the tblgen stuff can make a
"good" match. We probably still want to do that in some cases so we
keep that code where it is. What I'm aiming at getting rid of is all of
the code that does:

if (this_is_already_matchable()) {
  return;
}

The reason that codegen aborts on unselectable operations is that they
are invalid and should not be formed. Your example of vbroadcast is a
great one: the X86ISD node for it *should not take a vector register
input*. If it does, then the X86ISD node is incorrectly defined.

What I'm saying is that there would be no X86ISD node. If the pattern
is there, tblgen-produced code will match it. If not, we have to lower
it manually and would just do what we'd have done in legalize/lowering
before, the difference being we'd now do it after table-driven isel
rather than before.

                                 -Dave

Chris Lattner <clattner@apple.com> writes:

1. We have special target-specific operators for certain shuffles in X86,
such as X86unpckl.

It also eliminates a lot of fragility. Before doing this, X86
legalize would have to be very careful to specifically form shuffles
that it knew isel would turn into (e.g.) unpck operations. Now
instead of forming specific carefully constructed shuffle masks (not
making sure other code doesn't violate them) it can just directly form
the X86ISD node.

Right. What I've presented would reverse this. Rather than making
Legalize have to know about what table-driven isel can and cannot do,
have table-driven isel run first, see what it can do and then leave
the rest for manual selection.

We would still keep the existing pre-table-driven-isel passes so we'd
still have a chance to do some cleanup before the main table-driven
isel.

Obviously a lot of details have to be worked out.

I'm not seeing how this is useful for shuffles. Since tblgen doesn't generate table based matching for *any* shuffles, all of the matching code would end up as C++ code in X86ISelDagToDag, which would give us all of the problems we had before by moving to X86ISD nodes.

2. Sometimes DAGs are legal in some contexts but not others and it is a
pain to deal with. A good example is VBROADCAST, where a <0,0,0,0>
shuffle is natively supported if the source vector is in memory.
Otherwise it's not legal and manual lowering is required. In this
case the legality check is doing the DAG match by hand, replicating
what TableGen-produced code already does.

Yes, this isn't good. Instead, the shuffle should be legalized to
something that takes a pointer (memory operand). That means that X86
isel would form the *fully legal* X86ISD node, and nothing would be
able to break it and it could never fail to match.

Well, it dopesn't _have_to_ form an X86ISD node. I don't do that now.
But it's fragile in the sense that no one else should mess with that
piece of the DAG.

I don't consider that an acceptable approach. There is no way to prevent something from CSE'ing a load away and "breaking" the dag, or moving an add between the nodes etc. You're violating a design principle of selection dags.

But the real point is that in forming the X86ISD node currently, I'm
doing exaclty what the tblgen-generated code already does. If the
shuffle doesn't take a memory operand, then I have to lower it to
something else. Where I do that (before or after table-driven isel)
doesn't matter. I do the same work either way. But by doing it after I
avoid writing duplicate DAG matching code in the case where the operand
is in memory.

I don't agree, and I don't see why this is as bad as you're saying. The code creating these nodes is already target specific. You seem to be objecting to this because it is easier to write .td files (which turn into generated isel code) than it is to write legalize code in C++. If that is the problem you've identified, then why not make it possible to write legalize code in .td files?

These two examples are related: we're duplicating functionality manually
that's already available automatically.

Not sure what you mean by this.

I mean that in legalize/lowering we're massaging the DAG to get it into
a state where tabel-driven isel can match it. There is a lot of code
like this:

if (shuffle_is_MOVL)
do_nothing_and_return

It's duplicating exactly the checks that the table-driven isel does
later. In the VBROADCASTSS/D case, it's doing an entire DAG match
to check whether it's implementable with VBROADCASTSS/D.

Why not just let table-driven isel run first and take care of these
checks just once? If something doesn't match, we then know it needs
manual lowering and selection.

I think that this is just because the current code is in a half converted state. Bruno can say more, but the ultimate goal is to make ISD::SHUFFLE completely illegal on X86, so you'd never have this sort of thing.

                legalize
                   >
                   V
      manual lowering (X86ISelLowering)
                   >
                   V
       manual isel (X86ISelDAGToDAG)
                   >
                   V
  table-driven isel (.td files/X86GenDAGISel)
                   >
                   V
    manual isel (some to-be-design piece)

I'm not sure what you mean here. Are you suggesting that these be
completely separate passes over the dag? Why do "manual isel" and
"table driven isel" as separate passes? If they are interlaced, then
how is this different than what we already have?

No, not as separate passes. Right now we have code like this in
X86ISelDAGToDAG:

X86DAGToDAGISel::Select(SDNode *Node) {
...
switch (Opcode) {
   ...do a bunch of manual selection...
}
// If we get here we didn't select manually.
SelectCode(); // Select via table-driven isel, abort if no match.
}

What I'm proposing is that we do this:

X86DAGToDAGISel::Select(SDNode *Node) {
...
switch (Opcode) {
   ...do a bunch of manual selection, less than before...
}

// If we get here we didn't select manually.

result = SelectCode(); // Select via table-driven isel.

if (result_is_good()) {
   return;
}

switch (Opcode) {
   ...do a bunch of manual selection, some that used to be above and in
      legalize/lowering...
}

cannot_select_abort();
}

Ok, much better than separate passes, thanks for the clarification!

-Chris

Chris Lattner <clattner@apple.com> writes:

We would still keep the existing pre-table-driven-isel passes so we'd
still have a chance to do some cleanup before the main table-driven
isel.

Obviously a lot of details have to be worked out.

I'm not seeing how this is useful for shuffles. Since tblgen doesn't
generate table based matching for *any* shuffles, all of the matching
code would end up as C++ code in X86ISelDagToDag, which would give us
all of the problems we had before by moving to X86ISD nodes.

Actually, it would be matching code in X86ISelLowering, in the form
of is.*Mask. For example:

In X86ISelLowering.cpp:

bool X86ISelLowering::isSHUFPMask(...) {
  ...
}
unsigned X86::getShuffleSHUFImmediate(SDNode *N) {
  ...
}

In X86InstrFragmentsSIMD.td:

def SHUFFLE_get_shuf_imm : SDNodeXForm<vector_shuffle, [{
  return getI8Imm(X86::getShuffleSHUFImmediate(N));
}]>;
def shufp : PatFrag<(ops node:$lhs, node:$rhs),
                    (vector_shuffle node:$lhs, node:$rhs), [{
  return X86::isSHUFPMask(cast<ShuffleVectorSDNode>(N));
}], SHUFFLE_get_shuf_imm>;

Ok, so far this is exactly the same as today.

What we eliminate is this:

void X86TargetLowering::LowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG) {
  ...
  isShuffleMaskLegal(...)
}

bool X86TargetLowering::isShuffleMaskLegal(const SmallVectorImpl<int> &M,
                                           EVT VT) const {
  // FIXME: pshufb, blends, shifts.
  return (VT.getVectorNumElements() == 2 ||
          ShuffleVectorSDNode::isSplatMask(&M[0], VT) ||
          isMOVLMask(M, VT) ||
          isSHUFPMask(M, VT) ||
          ...
}

We git rid of this call to isSHUFPMask, which currently happens during
legalize. Instead of trying to see if shuffles are already legal, just
run the isntruction selector and see what it can find. For everything
else, we know by definition it's not legal and we have to manually
lower it somehow, just like today. The only difference is that we do
it after (or in conjunction with) isel instead of before it. This will
eliminate a lot of confusing and redundant code in X86ISelLowering.

Well, it dopesn't _have_to_ form an X86ISD node. I don't do that now.
But it's fragile in the sense that no one else should mess with that
piece of the DAG.

I don't consider that an acceptable approach. There is no way to
prevent something from CSE'ing a load away and "breaking" the dag, or
moving an add between the nodes etc. You're violating a design
principle of selection dags.

Fair enough. The current proposal eliminates any need to do any of
this. I did end up creating a special X86 node for VBROADCAST because
you're right, its too brittle to rely on something else not breaking it.

But the real point is that in forming the X86ISD node currently, I'm
doing exaclty what the tblgen-generated code already does. If the
shuffle doesn't take a memory operand, then I have to lower it to
something else. Where I do that (before or after table-driven isel)
doesn't matter. I do the same work either way. But by doing it after I
avoid writing duplicate DAG matching code in the case where the operand
is in memory.

I don't agree, and I don't see why this is as bad as you're saying.
The code creating these nodes is already target specific. You seem to
be objecting to this because it is easier to write .td files (which
turn into generated isel code) than it is to write legalize code in
C++. If that is the problem you've identified, then why not make it
possible to write legalize code in .td files?

I _am_ saying writing .td files is much easier, and I'm saying that half
the legalize code is already there, the part that checks whether stuff
is already legal. I'm not trying to make the manual lowering case
easier. I'm trying to avoid having to write manual code to discover
patterns that isel is already perfectly capable of discovering on its
own.

Take the broadcast case. The .td is something like this (theoretical,
because I did end up creating an X86broadcast node) :

(set VR128:$dst, (v8f32 (splat_lo (v4f32 scalar_to_vector (f32 load addr:$src)),
                                  (undef))))

It's fairly straightforward.

The way things are now, in legalize I have to look for broadcast-like
operations and make sure they come from memory, something like:

if (...we have avx, this is a splat from element 0, etc....) {
  if (...operand 2 is undef...)
    if (...operand 1 is a scalar_to_vector...)
      if (...scalar_to_vector operand is a load...)
         if (...the load only has one use and is foldable...)
           ...transform this node to an X86braodcast node...
}

then write the .td pattern as:

(set VR128:$dst, (v8f32 (X86vbroadcast (f32 load addr:$src))))

In addition, I have to write the code to lower a splat_lo-type
thing that comes from a register (can't use VBROADCAST) but I
have to do that no matter what since it's not a legal DAG.

I had to write a whole bunch of manual matching code. This is
error-prone is completely automatable by tblegen. The current
difficulty is that the tblegen-generated matcher runs too late, after we
have a legal DAG guarantee. If it ran earlier and instead of aborting
simply returned the DAGs (or fragments) it could not match, I could
avoid writing all this nasty code.

Using TableGen to generate legalize code (the other half that really has
to morph nodes) is an idea I've tossed around from time to time. I ran
into this need very early on. It would be nice to write patterns like:

def : Pat<(...some dag...),
          (...some other, non-target-specific dag...)>;

I think there's a baked-in assumption in TableGen that the pattern
matched to is a machine instruction dag. The memory is fuzzy but I
remember trying to do something like this and failing utterly. :slight_smile:

I think that this is just because the current code is in a half
converted state. Bruno can say more, but the ultimate goal is to make
ISD::SHUFFLE completely illegal on X86, so you'd never have this sort
of thing.

Erm...how would that work exactly? Manual matching and lowering for all
shuffles? That sounds like a huge step backward. Shuffle is a key
operation on X86. Anything that can automate its selection is a huge
win in terms of correctness. After writing all of the shuffle matching
code for AVX (on its way into trunk), I don't ever, ever, ever want to
have to do anything like that ever again, ever. :slight_smile:

                                  -Dave

Chris Lattner <clattner@apple.com> writes:

We would still keep the existing pre-table-driven-isel passes so we'd
still have a chance to do some cleanup before the main table-driven
isel.

Obviously a lot of details have to be worked out.

I'm not seeing how this is useful for shuffles. Since tblgen doesn't
generate table based matching for *any* shuffles, all of the matching
code would end up as C++ code in X86ISelDagToDag, which would give us
all of the problems we had before by moving to X86ISD nodes.

bool X86TargetLowering::isShuffleMaskLegal(const SmallVectorImpl<int> &M,
                                          EVT VT) const {
// FIXME: pshufb, blends, shifts.
return (VT.getVectorNumElements() == 2 ||
         ShuffleVectorSDNode::isSplatMask(&M[0], VT) ||
         isMOVLMask(M, VT) ||
         isSHUFPMask(M, VT) ||
         ...
}

We git rid of this call to isSHUFPMask, which currently happens during
legalize. Instead of trying to see if shuffles are already legal, just
run the isntruction selector and see what it can find. For everything
else, we know by definition it's not legal and we have to manually
lower it somehow, just like today. The only difference is that we do
it after (or in conjunction with) isel instead of before it. This will
eliminate a lot of confusing and redundant code in X86ISelLowering.

I don't really see where you're going with this. I agree that there is confusing and fragile code for shuffle lowering, but this is what Bruno is working on fixing. To me, the problem is that legalize of SHUFFLE_VECTOR eliminates shuffles that are not directly matchable into a single machine instruction, but preserves ones that do match. This means that there is duplicated code in both legalize and isel that has to know that a shuffle is an "unpacklps" or whatever. Other parts of the code that generate shuffles generally know exactly what shuffle they want, and have to synthesize a shuffle node with the right operands to get it to select into the right operation.

To me at least, the right solution to this problem is to make one X86ISD node for each legal shuffle. This has several desirable properties:

1. Legalize is now responsible for eliminating ALL vector_shuffle nodes, and it is really obvious what shuffles it is generating when reading the dag dumps.
2. Other code that generates shuffles can just generate their exact node.
3. The duplicated isel code is gone, because there is a simple mapping between X86ISD nodes and machine instrs.
4. We push fewer shuffle masks through the compiler which is a marginal speedup.

To me, there are two missing pieces here:
1. The x86 backend is half way converted over to the new model.
2. We *really really* want a way to express shuffle masks in .td files, and tblgen should generate the Legalize code. The "def X86Punpcklbw" should include a (per cpu) cost for the shuffle as well as the mask that it matches, or a predicate to run if it the mask isn't constant.

I think that this is just because the current code is in a half
converted state. Bruno can say more, but the ultimate goal is to make
ISD::SHUFFLE completely illegal on X86, so you'd never have this sort
of thing.

Erm...how would that work exactly? Manual matching and lowering for all
shuffles? That sounds like a huge step backward.

I'm not sure how it is a step backward. Legalize already does matching to "know and ignore" a shuffle that will match a legal instruction. It isn't hard to change that code to "know and transform" it to the instruction. The net result of this is that the duplicate code in isel goes away.

Shuffle is a key
operation on X86. Anything that can automate its selection is a huge
win in terms of correctness. After writing all of the shuffle matching
code for AVX (on its way into trunk), I don't ever, ever, ever want to
have to do anything like that ever again, ever. :slight_smile:

Yes, I certainly agree that tblgen should generate the shuffle matching code, I just think that the generated code should be executed in the LegalizeOp(shuffle vector) code.

-Chris

Chris Lattner <clattner@apple.com> writes:

To me at least, the right solution to this problem is to make one
X86ISD node for each legal shuffle. This has several desirable
properties:

1. Legalize is now responsible for eliminating ALL vector_shuffle
   nodes, and it is really obvious what shuffles it is generating when
   reading the dag dumps.
2. Other code that generates shuffles can just generate their exact
   node.
3. The duplicated isel code is gone, because there is a simple mapping
   between X86ISD nodes and machine instrs.
4. We push fewer shuffle masks through the compiler which is a marginal speedup.

It also has some not-so-desirable properties:

1. We need special isel DAG nodes for each shuffle. That's an extra
   maintenance cost for something that should be automatable. More
   shuffle instructions are coming, I'm sure. I'm worried this won't
   scale.

2. It makes shuffle nodes special relative to other kinds of nodes. Why
   not have special DAG operators for _every_ kind of machine
   instruction? I know some have them and that's always been a source
   of confusion for me.

2. We *really really* want a way to express shuffle masks in .td
files, and tblgen should generate the Legalize code. The "def
X86Punpcklbw" should include a (per cpu) cost for the shuffle as well
as the mask that it matches, or a predicate to run if it the mask
isn't constant.

I agree this is highly desireable. What's the difference between
generated legalize code and a generated instruction selection matcher?

Can we somehow work this so we don't need special target machine DAG
nodes? If we're generating legalize code to lower to special target
machine DAG nodes and then matching really simple patterns using those
nodes, it seems to me just as easy to generate code to lower to machine
instructions directly.

Using target-specific DAG nodes as an intermediate step is fine, but I
don't think we want to stop there.

I think that this is just because the current code is in a half
converted state. Bruno can say more, but the ultimate goal is to make
ISD::SHUFFLE completely illegal on X86, so you'd never have this sort
of thing.

Erm...how would that work exactly? Manual matching and lowering for all
shuffles? That sounds like a huge step backward.

I'm not sure how it is a step backward. Legalize already does
matching to "know and ignore" a shuffle that will match a legal
instruction. It isn't hard to change that code to "know and
transform" it to the instruction. The net result of this is that the
duplicate code in isel goes away.

If we can support masks in TableGen and auto-generate this it is
goodness.

Yes, I certainly agree that tblgen should generate the shuffle
matching code, I just think that the generated code should be executed
in the LegalizeOp(shuffle vector) code.

I guess as long as it's automatic I don't particularly care where it
gets done. :slight_smile:

Is this plan written up anywhere? I was really confused by the changes
in x86 isel and it would have been nice to have something to look at.

                                 -Dave