GlobalISel legalization guarantees

Hi all,

I've been thinking about what guarantees the legalization pass needs
to make in GlobalISel and how it should do it. Turns out it's really
quite difficult so I thought I'd send a message out to the list with
the current status and my thoughts for the future.

It got really quite long unfortunately, so the important bit is
probably the second section "High level solutions" on scary changes
that I can't see a way around.

Does anyone else either see issues I've missed with guaranteeing
legalization, or see simpler solutions (please please please)?

Cheers.

Tim.

Hi Tim,

Thanks for writing that up.

The way I initially envisioned it could be summarized in the following pseudo algorithm:
List IllegalOps = gatherAllIllegalOps;
List LegalizationArtifacts;
do {
  Assert(LegalizationArtifacts.empty());
  // Do the bulk of legalization.
  While (IllegalOps.empty()) {
    Instr I = IllegalOps.pop_back_val();
    // Legalizing an instruction may create some (less) illegal instructions and
    // LegalizationArtifact like trunc, anyext, merge, unmerge.
    Legalize(I, &IllegalOps, &LegalizationArtifacts);
  }
  // Clean-ups the artifacts.
  While (LegalizationArtifacts.empty()) {
     Instr I = LegalizationArtifacts.pop_back_val();
     // An artifact may not be cleaned up, in that case, it needs to be legalize
     CleanUpArtifact(I, &LegalizationArtifacts, &IllegalOps);
  }
} while(!IllegalOps.empty());

So roughly, legalize, then clean-up (via combine), and legalize again what’s left.
The clean-up part should move the artifacts (unmerge, anyext, and so on) past PHIs.
In the end, it should remain only things that are fed via legal instructions or fed legal instructions. Thus the set of things to legalize (via loads and stores, if the target wants something fancier, it custom lowers it) is much narrower and somewhat expected by the target.

Hi all,

I've been thinking about what guarantees the legalization pass needs
to make in GlobalISel and how it should do it. Turns out it's really
quite difficult so I thought I'd send a message out to the list with
the current status and my thoughts for the future.

It got really quite long unfortunately, so the important bit is
probably the second section "High level solutions" on scary changes
that I can't see a way around.

Does anyone else either see issues I've missed with guaranteeing
legalization, or see simpler solutions (please please please)?

Cheers.

Tim.

==========
Background

By some means we need the legalizer to be able to completely remove all illegal
types (a nebulous concept, but it should be obvious that you don't want to try
RegBankSelect or ISel on a s1234). However, each stepwise change of the
legalizer has the input and output VRegs of an instruction as its interface so
it cannot remove these types.

So, the bulk of legalization for larger types is expressed in terms of
G_MERGE_VALUES and G_UNMERGE_VALUES instructions, with the intent that a final
combining pass will be run after the main legalization to remove matching pairs
and leave the MachineFunction legal.

However, it does an incomplete job right now.

In ~630 cases (in the test-suite at -O0) we have a G_UNMERGE_VALUES but it can't
be matched up with G_MERGE_VALUES and removed:

600 G_UNMERGE hits a PHI

That’s fine, we should push them around.

30 G_UNMERGE hits an AArch64 intrinsic (e.g. @llvm.aarch64.ld2)

What does this mean for the lowering?

On the other hand about 8000 G_MERGE_VALUES remain after the combining that
don't match a G_UNMERGE_VALUES:

5300 used by a G_STORE
3090 used by a COPY (to physregs or dead?).
60 used by a G_EXTRACT
15 are dead

What is the total #instrs?
I’d like to have a sense of how big that is.

Also, how much of that could be avoided by splitting the structures up-front? (I expect this is what the G_STOREs come from.)
Why can’t we push the unmerge to the G_STORE/COPY by splitting them?

There are also more or less theoretical issues that don't arise in the
test-suite but need to be resolved. In particular even when a G_MERGE_VALUES is
consumed by a G_UNMERGE_VALUES the input types of the merge may not be the same
as the outputs of the unmerge. The only way to resolve this conflict is by
inserting explicit extract/insert instructions which may or may not themselves
be legal (at least in principle).

====================
High level solutions

Unless anyone comes up with more bright ideas, I think we can see the following
complications before we get to the state where the legalizer has a hope of
replacing SDAG with a straight face:

1. G_MERGE_VALUES and G_UNMERGE_VALUES will not always be eliminated and must be
  mappable+selectable to complex multi-instruction, multi-register
  copy/inserts.

That’s expected, and after the combines/clean-ups are done, the generic code should go with loads and stores.

We need to understand why we get so many though.

2. We must have a complex enough combiner in the legalizer pass alone to perform
  multi-step legalization intermixed with combines in the hope that we reach a
  fixed point.

Yup, that was the plan.
We should already support multi-step legalization, right?

Below is a bit of expanded detail on the issues found by the test-suite
above. I'm not quite sure how coherent it is, but it made sense to me so I can
probably try to reword if anything is unclear.

=========================================
Expanded description of test-suite issues

G_STORE and G_EXTRACT
---------------------

These are instances of a merge feeding legal instructions. I believe this has to
be permitted but it means that selection is going to contend with a wide range
of possible different merge instructions. At best all legal input and output
register classes must be handled.

On AArch64 the likely list is:

+ "mov vD.2d[0], dN; mov vD.2d[1], dM" (and s variants with 4 insts).
+ "mov vD.4s[0], wN, ..." (and d variants).
+ "ubfi xD, xN, #0, #32; ubfi xD, xM, #32, #32"
+ "COPY; COPY; COPY" or multiples of the above when stitching together a QQQ.

Machine intrinsics
------------------

The versions of this problem I've seen come from code like this::

{<2 x i32>, <2 x i32>, <2 x i32>} %val = call {...} @llvm.aarch64.ld3(%ptr)
%vec0 = extractelement {...} %val, 1

which becomes::

s192 = G_INTRINSIC_W_SIDE_EFFECTS …

Isn’t this lower into something like:
Vreg1, vreg2, vreg3 by call lowering.

s64, s64, s64 = G_UNMERGE_VALUES s192

Because intrinsics have to have sane types (they're not really legalized
currently, even in SDag) this is very much like the issue with G_STORE and
G_EXTRACT above. We probably have to support RegBankSelect and ISel on this kind
of G_UNMERGE_VALUES, which becomes a tractable but complex sequence of copies.

If that’s legal, it should indeed by supported by RBS and ISel. If not, then the legalizer should lower them.

PHIs
----

For example:

::
  s128 = PHI [%v0, BB#0], [%v1, BB#1]
  s64, s64 = G_UNMERGE_VALUES s128

I believe we have to split the PHI into sizes corresponding to the
G_UNMERGE_VALUES. If these correspond to the sizes of the (expected) merges on
the other side that's not too bad::

s64 = PHI [%v0.0, BB#0], [%v1.0, BB#1]
s64 = PHI [%v0.1, BB#0], [%v1.1, BB#1]

Yep.

On the other hand we also have the possibility of mismatching types here, with
the added fun that PHI instructions have to be the first in their block.

What’s the problem here.

I believe we'll have to insert messy extract/insert code into a separate,
fallthrough block just before this one. The main question is whether it should
be done as part of the main legalization loop or in the post-processing pass.

Clean-up part.

Since we have to deal with illegal extracts & inserts anyway, delaying PHIs
until everything else is decided seems reasonable to me.

Mismatching types in paired merge/unmerge
-----------------------------------------

Friendly example::

s384 = G_MERGE_VALUES s192, s192
s128, s128, s128 = G_UNMERGE_VALUES s384

Both of these instructions have to go (s384 is likely horribly illegal), but
they can't simply be combined away. The first thing to try is converting to
extracts & inserts::

s128 = G_EXTRACT s192, 0

s64 = G_EXTRACT s192, 128
s64 = G_EXTRACT s192, 0
s128 = G_IMPLICIT_DEF
s128 = G_INSERT s128, s64, 0
s128 = G_INSERT s128, s64, 64

s128 = G_EXTRACT s192, 64

I would go with just loads and stores:
Store s192
Store s192
s128 = load
s128 = load
s128 = load

The good side of this is that I believe it is capable of eliminating the bad
s384 type. The bad side is that we have no reason to expect the actual extract
and insert operations to be legal. Each one potentially corresponds to some
reasonably complex shifting and masking across multiple registers.

To get around this we could either demand that extract and insert **are** legal
whenever a type has an available register class, or attempt to legalize them in
turn (leading to more merges and a fixed-point at the end if we're lucky).

The only way I can see we end up in this situation is when the input IR has already weird (struct) bit cast. At this point, I’d say garbage in, garbage out.

Hope this helps.

Cheers,
-Quentin

=========================================
Expanded description of test-suite issues

G_STORE and G_EXTRACT
---------------------

These are instances of a merge feeding legal instructions. I believe this has to
be permitted but it means that selection is going to contend with a wide range
of possible different merge instructions. At best all legal input and output
register classes must be handled.

On AArch64 the likely list is:

   + "mov vD.2d[0], dN; mov vD.2d[1], dM" (and s variants with 4 insts).
   + "mov vD.4s[0], wN, ..." (and d variants).
   + "ubfi xD, xN, #0, #32; ubfi xD, xM, #32, #32"
   + "COPY; COPY; COPY" or multiples of the above when stitching together a QQQ.

You can use a stack temporary for all of these, and worry about optimizing it later.

Machine intrinsics
------------------

The versions of this problem I've seen come from code like this::

   {<2 x i32>, <2 x i32>, <2 x i32>} %val = call {...} @llvm.aarch64.ld3(%ptr)
   %vec0 = extractelement {...} %val, 1

which becomes::

   s192 = G_INTRINSIC_W_SIDE_EFFECTS ...
   s64, s64, s64 = G_UNMERGE_VALUES s192

Instead of keeping this form through legalization, could we just fix the intrinsic to return its natural type?

-Eli