RFH: GlobalISel Legalization Rules

I am trying to legalize:

...
---
name:            test_unmerge_192
body:             |
  bb.1:
    %0:_(s192) = IMPLICIT_DEF
    %1:_(s96), %2:_(s96) = G_UNMERGE_VALUES %0(s192)
    %3:_(s96) = COPY %1(s96)
    RET 0, implicit %3

It fails on every X86 configuration, from 32-bit all the way up to AVX2. These are my rules:

  // extract multiple registers
  getActionDefinitionsBuilder(G_UNMERGE_VALUES)
    .legalIf([=](const LegalityQuery &Query) -> bool {
      if (typeInSet(0, {s16, s32, s64})(Query))
        return true;
      if (Is64Bit && typeInSet(0, {s128})(Query))
        return true;
      if (HasSSE1 && typeInSet(0, {v4s32, v2s64})(Query))
        return true;
      if (HasSSE2 && typeInSet(0, {v16s8, v32s8, v8s16, v16s16,
                                   v4s32, v8s32, v2s64, v4s64})(Query))
        return true;
      if (HasAVX && typeInSet(0, {v32s8, v64s8, v16s16, v32s16,
                                  v8s32, v16s32, v4s64, v8s64})(Query))
        return true;
      if (HasAVX2 && typeInSet(0, {v64s8, v32s16, v16s32, v8s64})(Query))
        return true;
      return false;
    })
    // elements (clamp and widen)
    .clampScalarOrElt(0, s16, sMaxScalar)
    .widenScalarOrEltToNextPow2(0, /*Min=*/16)
    .clampScalarOrElt(1, s16, sMaxScalar)
    .widenScalarOrEltToNextPow2(1, /*Min=*/16)
    // vectors (min and max)
    .clampMinNumElements(0, s16, bitWidth / 16)
    .clampMinNumElements(0, s32, bitWidth / 32)
    .clampMinNumElements(0, s64, bitWidth / 64)
    .clampMaxNumElements(0, s16,
                         HasAVX2 ? 32 :
                         (HasAVX ? 16 :
                          bitWidth / 16))
    .clampMaxNumElements(0, s32,
                         HasAVX2 ? 16 :
                         (HasAVX ? 8 :
                          bitWidth / 32))
    .clampMaxNumElements(0, s64,
                         HasAVX2 ? 8 :
                         (HasAVX ? 4 :
                          bitWidth / 64))
    // last resort
    .scalarize(0)
    .scalarize(1);

I am a bit worried that clampScalarOrElt does not work as expected, or that I am completely misunderstanding legalization. Any ideas?

Thanks!

Does -debug-only=legalizer show anything interesting?

Thanks for the fast reply, but no.

Legalize Machine IR for: test_unmerge_192
=== New Iteration ===
.. No debug info was present
Trying to combine: %1:_(s96), %2:_(s96) = G_UNMERGE_VALUES %0:_(s192)
.. Not combined, moving to instructions list
=== New Iteration ===
Legalizing: %1:_(s96), %2:_(s96) = G_UNMERGE_VALUES %0:_(s192)
.. Narrow scalar
.. Not legalized, moving to artifacts retry
No new artifacts created, not retrying!
LLVM ERROR: unable to legalize instruction: %1:_(s96), %2:_(s96) = G_UNMERGE_VALUES %0:_(s192) (in function: test_unmerge_192)

This is one of the major ways -debug-only is broken: a lot of the information you actually want to see printed is not covered by the “legalizer” debug flag.

Legalizing: %1:_(s96), %2:_(s96) = G_UNMERGE_VALUES %0:_(s192)
Applying legalizer ruleset to: 64, Tys={s96, s192, }, Opcode=64, MMOs={}
.. fallback to legacy rules (no rules defined)
.. (legacy) Type 0 Action=Unsupported, s96
.. Unable to legalize
.. Not legalized, moving to artifacts retry
No new artifacts created, not retrying!

So this just isn’t covered by any rules

But I clamp typeIdx 0 and 1 to s16 - s32.

How did you get this output?

This is just full -debug

I removed all the legacy legalisers for G_UNMERGE_VALUES; only the builder I showed above remains. My misunderstanding: since s192 is way too large and I clamp to s16-s32, I hoped the legalizer would then take care of it.

I have another challenging example:

...
---
name:            test_unmerge_v4x16
body:             |
  bb.1:
    %0:_(<4 x s16>) = IMPLICIT_DEF
    %1:_(<2 x s16>), %2:_(<2 x s16>) = G_UNMERGE_VALUES %0(<4 x s16>)
    %3:_(<2 x s16>) = COPY %1(<2 x s16>)
    RET 0, implicit %3

It matches a FewerElements rule this time, but it never falls through to scalarize:

Legalizing: %1:_(<2 x s16>), %2:_(<2 x s16>) = G_UNMERGE_VALUES %0:_(<4 x s16>)
Applying legalizer ruleset to: 64, Tys={<2 x s16>, <4 x s16>, }, Opcode=64, MMOs={}
.. no match
.. no match
.. no match
.. no match
.. no match
.. no match
.. no match
.. no match
.. no match
.. no match
.. no match
.. no match
.. no match
.. match
.. .. FewerElements, 0, s16
.. Reduce number of elements
.. Not legalized, moving to artifacts retry
No new artifacts created, not retrying!
LLVM ERROR: unable to legalize instruction: %1:_(<2 x s16>), %2:_(<2 x s16>) = G_UNMERGE_VALUES %0:_(<4 x s16>) (in function: test_unmerge_v4x16)

You’re hitting this condition in LegalizerHelper.cpp (llvm/llvm-project at 684f3c968d6bbf124014128b9f5e4f03a50f28c5 on GitHub).

You’ve expressed it in terms of the first type index, instead of the second. This is a case where the API is too expressive for the instruction’s requirements, since the possible types for the first and second type index constrain each other. You would need to either extend this logic to figure out the types to use when expressed with the first type index, or change your rules to report a change on the second type index.
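To make this point concrete: for G_UNMERGE_VALUES the source width always equals the (fixed) result count times the result width, so the two type indices cannot move independently. A plain-C++ model of the s192 example (my own sketch, not LLVM API):

```cpp
#include <cassert>
#include <cstdint>

// G_UNMERGE_VALUES invariant: SrcBits == NumResults * DstBits.
// The result count is fixed by the MIR instruction, so narrowing one
// type index forces a matching change on the other.
struct Unmerge {
  uint64_t DstBits;    // type index 0
  uint64_t SrcBits;    // type index 1
  uint64_t NumResults; // fixed by the instruction, not a type index
  bool relationHolds() const { return DstBits * NumResults == SrcBits; }
};
```

Clamping index 0 alone (say, s96 down to s64) leaves a 2-result instruction whose pieces no longer cover the s192 source, which is exactly the inconsistency the LegalizerHelper would have to repair.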

For an s192 on the right-hand side of a G_UNMERGE_VALUES, I could have anywhere between 2 and 12 results. How can I express in the legaliser that the result type must be some unknown factor smaller than the input type?

clampScalarOrEltWhilePreservingRelation(0, 1, s16, maxScalar)

AArch64 and AMDGPU might benefit as well.

Currently most of this stuff is just bounds checked in the LegalizerHelper implementations

At the moment I can clamp either type index 0 or type index 1, but never both together. If there is a size relation between input and output, I can end up violating it.

remark: unable to legalize instruction: %386:_(<4 x s32>), %387:_(<4 x s32>), %388:_(<4 x s32>), %389:_(<4 x s32>) = G_UNMERGE_VALUES %385:_(<16 x s32>) [-Rpass-missed=gisel-legalize]
remark: unable to legalize instruction: %13702:_(<4 x s64>) = G_CONCAT_VECTORS %20049:_(<2 x s64>), %20050:_(<2 x s64>) [-Rpass-missed=gisel-legalize]

I ran into the same issue again today, on AArch64. I cannot apply rules to type index 1 and then 0, or the other way around, without breaking the contract between the two type indices.
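Both remarks above are instances of the same whole-multiple contract, just in opposite directions: G_UNMERGE_VALUES divides its source, G_CONCAT_VECTORS multiplies its sources. A plain-C++ check (my own sketch, not LLVM API):

```cpp
#include <cassert>
#include <cstdint>

// G_UNMERGE_VALUES:  SrcElts == NumResults * DstElts
// G_CONCAT_VECTORS:  DstElts == NumSources * SrcElts
// (element type identical on both sides in each case). A rule that
// mutates one type index must keep the other a whole multiple, which
// two independent clamps cannot guarantee.
static bool unmergeOk(uint64_t DstElts, uint64_t SrcElts,
                      uint64_t NumResults) {
  return NumResults * DstElts == SrcElts;
}
static bool concatOk(uint64_t DstElts, uint64_t SrcElts,
                     uint64_t NumSources) {
  return NumSources * SrcElts == DstElts;
}
```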

Are there no plans for rules that update both type indices in one rule?

I don’t know of anyone working on this

I have seen examples with too-large scalars on both sides, and too-large vectors as well.

.breakIntoSmallerScalars(/*Max=*/64)
.breakIntoSmallerVectors(/*Max=*/128)
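These builder methods do not exist in LLVM today; they are the proposal here. As a sketch of what such a combined rule would have to compute for both indices at once (plain C++, hypothetical names):

```cpp
#include <cassert>
#include <cstdint>

// Sketch of a combined mutation like the proposed breakIntoSmallerScalars
// (not a real LegalizeRuleSet method): pick a new result width no larger
// than MaxBits that still divides the source width evenly, and derive the
// new result count from the relation instead of clamping it separately.
struct Mutation {
  uint64_t DstBits;    // new type index 0
  uint64_t NumResults; // result count implied by SrcBits / DstBits
};

static Mutation breakIntoSmallerScalars(uint64_t SrcBits, uint64_t MaxBits) {
  uint64_t Dst = MaxBits;
  while (SrcBits % Dst != 0) // shrink until SrcBits is a whole multiple
    Dst /= 2;
  return {Dst, SrcBits / Dst};
}
```

For the s192 example this yields three s64 pieces, and for an s96 source (where s64 does not divide evenly) it falls back to three s32 pieces, keeping the contract between the two type indices by construction.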