[RFC] Overhauled MIR Patterns for GlobalISel Combiners

Pierre-vh · July 25, 2023, 10:38am

Introduction

This is a continuation of the work done on MatchTable-based GlobalISel Combiners: [RFC] MatchTable-based GlobalISel Combiners

Currently, MIR pattern support in GlobalISel combiners is very limited. We only support trivial match patterns, and we still need to fallback to C++ way too often. Apply patterns are non-existent, we only support C++ code there.

def bitcast_bitcast_fold : GICombineRule<
  (defs root:$dst),
  (match (G_BITCAST $dst, $src1):$op, (G_BITCAST $src1, $src0),
      [{ return MRI.getType(${src0}.getReg()) == MRI.getType(${dst}.getReg()); }]),
  (apply [{ Helper.replaceSingleDefInstWithReg(*${op}, ${src0}.getReg()); }])>;

The combine above is really an ‘ideal’ case, the vast majority of combines don’t even use MIR patterns. They use wip_match_opcode to just find an opcode, and do all the matching in C++ afterwards.

// Fold x op 1 -> x
def right_identity_one: GICombineRule<
  (defs root:$root),
  (match (wip_match_opcode G_MUL):$root,
    [{ return Helper.matchConstantOp(${root}->getOperand(2), 1); }]),
  (apply [{ Helper.replaceSingleDefInstWithOperand(*${root}, 1); }])
>;

// Fold (undef ? x : y) -> y
def select_undef_cmp: GICombineRule<
  (defs root:$root),
  (match (wip_match_opcode G_SELECT):$root,
    [{ return Helper.matchUndefSelectCmp(*${root}); }]),
  (apply [{ Helper.replaceSingleDefInstWithOperand(*${root}, 2); }])
>;

// Fold (freeze (freeze x)) -> (freeze x).
// Fold (fabs (fabs x)) -> (fabs x).
// Fold (fcanonicalize (fcanonicalize x)) -> (fcanonicalize x).
def idempotent_prop : GICombineRule<
   (defs root:$mi),
   (match (wip_match_opcode G_FREEZE, G_FABS, G_FCANONICALIZE):$mi,
          [{ return MRI.getVRegDef(${mi}->getOperand(1).getReg())->getOpcode() ==
                    ${mi}->getOpcode(); }]),
   (apply [{ Helper.replaceSingleDefInstWithOperand(*${mi}, 1); }])>;

This is not ideal:

It’s counter-intuitive.
- We’re using TableGen to write the combiners, but we need to embed C++ to do even the most basic transforms.
It’s a maintenance problem.
- Right now, adding even a basic target-independent combine implies modifying 3 files: Combine.td + CombinerHelper.h & CombinerHelper.cpp. We’re scattering information around.
- I would also argue that using C++ so often within TableGen is more bug prone. If there’s a build failure in a .inc file, you need to look at the generated code, check if the code is embedded from TableGen, then find the rule that contains it. This can be confusing, and it’s easier for bugs to hide.
Combines are more complicated than they should be.
- While working on this, I noticed that even some trivial combines aren’t as “simple” as they look, as they often have extra code to deal with specific cases (see the CSE example below for a concrete example).

IMO, this leaves the current implementation of combiners in an awkward state. We can see that there’s intention to get MIR patterns in someday, but it’s not making any progress and we’re stuck using “temporary” hacks forever.

The goal of this RFC is to propose a design for in/out MIR patterns to restore some momentum & interest in the GlobalISel combiners.

Proposal

The proposed implementation drastically overhauls the MatchTable-based GlobalISel backend to support match & apply MIR patterns. For instance, the above combines may be rewritten as:

// Fold x op 1 -> x
def right_identity_one: GICombineRule<
  (defs root:$dst),
  (match (G_MUL $dst, $x, 1)),
  (apply (COPY $dst, $x))
>;

// Fold (undef ? x : y) -> y
def select_undef_cmp: GICombineRule<
  (defs root:$dst),
  (match (G_IMPLICIT_DEF $undef),
         (G_SELECT $dst, $undef, $x, $y)),
  (apply (COPY $dst, $y))
>;

This has the following benefits:

(IMO) It’s more concise and intuitive.
- All of the information is in one place, and the intention is clear. There is no hidden behavior and no need to dig into C++ code to understand what we’re dealing with.
There is more room for optimizations.
- Giving all of the information to the MatchTable allows it to be much more efficient, e.g. it can hoist common checks outside the rule.
- We also remove the need to call C++ every now and then - we can just keep iterating the MatchTable which is - in theory - faster (but I haven’t measured - performance isn’t the main driver behind this change).

GICombinePatFrag

MIR patterns would be fairly limited without the ability to create macros; common patterns that can be reused across combines. This is where GICombinePatFrag comes in.

class GICombinePatFrag<dag outs, dag ins, list<dag> alts> {
  dag InOperands = ins;
  dag OutOperands = outs;
  list<dag> Alternatives = alts;
}

This class allows us to define a ‘meta-instruction’ that can be used to match non-trivial patterns.
It’s composed of:

zero or more in operands.
- Valid types: gi_mo for MachineOperands and gi_imm for immediates.
zero or more out operands (defs)
- If not empty, we need one root and the rest can just be gi_mo
A list of alternatives to match. When a GICombinePatFrag is used within a pattern, the pattern is cloned once for each alternative.

(note: untyped operands are implicitly inferred to be gi_mo)

GICombinePatFrag fit in Patterns just like normal instructions: the ‘out’ operands are written first, and the ‘ins’ are written after. Let’s start with a practical example:

// Fold (freeze (freeze x)) -> (freeze x).
// Fold (fabs (fabs x)) -> (fabs x).
// Fold (fcanonicalize (fcanonicalize x)) -> (fcanonicalize x).
def idempotent_prop_frags : GICombinePatFrag<(outs root:$dst, $src), (ins),
  [
    (pattern (G_FREEZE $dst, $src), (G_FREEZE $src, $x)),
    (pattern (G_FABS $dst, $src), (G_FABS $src, $x)),
    (pattern (G_FCANONICALIZE $dst, $src), (G_FCANONICALIZE $src, $x))
  ]
>;

def idempotent_prop : GICombineRule<
   (defs root:$dst),
   (match (idempotent_prop_frags $dst, $src)),
   (apply (COPY $dst, $src))>;

idempotent_prop_frags does not need any input operands and it has two output operands: a root $dst, which will be the starting point of the match, and another MachineOperand $src.

(note: when a MachineOperand is defined in the GICombinePatFrag’s patterns, it goes in the out operands, but if it’s just used to match another operand (check is same operand), it goes in the in).

Let’s take another, more convoluted artificial example to show the capabilities of this feature.

def MatchFooPerms: GICombinePatFrag<(outs root:$foo), (ins gi_imm:$cst),
    [
      (pattern (G_ZEXT $foo, $b), (G_TRUNC $b, $x):$dbg0, "return foo(${x}, ${cst})"),
      (pattern (G_TRUNC $foo, $z):$dbg1, "return bar(${foo}, ${cst})")
    ]>;

def Test0 : GICombineRule<
  (defs root:$dst),
  (match (G_AND $dst, $cst0, $tmp),
         (G_AND $tmp, $cst1, $cst2),
         (MatchFooPerms $cst0, (i32 10)):$a,
         (MatchFooPerms $cst1, (i32 20)):$b,
         (MatchFooPerms $cst2, (i32 30)):$c
  ),
  (apply (COPY $dst, (i32 0)), "APPLY ${cst0}")>;

C++ code is supported inside a GICombinePatFrag, much like it’s supported in match patterns.
gi_imm implies that the instruction must be called with an immediate. One cannot write MatchFooPerms $x, $y - $y needs to be an integer. ${cst} in the C++ code is expanded to that value directly.
When multiple GICombinePatFrags are used in one rule, every possible permutation of the fragments is emitted. Here, we have 3 uses of MatchFooPerms, so we have 2*2*2 = 8 copies the rule.
- Rules now have a MaxPermutations field to control how many permutation is too much. It can be easily overriden, it’s just here to force combine writers to acknowledge when a rule generates too many copies.

Limitations

GICombinePatFrag has a couple of limitations at the moment:

It cannot be used in an apply pattern.
Using a GICombinePatFrag inside a GICombinePatFrag is not possible.

Both are technically possible, but they add another layer of complexity which I haven’t tackled yet.
I think those use-cases are rare enough that it would be okay to accept those limitations, and fix them later. I’m open to implementing them sooner if it’s deemed necessary.

Constants

Constant matching can be typed or untyped:

(G_MUL $dst, $x, 1) just checks that the RHS of G_MUL is 1.
(G_MUL $dst, $x, (i32 1)) checks that the RHS of G_MUL is 1 and that the type is i32 (s32).

In both cases, this matches using getIConstantVRegValWithLookThrough so it looks for G_CONSTANT.
There is an hardcoded exception for instructions such as G_CONSTANT where it checks for a CImm instead. It’s not ideal but it works, and can be discussed during review.

Note that in apply patterns, immediates must always be typed. This is one of the areas that will need improvement because we only support emitting CImms, e.g. to rewrite a G_CONSTANT:

(apply (G_MUL $x, $y, (i32 0)))

Does not work as expected, because it’ll create a G_MUL with the RHS being an immediate, when a register is expected. We either need to emit a G_CONSTANT manually

(apply (G_CONSTANT $cst, (i32 0)), (G_MUL $x, $y, $cst))

or if we’ve matched (i32 0) in the match pattern, we can give that operand a name and reuse it:

(match (G_FOO $x, $y, (i32 0):$cst))
(apply (G_BAR $x, $y, $cst))

Known Issues

Matching Intrinsics

We currently don’t support matching intrinsics with something like (amdgcn_foo $x, $y).
It’s definitely possible (and actually easy), I just haven’t gotten around to it yet. I think it’ll be added latter as a smaller patch.

CImm vs Imm vs G_CONSTANT

As explained above, using an immediate in apply currently always generates a CImm.
I want to make it possible to emit normal immediates as well using untyped immediates in TableGen, but I’m not sure how to proceed with G_CONSTANT. Should these ever be emitted implicitly?
How could we control whether a G_CONSTANT needs to be emitted?

Idea: Instead of hardcoding the exception, we could add it to the GenericInstruction TableGen class (or Instruction directly).
This could take the form of a new type for the InOperandList/OutOperandList, or simply add a new opt-in field:

let CImmOperands = [$src];
let ImmOperands = [$idx];

The we could always emit G_CONSTANT by default, and use those fields to control when to change the behavior.

Thoughts?

CSE Issues

Some combines, like select_same_val use the matchEqualDefs helper.
This helper does not simply check for value equality. It also matches things like two separate invariant loads from the same value, or two identical G_EXTRACT_ELEMENT for instance.

It’s a problem because it prevents us from rewriting the combine as

def select_same_val: GICombineRule<
 (defs root:$dst),
 (match (G_SELECT $dst, $cond, $x, $x),
        (can_replace_reg $dst, $x)),
 (apply (COPY $dst, $x))

because the true/false operands of the G_SELECT can be different, but hold the same value. Using $x for both is too strict.

There are two possible solutions.
The ‘hacky’ one is to assign separate names to both operands, then call matchEqualDefs after.
It leaves C++ in the pattern, but it works.
We could perhaps figure out a clever TableGen syntax to hide the call into a MatchTable opcode as well.

The solution I would prefer is to consider this a CSE failure. I believe that if CSE did its job properly, matchEqualDefs would be irrelevant. Though, this seems too easy so I’m wondering if there is a historical reason why it behaves like this?

If not, I think it’s worth looking into eliminating matchEqualDefs completely and checking if all failing tests can be fixed by CSE improvements, or added combines.

COPY vs replaceReg

Notice how in all of the combines shown here, the following idiom is used:

(COPY $dst, $x)

This is because the MatchTable doesn’t have a concept of erase + replace a single-def instruction.
I’ve been going back-and-forth over this, and I think the (apply (COPY $dst, $x)) idiom is a good compromise. Maybe the COPY can be hidden-away by introducing a (apply $x) syntax if needed.

Would that be sufficient? It means we don’t check canReplaceReg anymore because it’s no longer needed. I haven’t observed any side effects of this change in the combines I’ve switched so far.

Another thing to note is that this may add some load on the combiner, as we’re giving it extra copies to fold away.

GISelChangeObserver

The current implementation notifies the GISelChangeObserver only once at the end by passing it all of the OutMIs (all the new instructions). It’s also notified when an instruction is erased.

Is this good enough? Should the observer be notified more granularly? (e.g. every time an opcode changes an instruction)

Potential Future Work

(These are just ideas on where this can be headed long term.)

Constraints

I think we will need a way to express constraints such as:

$x and $y must have the same type
$x and $y must have the same register bank
Output temporary register $tmp needs to have the same RC/Type as input $src

This is of course possible, but the question is: do we have any interest in this? How useful would it be?

ISel

Another possible continuation of this work is to extend those pattern to GISel itself, so we can finally implement GISel-only ISel Patterns instead of having to rely on DAG patterns import.

From a technical perspective this should “just work”, but it’ll require a bit of refactoring so the pattern parsing/emission is made generic and usable by both backends (currently it’s all in the .cpp file).

Looking into this is not in my plans yet. I first want to see how MIR patterns perform in the combiners.
Once they reach a good adoption rate and people are satisfied with their usability, we can look at introducing ISel support.

Next Steps

(Now) Gather Feedback

Please let me know your thought on this feature. Do you think it’s a good addition? Is there anything missing that’s absolutely needed ASAP? Do you have concerns?

Land the Initial Implementation

The first step will be to land the initial implementation of the feature, but no combines will be migrated at this stage.
It’s going to be a very big review (it’s a couple thousand lines) and I don’t want to rush it through.

The review will hopefully be up later this week or early next week. I have to clean up some things and write a doc page before it’s reviewable.

In the meantime, it’ll be available on my fork: GitHub - Pierre-vh/llvm-project at matchtable-mirpats

Migrate Combines Over Time

The next step will be to land a few diffs to rewrite some combines. This will not be done overnight, because each combine can pose some unique challenges.
For instance, the CSE issue above prevents a few combines from being migrated, and I’m sure we’ll find many other issues over time.

This will be the stage where help will be most welcome. Migrating all combines alone will be challenging, so I count on contributors to look into using MIR patterns first when adding new combines and filing issues for missing features/bugs.

Pierre-vh · July 26, 2023, 11:51am

Review is up: ⚙ D156315 [RFC][GlobalISel] Overhauled MIR Patterns Support for Combiners
It’s likely it’ll need quite a bit of iteration to reach a landable state.

Topic		Replies	Views
[RFC] MatchTable-based GlobalISel Combiners Common CodeGen Infrastructure globalisel , llvm	12	2137	July 19, 2023
[RFC] Tablegen-erated GlobalISel Combine Rules LLVM Dev List Archives	44	371	December 7, 2018
[RFC][GlobalISel] Replace the current GlobalISel matcher with a bottom-up matcher Code Generation globalisel	5	649	January 9, 2023
Cleanly Addressing GlobalISel/DAG Pattern Matching Differences Code Generation	18	1266	November 23, 2022
[RFC] Make GlobalISel Match Table entries 1 byte instead of 8 Common CodeGen Infrastructure globalisel , llvm	6	616	December 11, 2023