[RFC] MatchTable-based GlobalISel Combiners

Pierre-vh · June 19, 2023, 6:51am

Introduction

Much like InstCombine for IR, and DAGCombine for DAGISel, the GlobalISel Combiners are one of the most important parts of GlobalISel and are the source of many important generic and target-specific optimizations.

However, the current system on which combiner rules are matched & applied leaves room for improvement. Moreover, I feel like the implementation is getting stale; contributions are irregular (most commits are maintenance commits) and there doesn’t seem to be any effort to improve the QoL of combine rule writers lately.

Lastly, the implementation in its current state has several weaknesses that are difficult, if not impossible to completely address:

It does not support any kind of non-trivial MIR pattern matching, and certainly doesn’t support MIR-to-MIR patterns.
- This means we implement most of the matching & rewriting in C++ currently, which is (IMO) verbose and hacky.
- AFAIK, supporting MIR-to-MIR patterns with the current combiner implementation would be a very big effort. I’m not sure how it would work with the MatchDAG, and generating C++ code for everything would be (IMO) painful.
There is very little innovation in the Combiner’s TableGen (backend), and I believe it’s slowing GISel infrastructure progress.
- Combiners (and ISel in general) would greatly benefit from better MIR pattern matching, and it was pitched a long time ago, but never upstreamed/implemented as far as I know.
The implementation is very ad-hoc and shares little to no code with other parts of the GISel infrastructure.
- That is not really a weakness in itself depending on your opinion. IMHO, I consider it a weakness because it makes maintenance more difficult. I believe it’s one of the reasons why progress in this area is slow: if a new contributor wants to improve the system, they need to learn a lot of boilerplate before being able to make any change. e.g. I recently tried to add intrinsics matching (⚙ D150666 [WIP][GlobalISel] Combiner Intrinsic Matching) and got stuck on some odd “unreachable leaf” error, and it’s very difficult to figure out how to fix it.
The generated code is sub optimal.
- Now this is very much a nitpick (TableGen-erated code is often spaghetti no matter what), but the code emitted by the backend is convoluted and could use some clean-ups/optimizations.
- There is also a feature to “disable rules” which seems interesting, but is really not used. Only one test uses it as far as I can see, and it’s the test to verify that feature works. It should just be removed, IMO.

Proposal

I propose that we replace the TableGen-erated Combiner implementation with a new one, based on top of the Match Tables (MT) used by the Global InstructionSelector.

MTs seem to be a good fit for combiners, as ISel can be viewed as a special combiner that matches a set of MIs and rewrites them. They’re very similar in many ways, and they both use top-down pattern matching which is a plus.

As MTs power InstructionSelectors, they can implement a huge variety of (sometimes non-trivial) DAG patterns imported from DAGISel. This means that implementing MIR patterns should be easier, as we already know that the underlying framework can support them.

Another thing to note is that they’re known to scale well too. The generated tables can be huge (e.g. AMDGPUGenGlobalISel.inc is 123k lines of code), and I believe they’re still reasonably fast despite that (?).

Finally, I believe that such a change would empower existing and new contributors, as they would only need to learn one system to contribute to 2 of the biggest parts of GlobalISel. I am hopeful that it would help spark innovation in the GlobalISel infrastructure by lowering the entry barrier. Another benefit is that improvements in ISel are likely to benefit the combiners, and vice-versa.

D141135

I usually do not like to propose a change that makes another in-review patch one obsolete, but unfortunately, my proposal is not compatible with ⚙ D141135 [RFC][GlobalISel] Replace the current GlobalISel matcher with a bottom-up matcher.

I waited for months, hoping that diff would land, but the diff is inactive. I got no response from my ping on Phabricator, and from my email to the author so I assume that the fate of that diff is uncertain (likely abandoned), and I think it’s appropriate to suggest an alternative at this stage.

Prototype Implementation Results

I have a work-in-progress, working implementation for AMDGPU: GitHub - Pierre-vh/llvm-project at matchtable-mirpats

Highlights

InstructionSelector.h/InstructionSelectorImpl.h were generalized to GIMatchTableExecutor.h/GIMatchTableExecutorImpl.h.
A common base class (GlobalISelMatchTableExecutor.h) was created between the Combiner and GlobalISel TableGen emitters, which encapsulates all of the common logic.
All AMDGPU combiners were switched to use match tables, and all of them work without any changes in the test (after some very minor changes in a combine or two).
AMDGPUGenPreLegalizeGICombiner.inc file is about 3000 lines, vs 4600 for the current implementation.

Implementation Choices & Disclaimers

Disabling rules is not a thing.
- As said above, this system looks unused and I struggle to think of any use case for disabling combines through a CL option (that can’t be solved by just adding an ad-hoc CL option). It can always be re-added if needed, but it’ll require some work as there is no existing equivalent in the Match Table.
MIR-to-MIR patterns aren’t implemented yet.
- No combine use them currently, so I think the best approach is to progressively port combines to MIR patterns and implement them over time after this change lands. It’s likely to need a RFC on its own.
Similarly, there are very little functionality changes in the combines.
- This is intentional because it will be a big change as-is, and adding a bunch of MIR pattern stuff into the mix would make review more difficult. I simply aim for the initial patch to have feature-parity with the current implementation.
This is still a work-in-progress!
- It will need a lot of clean ups before being reviewable. Of course, it will also be split into as many patches as possible for review.
I did not gather (compile-time) performance numbers yet…
- … but I expect these match tables to be as fast as the current version (or maybe a bit behind, given that C++ code is called indirectly now). Once we migrate the majority of the patterns to MIR pattern, I would expect this to outperform the current implementation.
- NOTE: Tips on how to gather meaningful performance numbers are welcome.
No handling of commutable operations for now.
- It isn’t needed with current patterns (e.g. in G_ADD $x, $y, $z there is nothing to commute as both in-operands are unconstrained), but will be needed once we get more complex MIR patterns (e.g. allow matching a constant, or constrain a regbank, etc.)
- Worst case, I suppose we can simply “instantiate” patterns as many times as needed to cover all the possible variants. This isn’t ideal but it should work. An alternative could be to augment the match table so it can support re-trying a rule but with swapped operands, but I’m not sure if that’s doable. (How do DAG patterns do it?)

Upstreaming

If this is a welcome proposal, I think it will be reviewable in the next few weeks (more or less, depending on how much time I can allocate to this). However, before I commit to this, there are also a few topics that I need feedback on:

Fundamentally, is this a welcome change?
I would like to delete the current implementation (GIMatchDag) entirely when this is added. Would there be any objections to that?
- The generated code is too different, it’s impossible to have a command-line switch to alternate between the two. I also think that keeping in the old system will just prevent good refactoring opportunities.
- Please note that I intend to split the patches as much as possible. I also want to have a single commit that deletes the old implementation & ports all targets over, to make it easier for downstream users to revert it if they need some time to adjust and/or raise concerns.
I do not think re-adding the system to disable Rule IDs is necessary, given that it’s unused as far as I know. Is that needed?

If you have any other requests or concerns with upstreaming this, please raise them here. I will be more than happy to discuss them!

Pierre-vh · June 20, 2023, 10:39am

Hi again,

I’ve been working a bit more on this and I got a few interesting updates:

I now generate one RuleMatcher per opcode in wip_match_opcode, which is the right thing to do I believe (it’s what the previous implementation did, and I think it’s what ISel does too for PatFrags?)
I managed to get rid of all functional changes in combines. Now it just works out of the box with zero regressions.
I added a few size/performance optimizations that reduced the size of AMDGPUGenPostLegalizeGICombiner.inc by 300 lines, it’s now at 2737 lines.
- Skip emission of llvm_unreachable + return if the code starts with return
- Make MatchInfos unique instead of naively creating a variable for each rule. This greatly reduces the number of MatchInfos variables needed.
When a piece of code is expanded, it’s also uniqued so we don’t emit the same piece of code twice for apply/match code.

I’d love to start preparing patches for upstreaming but this would be a relatively big patch streak so I would like to gather opinions first.

Tagging a few GlobalISel contributors for visibility: @arsenm, @dsanders, @amara

arsenm · June 20, 2023, 11:20pm

This is important for testing and debugging. I’m pretty sure there are more than a few uses of it (I see 24 AArch64 tests using only-enable-rule).

Combiners are supposed to be bottom up, and selection is also. If the gisel selector is currently top down, that’s a bug and doesn’t match the DAG behavior. How is top down a plus? That breaks most patterns you want to match before you see them.

I completely forgot about this patch until now

Ad-hoc CL options are a problem to solve. We have too many unstructured command line flags. Being able to specifically target and test the interactions of combines in isolation in useful. A previously discussed feature would be to have tooling automatically detect which combines are causing infinite loops in the combiner and other such issues. It’s not an issue in globalisel today as there are so few combines, but it’s often time consuming to untangle all the combines in DAGCombiner.

I still don’t really understand wip_match_opcode or what the point of it is. It seems like it’s just a worse version of PatFrags that doesn’t behave like a single instruction matcher.

Pierre-vh · June 21, 2023, 6:35am

Ah, I was only looking for the disable-rule option, it’s probably why I missed it. I will take a look at re-implementing the options, then. I’ll try to do it as cleanly as possible, ideally I want to avoid adding some kind of “CheckRuleID” opcode at the root of the pattern because that may prevent a lot of optimizations in the match table.

EDIT: I reimplemented it by adding a GIM_CheckIsCombineRuleEnabled opcode which is checked after the features. The stuff that parses the rule names and such adds about 1100 lines of code though, so now the TableGen-erated code is the same size as the old impl.

Isn’t the Global InstructionSelector top-down? If I emit the following pattern:

def : Pat<(i32 (sub GPR32:$src1, (add GPR32:$src2, (mul GPR32:$src2, (i32 0))))),
          (InstTwoOperands GPR32:$src2, GPR32:$src1)>;

The match table first checks for G_SUB, then G_ADD and finally G_MUL: def : Pat<(i32 (sub GPR32:$src1, (add GPR32:$src2, (mul GPR32:$src2, (i32 0))) - Pastebin.com

In any case, we currently have so little MIR patterns (if any?) that it doesn’t seem to make a difference.

EDIT: After thinking about it a bit more, are you talking about the order in which instructions are visited? This might explain the confusion; I was talking about top-down in the tree-pattern-matching sense, where a given pattern is matched from root to leaves (as opposed to bottom-up matching where it’s matched leaves to root). Of course the iteration order in the combiner pass is unchanged, that is still bottom-up AFAIK.

wip_match_opcode is just that, check an opcode and nothing else. If I can dedicate serious time to adding MIR patterns later, removing this is one of the first items I’d look at.
The nice thing about these match table is that it’d be easy to add a PatFrag-like system.

@arsenm, if I address the rule disabling & top-down matching concern, do you have any remaining objections?

qcolombet · June 21, 2023, 10:44am

Disclaimer: I haven’t actively contributed to GISel for a while now.

Any step that would unify the combiners infrastructure with the instruction selection infrastructure is a must have IMHO. Therefore I am in favor of the overall approach.

I’ve been advocating for the same framework for both selection and combines since the original proposal came out ([llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules)

Regarding the “top-down pattern matching”, we (at least I) have a mismatch in terminology. For me, bottom-up means we match the roots first then the leaves. The way I think about bottom and up is in terms of the program order: the root happens later (bottom) than the leaves (up).

This is useful like @arsenm said for testing and debugging, but anyway, this is not fundamentally different than a predicate on whether or not a rule applies. I.e., we should be able to piggyback on MTs abilities to skip rules when e.g., a subtarget supports a specific feature. The only difference is how we fetch this predicate (CL, subtarget feature, config file, whatever).

Pierre-vh · June 21, 2023, 1:26pm

Indeed, it’s just a small confusion. I think “top-down” (to describe root-to-leaves) is less confusing here and it’s closer to an academical definition. To me, the program order doesn’t matter in the definition, I just see it as matching a subtree in a bigger tree.

I’ll just use “root to leaves” from now on to avoid confusion

I also have some (good!) progress updates on the implementation:

I re-added all the rule config logic - mostly copy-pasted from the old impl - and it seems to work just fine.
- To support them, I added a special opcode GIM_CheckSimplePredicate which simply calls a function with a unsigned flag to check context-less predicates (doesn’t need a MI and can just be called right next to CheckFeatures). I think it’s a nice addition and it’s not leaking any combiner-specific stuff into the MatchTableExecutor.
I ported the AArch64 pre & post legalizer combiners to the new system and they also work completely fine (minus a small bug in my impl that was promptly fixed)
I generalized GIR_CombineApply to a GIR_CustomAction opcode, which, as the name implies, just runs a custom, generic action upon matching (and doesn’t mess with in/out instructions). The action doesn’t return anything and simply takes the MatcherState as parameter. I also think it’s a useful addition that doesn’t leak combiner-specific things into the executor.

Overall it’s in good shape I would say. It’s a bit messy in places but that’s expected of a WIP. I’ll rewrite/refactor the problematic parts while preparing the patches.

If someone feels like taking a look, here’s how AMDGPUGenPreLegalizerGICombiner.inc looks like now: AMDGPUGenPreLegalizerGICombiner.inc - Pastebin.com

Pierre-vh · June 23, 2023, 10:06am

Is there any objection to me making a patch series to start the review process?

I haven’t started splitting it up yet, but on top of my head I think there would be the following patches:

[NFC] Remove return from combine apply C++ code.
Generalize InstructionSelector into GIMatchTableExecutor
Add GlobalISel Combiner Match Table Emitter
1 to 3 patches to port targets to the new combiner (probably one AMDGPU, one AArch64 then one for all remaining targets)
Delete old combiner

In the end I don’t think there is that much room for refactorings, at least not enough to warrant a single big patch that deletes the old combiner + adds the new one (which would be a pain to review TBH). I think I’ll do the transition like this:

Add the new (MatchTable) backend as a GINewCombinerEmitter.cpp
Port all targets
Add a deprecation warning/error to the old backend.
(After a few weeks) Delete the old implementation and remove the new prefix from the MT-based impl.

(Alternatively, I could name the new backend GICombinerExecutorEmitter but it’s a bit more confusing, IMO.)

I believe Apple is using GISel in a downstream backend right? It’d be nice to have some downstream maintainers of combiners giving their thoughts on this RFC

Pierre-vh · June 26, 2023, 11:58am

I posted a first series of patches.
I’ll write & post the remaining patches (AArch64 port, misc ports, delete old combiner) in the coming days/week.

Pierre-vh · June 27, 2023, 1:02pm

Two more patches are up to port remaining targets. No issues to report!

Now I think a good question to ask is how would we want to retire the old combiner. I think it may be used downstream still. I guess there’s two approaches we could take:

First deprecate it (e.g. always print a warning in the combiner + add a #warning in the .inc) for a while (weeks or months?) then remove it.
Directly remove it, and let downstream adapt. I think there is no need to deprecate for internal APIs. If a downstream user needs more time they can always revert it and reapply the patch later.

amara · July 5, 2023, 8:35pm

Sorry for the late response on this Pierre, I’ve only just had time now to spend time considering this proposal. Overall I have on objections to this, and if this will allow us to write more combines in patterns rather than C++ then it’s a net benefit.

On the deprecation strategy, I believe there are downstream users with custom targets using GlobalISel. If deprecation doesn’t cost us a lot, I think it should do that. That said, there’s no need for a long deprecation period. I think a month would be fine.

Pierre-vh · July 6, 2023, 8:13am

Hi, thanks a lot for your reply! If you have some time for it, please also take a look at the Phabricator reviews I’d like as many people as possible to approve them before landing them.

One thing to note though is that this RFC is just a proposal to change the implementation of the combiners. It won’t add any features, and while some MIR patterns should work better now, there’s no guarantee they’ll all work beyond what’s currently in-use.

I will try my best to get another RFC out for MIR patterns support at some point, but I first want to get an implementation that I’m satisfied with. I’m working on it but progress is a bit slow, there’s a lot of hidden nuances and complexity to it, so I can’t guarantee it’ll make it upstream yet.

IMO, even in the worst case (no MIR patterns after), this RFC is still a net improvement to the infrastructure because it makes it easier to contribute and add new functionality to the combiners

dsanders · July 12, 2023, 12:38am

Fundamentally, is this a welcome change?

Yes, very much so! Thanks for working on the combiner. There were some reasons for the separate implementation but I’ve been promised multiple times there would be room in our schedule to continue the combiner but it never comes to pass and the workload continues to rise. At this point it’s pretty clear that having time for big changes isn’t going to happen in practice.

The one concern I have is whether I’ll still be able to do things like the sextload/zextload combine in this implementation. The background to this is that DAGISel would only consider one use at a time and this necessitated excessive use of hasOneUse() (or the hasOneNonDbgUse() which isn’t sensitive to -g) in our downstream target to prevent combines applied in the context of one use leading to worse code overall across multiple uses. In the case of sextload/zextload this tended to manifest as unnecessary duplication of loads. Along the same lines, the typical fmul/fadd → fma combine is only a win if the fmul goes away as a result. If it doesn’t, it can be cheaper to stick with fmul/fadd and while hasOneUse() wasn’t the right tool to check that, it was the only one available in the bottom-up (in gMIR terms) tree matching. Resolving these problems with MatchTables should be possible but when I last worked on this code there was definitely problems with a tree-like structure (inherited from DAGISel’s tablegen syntax) being embedded in the design of the objects that built the MatchTables. It may be that the answer to that is to evict such combines to another pass but as mentioned above we’d have to evict quite a bit for the hasOneUse() calls we’ve ended up with.

I would like to delete the current implementation (GIMatchDag) entirely when this is added. Would there be any objections to that?

Mid to long term no but short term we could use a bit of time. We’re currently playing catchup on several fronts at the moment and have quite a lot of things to un-revert on our upstream-tracking branch. In particular, opaque pointers and the removal of the legacy pass manager. It would be difficult to take on a third right now. That said, I don’t really want to be in your way. If you can give us a couple months before things start being removed that would help us out a fair bit. If not, that’s fine too, it’s just the cost of being behind .

I do not think re-adding the system to disable Rule IDs is necessary, given that it’s unused as far as I know. Is that needed?

I see there’s already been discussion on this, thanks for re-adding support for disabling combine rules. That’s one thing I really wanted to have support for as I’ve had a lot of trouble tracking down combine-related bugs in DAGISel and the ability to switch off some/all of them and narrow down a bug that way would have been really handy.

From Quentin Colombet:
I’ve been advocating for the same framework for both selection and combines since the original proposal came out ([llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules)

If I’m reading this RFC correctly it’s not really going as far as your original position of re-using all the DAGISel tablegen syntax. The DAGISel nodes really didn’t fit GlobalISel which is why the ISel support still has holes and we end up using C++ (in our downstream target a lot).

Pierre-vh · July 19, 2023, 9:30am

Hi, sorry for the late reply - I somehow didn’t get an email notification and missed this.

I’m not sure I understand, can you write a small example? No upstream combines were affected by this change, so I assume it’s a downstream issue?

In any case, I think the MatchTable is very flexible. It’s simply a framework for a state machine after all, so given a few refactors I’m sure we can make almost anything work. If I understand correctly, you would want a system to make a MIR pattern only match if all the instructions can be folded out? That’s definitely possible I believe.

I don’t mind waiting a bit if needed, I was planning to deprecate the backend this week & remove it sometime by the end of summer (probably late August/early September). I also plan do to a few refactors while I’m at it to make the Combiner pass work more like the ISel pass (= don’t make it re-instantiate the implementation on every instruction). Would that work for you?

If you encounter any major issues while porting this feel free to open an issue on GitHub and tag me, or simply send me an email directly. I’m happy to help!

Note that I’m also working on a MIR pattern system based on this RFC (re-using the same syntax as current MIR patterns, just with a lot more features). My goal is make it possible to rewrite most combines as pure tablegen/MIR patterns. That’s a separate topic though and we can discuss it when/if I publish an RFC for it (it’d be a very big patch so it needs to cook a bit longer)

Topic		Replies	Views
[RFC] Overhauled MIR Patterns for GlobalISel Combiners Code Generation globalisel	1	437	July 26, 2023
[RFC] Tablegen-erated GlobalISel Combine Rules LLVM Dev List Archives	44	335	December 7, 2018
RFC: [GlobalISel] Towards a generic MI combiner framework LLVM Dev List Archives	13	105	November 28, 2017
Cleanly Addressing GlobalISel/DAG Pattern Matching Differences Code Generation	18	1073	November 23, 2022
Work on DAG Isel for TableGen and compiler LLVM Dev List Archives	7	115	November 19, 2020