Introduction
Much like InstCombine for IR, and DAGCombine for DAGISel, the GlobalISel Combiners are one of the most important parts of GlobalISel and are the source of many important generic and target-specific optimizations.
However, the current system on which combiner rules are matched & applied leaves room for improvement. Moreover, I feel like the implementation is getting stale; contributions are irregular (most commits are maintenance commits) and there doesn’t seem to be any effort to improve the QoL of combine rule writers lately.
Lastly, the implementation in its current state has several weaknesses that are difficult, if not impossible to completely address:
- It does not support any kind of non-trivial MIR pattern matching, and certainly doesn’t support MIR-to-MIR patterns.
- This means we implement most of the matching & rewriting in C++ currently, which is (IMO) verbose and hacky.
- AFAIK, supporting MIR-to-MIR patterns with the current combiner implementation would be a very big effort. I’m not sure how it would work with the MatchDAG, and generating C++ code for everything would be (IMO) painful.
- There is very little innovation in the Combiner’s TableGen (backend), and I believe it’s slowing GISel infrastructure progress.
- Combiners (and ISel in general) would greatly benefit from better MIR pattern matching, and it was pitched a long time ago, but never upstreamed/implemented as far as I know.
- The implementation is very ad-hoc and shares little to no code with other parts of the GISel infrastructure.
- That is not really a weakness in itself depending on your opinion. IMHO, I consider it a weakness because it makes maintenance more difficult. I believe it’s one of the reasons why progress in this area is slow: if a new contributor wants to improve the system, they need to learn a lot of boilerplate before being able to make any change. e.g. I recently tried to add intrinsics matching (⚙ D150666 [WIP][GlobalISel] Combiner Intrinsic Matching) and got stuck on some odd “unreachable leaf” error, and it’s very difficult to figure out how to fix it.
- The generated code is sub optimal.
- Now this is very much a nitpick (TableGen-erated code is often spaghetti no matter what), but the code emitted by the backend is convoluted and could use some clean-ups/optimizations.
- There is also a feature to “disable rules” which seems interesting, but is really not used. Only one test uses it as far as I can see, and it’s the test to verify that feature works. It should just be removed, IMO.
Proposal
I propose that we replace the TableGen-erated Combiner implementation with a new one, based on top of the Match Tables (MT) used by the Global InstructionSelector.
MTs seem to be a good fit for combiners, as ISel can be viewed as a special combiner that matches a set of MIs and rewrites them. They’re very similar in many ways, and they both use top-down pattern matching which is a plus.
As MTs power InstructionSelectors, they can implement a huge variety of (sometimes non-trivial) DAG patterns imported from DAGISel. This means that implementing MIR patterns should be easier, as we already know that the underlying framework can support them.
Another thing to note is that they’re known to scale well too. The generated tables can be huge (e.g. AMDGPUGenGlobalISel.inc is 123k lines of code), and I believe they’re still reasonably fast despite that (?).
Finally, I believe that such a change would empower existing and new contributors, as they would only need to learn one system to contribute to 2 of the biggest parts of GlobalISel. I am hopeful that it would help spark innovation in the GlobalISel infrastructure by lowering the entry barrier. Another benefit is that improvements in ISel are likely to benefit the combiners, and vice-versa.
D141135
I usually do not like to propose a change that makes another in-review patch one obsolete, but unfortunately, my proposal is not compatible with ⚙ D141135 [RFC][GlobalISel] Replace the current GlobalISel matcher with a bottom-up matcher.
I waited for months, hoping that diff would land, but the diff is inactive. I got no response from my ping on Phabricator, and from my email to the author so I assume that the fate of that diff is uncertain (likely abandoned), and I think it’s appropriate to suggest an alternative at this stage.
Prototype Implementation Results
I have a work-in-progress, working implementation for AMDGPU: GitHub - Pierre-vh/llvm-project at matchtable-mirpats
Highlights
-
InstructionSelector.h
/InstructionSelectorImpl.h
were generalized toGIMatchTableExecutor.h
/GIMatchTableExecutorImpl.h
. - A common base class (
GlobalISelMatchTableExecutor.h
) was created between the Combiner and GlobalISel TableGen emitters, which encapsulates all of the common logic. - All AMDGPU combiners were switched to use match tables, and all of them work without any changes in the test (after some very minor changes in a combine or two).
-
AMDGPUGenPreLegalizeGICombiner.inc
file is about 3000 lines, vs 4600 for the current implementation.
Implementation Choices & Disclaimers
- Disabling rules is not a thing.
- As said above, this system looks unused and I struggle to think of any use case for disabling combines through a CL option (that can’t be solved by just adding an ad-hoc CL option). It can always be re-added if needed, but it’ll require some work as there is no existing equivalent in the Match Table.
- MIR-to-MIR patterns aren’t implemented yet.
- No combine use them currently, so I think the best approach is to progressively port combines to MIR patterns and implement them over time after this change lands. It’s likely to need a RFC on its own.
- Similarly, there are very little functionality changes in the combines.
- This is intentional because it will be a big change as-is, and adding a bunch of MIR pattern stuff into the mix would make review more difficult. I simply aim for the initial patch to have feature-parity with the current implementation.
- This is still a work-in-progress!
- It will need a lot of clean ups before being reviewable. Of course, it will also be split into as many patches as possible for review.
- I did not gather (compile-time) performance numbers yet…
- … but I expect these match tables to be as fast as the current version (or maybe a bit behind, given that C++ code is called indirectly now). Once we migrate the majority of the patterns to MIR pattern, I would expect this to outperform the current implementation.
- NOTE: Tips on how to gather meaningful performance numbers are welcome.
- No handling of commutable operations for now.
- It isn’t needed with current patterns (e.g. in
G_ADD $x, $y, $z
there is nothing to commute as both in-operands are unconstrained), but will be needed once we get more complex MIR patterns (e.g. allow matching a constant, or constrain a regbank, etc.) - Worst case, I suppose we can simply “instantiate” patterns as many times as needed to cover all the possible variants. This isn’t ideal but it should work. An alternative could be to augment the match table so it can support re-trying a rule but with swapped operands, but I’m not sure if that’s doable. (How do DAG patterns do it?)
- It isn’t needed with current patterns (e.g. in
Upstreaming
If this is a welcome proposal, I think it will be reviewable in the next few weeks (more or less, depending on how much time I can allocate to this). However, before I commit to this, there are also a few topics that I need feedback on:
- Fundamentally, is this a welcome change?
- I would like to delete the current implementation (GIMatchDag) entirely when this is added. Would there be any objections to that?
- The generated code is too different, it’s impossible to have a command-line switch to alternate between the two. I also think that keeping in the old system will just prevent good refactoring opportunities.
- Please note that I intend to split the patches as much as possible. I also want to have a single commit that deletes the old implementation & ports all targets over, to make it easier for downstream users to revert it if they need some time to adjust and/or raise concerns.
- I do not think re-adding the system to disable Rule IDs is necessary, given that it’s unused as far as I know. Is that needed?
If you have any other requests or concerns with upstreaming this, please raise them here. I will be more than happy to discuss them!