[GlobalISel] Prioritizing long patterns in combiner over short ones

Hi,

I'm currently writing target-specific combines with GlobalISel. I have a case where a sub-node of a larger pattern also matches another, smaller combine pattern. Because the combiner runs top-down, the smaller pattern is matched before the larger pattern gets a chance to match. Do I have to teach my larger pattern to handle this case, or is there a better way to do it?

More importantly, are there any plans to improve this behavior?

Cheers,

Dominik

GICombineGroup may be of use:

// A group of combine rules that can be added to a GICombiner or another group.
class GICombineGroup<list<GICombine> rules> : GICombine {
  // The rules contained in this group. The rules in a group are flattened into
  // a single list and sorted into whatever order is most efficient. However,
  // they will never be re-ordered such that behaviour differs from the
  // specified order. It is therefore possible to use the order of rules in this
  // list to describe priorities.
  let Rules = rules;
}
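As a sketch, with hypothetical rule names, using a group to express priority might look like this (the earlier entry is preferred when both rules share a root):

```tablegen
// Hypothetical rules: list combine_large first so it is tried before
// combine_small whenever both could fire on the same root instruction.
def my_combines : GICombineGroup<[combine_large, combine_small]>;
```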

Hi Jason,

thanks for the reply. That's what we're already using, but it's not useful in this case: the order only seems to matter when the roots are the same, and in my case the roots are different. In one pattern I'm trying to match a G_AND and in the other a G_OR, and the G_AND pattern also shows up inside the larger G_OR pattern. Since the combiner walks top-down, it sees the G_AND before the G_OR and matches the smaller pattern. When it later reaches the G_OR, the larger pattern no longer matches because the G_AND has already been combined away. If the combiner walked bottom-up, it would see the G_OR before the G_AND and my use case would work as expected.
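For illustration, the shape of the conflict in schematic, hand-written MIR (names invented) would be:

```
%and:_(s32) = G_AND %x:_, %y:_    ; root of the smaller pattern
%or:_(s32)  = G_OR %and:_, %z:_   ; root of the larger pattern, which also
                                  ; wants to match the G_AND operand
```

Visiting top-down, the smaller rule rewrites the G_AND first, so by the time the G_OR is visited its operand is no longer a G_AND and the larger rule fails to match.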

So this is not a problem of priority, but rather a problem of the traversal that the combiner does. I do understand though that simply switching the traversal to bottom-up won’t work in every case and might even cause the same problem.

Cheers,

Dominik

I am really surprised that "the combiner runs top-down". Surely it's
better to combine smaller sub-expressions before combining larger
expressions? Or is there actually a good reason to keep the top-down
order?

DAGCombiner also runs mostly top-down (though for certain cases it can
also add nodes to the worklist in a bottom-up direction). The top-down
order causes problems when you have deep expression trees that could
be simplified away to nothing. Outer combines see these subexpressions
unsimplified. They can use recursive helper functions like
computeKnownBits to try to analyse the unsimplified expression, but
the recursive helper functions typically have a max recursion depth of
6, which is easy to hit.

DAGCombiner is so embedded now that it is hard to change the basic
top-down order, but I would hope we could change the GlobalISel
combiner.

Jay.

+Daniel.

Daniel, do you remember if there was a strong reason for choosing to go top-down on the combiner?

Amara

To be honest, hearing it was top-down surprised me too at first, but I think I was remembering the algorithm I wanted to experiment with, which required bottom-up.

I asked Aditya about this, and apparently it's to address the pathological case where we repeatedly do an entire pass only to simplify a single node of the MIR, slowly propagating that change through the MIR. Another reason was that InstCombine works this way too.

I think we can probably address a fair portion of the pathological case by controlling how we populate the worklist. Instead of adding all MIs to it each time, we could filter out blocks that can't possibly match on this round. I'm thinking that a block only needs revisiting on the next pass if it contains a use of a register that was def'd by an instruction we changed on the current pass. The observer could therefore track the vreg defs affected by a pass and build a list of MBBs to revisit. If the pass runs top-down, I think we'd also have to revisit all the successors of those blocks (in case we make changes that allow further changes in the same pass), but if we changed it to bottom-up we wouldn't need that. This wouldn't help with large blocks, but maybe we can find a similar scheme there.
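The bookkeeping could look roughly like this toy sketch (not the real LLVM API; vregs and basic blocks are represented as plain ints, and the use map stands in for what the observer would collect):

```cpp
#include <map>
#include <set>
#include <vector>

// Toy model of the proposed worklist filtering: given the vregs redefined
// during the current pass and, for each vreg, the set of blocks containing
// a use of it, compute the blocks worth revisiting on the next pass. All
// other blocks cannot possibly match anything new.
std::set<int> blocksToRevisit(const std::vector<int> &ChangedDefs,
                              const std::map<int, std::set<int>> &UseBlocks) {
  std::set<int> Revisit;
  for (int VReg : ChangedDefs) {
    auto It = UseBlocks.find(VReg);
    if (It != UseBlocks.end())
      Revisit.insert(It->second.begin(), It->second.end());
  }
  return Revisit;
}
```

A bottom-up walk would make this sufficient on its own; a top-down walk would additionally need to pull in the successors of each revisited block, as described above.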

I think the terminology is confusing. By "bottom up" I mean visiting
the leaves of an expression tree first, as in a BURS (bottom-up
rewrite system). This is "bottom up" if you draw your trees with the
root at the top. For the expression (fadd (fmul a, b), c) it would
visit the mul before the add. I think simplifying things in this order
is sane, because when you visit a node you can assume that the
operands of that node have already been simplified.

In my defence I think I got this definition from earlier discussions
about the order in which DAGCombiner runs:
https://reviews.llvm.org/D33587#1372912

Unfortunately, if you think of textual IR in a basic block, then
"bottom up" order starts at the top of the block and proceeds towards
the bottom :(
  %mul = fmul float %a, %b
  %add = fadd float %mul, %c
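This "bottom up" order is just a post-order walk of the expression tree. A minimal sketch (a toy node type, not LLVM's):

```cpp
#include <string>
#include <vector>

// Toy expression node. A post-order ("bottom up") walk visits a node's
// operands before the node itself, so for (fadd (fmul a, b), c) the fmul
// is recorded before the fadd.
struct Node {
  std::string Op;
  std::vector<const Node *> Operands;
};

void visitBottomUp(const Node *N, std::vector<std::string> &Order) {
  for (const Node *Op : N->Operands)
    visitBottomUp(Op, Order);
  Order.push_back(N->Op);
}
```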

A quick experiment shows that InstCombine is bottom-up:
$ cat z.ll
define float @main(float %a, float %b, float %c, float %d, float %e) {
  %add = fadd float %a, %b
  %sub = fsub float %add, %c
  %mul = fmul float %sub, %d
  %div = fdiv float %mul, %e
  ret float %div
}
$ opt -instcombine -debug-only=instcombine z.ll
IC: Visiting: %add = fadd float %a, %b
IC: Visiting: %sub = fsub float %add, %c
IC: Visiting: %mul = fmul float %sub, %d
IC: Visiting: %div = fdiv float %mul, %e

DAGCombiner is top-down, which I think is wrong but it's hard to fix:
$ llc -march=aarch64 -debug-only=dagcombine z.ll
Combining: t14: f32 = fdiv t13, t10
Combining: t13: f32 = fmul t12, t8
Combining: t12: f32 = fsub t11, t6
Combining: t11: f32 = fadd t2, t4

I'm happy to see that GISel Combiner is bottom-up after all:
$ llc -march=aarch64 -global-isel -debug-only=gi-combiner z.ll
Try combining %5:_(s32) = G_FADD %0:_, %1:_
Try combining %6:_(s32) = G_FSUB %5:_, %2:_
Try combining %7:_(s32) = G_FMUL %6:_, %3:_
Try combining %8:_(s32) = G_FDIV %7:_, %4:_

Sorry if I have derailed your original questions, Dominik. I think the
answers are "yes, you have to teach your larger pattern to handle this
case" and "no, there are no plans to improve this behaviour as far as
I know, and I'm not even sure how it would be improved".

Jay.

Hi Jay,

thanks for this clarification, and sorry for the confusion I seem to have caused! I was thinking solely in terms of the IR rather than the pattern. Thanks also for linking that DAGCombiner discussion. I'll update my pattern to handle my use case. In the future I might even add a second post-combine combiner pass to our backend, should it turn out to be beneficial.

Cheers,

Dominik