Using [GlobalISel] to provide peephole optimizations

Hi,

GlobalISel is fantastic, but obviously lacks a lot of the transforms that
makes SelectionDAG so good. Whilst it's plenty usable, you'll find yourself
wanting/needing to add a lot of manual little transforms to clean things up.

I know of the RFC for a new Combiner with its own syntax
(https://reviews.llvm.org/D54286 is the latest I can find of it), but after
manually adding my Nth manually coded pass for a niggling important
transform, and then needing to add more cases to
FoldImmediate/MemoryOperand/OptimizeLoad after it. I wondered how hard it
would be to allow GlobalISel to reselect machine patterns, eg after they've
been made available by other passes.

What I was thinking is in addition to anything else that's coming, allowing
Instructions to exist on the input side of Pat<>, and using the same
InstructionSelector we already have to reselect.

To my surprise... not many changes are required to seemingly make this work:

    // fold loads in to compare instructions
    def : Pat<(CPw_sr i32:$k, (MOVw_wf iPTR:$s)), (CPw_sf i32:$k, iPTR:$s)>;

And it looks like SDNodeXForms will work off the bat, along with complex
renderers. The main catches being with constants that require checking (due
G_CONSTANT being handled differently to immediates), along with needing to
add checks that the instruction has the same implicits, and that they're
dead where appropriate. But it seems viable and is definitely easy to use so
I'm just wondering... is this something that's being considered/is appealing
to people? And/or is the restriction of not allowing Instructions on the LHS
quite an intentional design decision?

Because it seems that this would provide some value even for those not using
GlobalISel as their primary selector, just as a way of quickly describing
peephole optimizations and leveraging the very nifty little VM there to
implement them. In theory, a lot of pattern fragments could even be added
automatically, by comparing pattern fragments and the machine opcodes they
represent - giving a free automatically generated "foldImmediate", among
other things.

A diff of the proof-of-concept can be found here:
(https://paste.ee/p/aDHIg), note though it's really just a curiosity to get
some conversation going, that other Selectors will reject these patterns (I
haven't added their escape routes), and that you're likely to break it in
many different ways. And of course that they should exist as their own
table, and that Reselect ought return the newly selected instruction.

But it's neat enough for me as is to get some useful patterns added already,
with these caveats in mind.

Any thoughts?

Regards,
Alex Davies

PS Apologies if the diff doesn't work first go - had to wrestle with quite a
few out-of-tree things back to get things in order. Note, includes fix for
https://bugs.llvm.org/show_bug.cgi?id=42032 also. Oh, and if there's already
a peephole pattern emitter that I've missed, please do let me know, I'll
easily survive the shame. :slight_smile:

+Daniel who’s prototyping the new combiner framework.

Hi Alex,

Sorry for the slow reply, I was on duty for merging from upstream and fixing issues this week and that took almost all my time.

+Daniel who’s prototyping the new combiner framework.

Hi,

GlobalISel is fantastic, but obviously lacks a lot of the transforms that
makes SelectionDAG so good. Whilst it's plenty usable, you'll find yourself
wanting/needing to add a lot of manual little transforms to clean things up.

I know of the RFC for a new Combiner with its own syntax
(https://reviews.llvm.org/D54286 is the latest I can find of it),

I'm still working on this but there's been higher priority tasks throughout the year that have taken up a fair bit of my time. I'm hoping to have some news on this fairly soon

but after
manually adding my Nth manually coded pass for a niggling important
transform, and then needing to add more cases to
FoldImmediate/MemoryOperand/OptimizeLoad after it. I wondered how hard it
would be to allow GlobalISel to reselect machine patterns, eg after they've
been made available by other passes.

What I was thinking is in addition to anything else that's coming, allowing
Instructions to exist on the input side of Pat<>, and using the same
InstructionSelector we already have to reselect.

To my surprise... not many changes are required to seemingly make this work:

  // fold loads in to compare instructions
  def : Pat<(CPw_sr i32:$k, (MOVw_wf iPTR:$s)), (CPw_sf i32:$k, iPTR:$s)>;

And it looks like SDNodeXForms will work off the bat, along with complex
renderers. The main catches being with constants that require checking (due
G_CONSTANT being handled differently to immediates), along with needing to
add checks that the instruction has the same implicits, and that they're
dead where appropriate. But it seems viable and is definitely easy to use so

It makes sense that a fair bit of this will work. Unlike DAGISel, the input and output are based around the same representation. Aside from the ones you mentioned, the main limitations you'll run into will stem from limitations on the patterns syntax (e.g. instructions are only allowed one result) and semantics (e.g. references to register classes like GPR32:$src specify a type constraint but not a register class constraint).

I'm just wondering... is this something that's being considered/is appealing
to people?

Something along these lines should be doable with the Combiner work when it lands and I can see some benefits to a post-isel combiner. The same benefits would also apply to the peephole mechanism you're proposing.

And/or is the restriction of not allowing Instructions on the LHS
quite an intentional design decision?

This restriction comes primarily from DAGISel where the LHS is matching SDNodes in the DAG whereas the RHS is emitting Machine SDNodes (albeit briefly) and then converting them to MachineInstrs. The importer that GlobalISel uses to import SelectionDAG rules into GlobalISel's instruction selector carried this restriction forwards. W.r.t the instruction selector (and ignoring the issues I mentioned above), it makes sense in principle for instructions to be allowed except for at the root of the pattern because GlobalISel allows passes before instruction selection to select instructions. The only reason it doesn't make sense at the root of the pattern is that the instruction selector skips instructions that are already target machine instructions or one of the permitted target independent machine instructions.

The big issue would be: When are these new rules executed?
The instruction selector is intentionally less general than a combiner since combiners are more expensive (re-applying rules until convergence is difficult and expensive) and the instruction selector is a mandatory pass that runs at all optimization levels. I wouldn't be in favour of making the instruction selector more like a combiner by re-processing instructions but I think that this peephole idea would make sense as a separate pass like a second (but optional) instruction selection pass, or an optional post-isel combiner. I think you may be proposing adding it to PeepholeOptimizer.cpp but I didn't see anything calling reselect() in your patch.

Because it seems that this would provide some value even for those not using
GlobalISel as their primary selector, just as a way of quickly describing
peephole optimizations and leveraging the very nifty little VM there to
implement them. In theory, a lot of pattern fragments could even be added
automatically, by comparing pattern fragments and the machine opcodes they
represent - giving a free automatically generated "foldImmediate", among
other things.

I'm not sure I understood this bit. Could you give an example?