[RFC] Introducing classes for the codegen driven by new pass manager

(NPM: new pass manager; LPM: legacy pass manager)

Hello, community

While we're still working towards using the NPM for the optimizer pipeline by default, we still don't have a machine pass interface and a corresponding machine pass manager using the NPM. The potential benefits of using the NPM aside, this inhibits us from making any progress on deprecating the LPM for the codegen pipeline, which in turn blocks removing the LPM altogether. The purpose of this series of patches is to (1) let pass developers write or port machine passes to a new machine pass interface and be able to test them with `llc`, and (2) let a target have the choice of implementing the codegen pipeline using the NPM (work in progress). Maybe it is obvious already, but I also want to mention that these patches do not intend to force a target to migrate to the NPM right away.

It would also be great to know how many people want this to happen. Any comments/suggestions are greatly appreciated. Thanks a lot in advance!

** Goals **
(1) a machine pass interface for NPM that existing machine passes could be ported to.
(2) a machine pass manager
(3) a codegen pipeline pass builder with NPM
(4) llc-driven codegen pipeline with NPM (`llc -enable-new-pm`). Existing command-line flags supported by `llc` should continue to work when `-enable-new-pm` is specified.

** Non-Goals (community effort) **
- Port existing codegen passes to the NPM. This needs community effort and domain expertise to do correctly and efficiently. It covers both target-independent and target-specific codegen IR/machine passes.
- 100% codegen feature parity between LPM and NPM. It's very likely that I did not port some features in this work. A few I'm aware of are `MachineFunctionProperties`, `-debugify-and-strip-all`, and "machine pass plugins". However, if the community is convinced this work is the right design, these missing features could be added by interested parties.

** Implementation / Design Choices **

* Goal-1 *
https://reviews.llvm.org/D67687

Four member methods of a machine pass are recognized by the machine pass manager:
(1) `PreservedAnalyses run(MachineFunction &, MachineFunctionAnalysisManager &)`. The majority of machine passes use this.
(2) `Error doInitialization(Module &, MachineFunctionAnalysisManager &)`. Passes like AsmPrinter need a hook to lower/transform global constructs. (Invoked before all passes' `run` methods.)
(3) `Error doFinalization(Module &, MachineFunctionAnalysisManager &)`. Client: PrintMIRPass. This is also for completeness. (Invoked after all passes' `run` methods.)
(4) `Error run(Module &, MachineFunctionAnalysisManager &)`. Clients: MachineOutliner, DebugifyMachineModule. I would call this a machine module pass, which needs module scope. It is like (1) but subject to pass ordering. Currently, a pass defines either this or (1), but not both.

(doInitialization/doFinalization are currently not supported by the NPM optimizer pipeline because there is no real need for them.
http://lists.llvm.org/pipermail/llvm-dev/2018-September/126504.html)
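
For illustration, this is roughly what a pass written against interface (1) could look like. The pass name and body below are hypothetical (not from the patches); only the `run` signature comes from the proposal, and the `PassInfoMixin` usage mirrors the existing NPM convention for IR passes:

```cpp
// Hypothetical example, not part of the patches: a machine function pass
// written against interface (1). The MachinePassManager.h header name is
// taken from D67687; the pass itself is made up.
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachinePassManager.h"
#include "llvm/IR/PassManager.h"

namespace llvm {

class MyHypotheticalMachinePass
    : public PassInfoMixin<MyHypotheticalMachinePass> {
public:
  PreservedAnalyses run(MachineFunction &MF,
                        MachineFunctionAnalysisManager &MFAM) {
    bool Changed = false;
    // ... transform MF here ...
    return Changed ? PreservedAnalyses::none() : PreservedAnalyses::all();
  }
};

} // namespace llvm
```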

* Goal-2 *
https://reviews.llvm.org/D67687

Unlike IR, where a `has-a` relationship exists among module/function/loop/CGSCC units, MIR does not have a `has-a` relationship with any kind of IR unit. It does have a one-to-one relationship with the IR function. So transforming MIR does not change any IR unit at all, and invalidating a MIR analysis result does not invalidate any IR analysis result either.

Based on the above observation, the machine pass manager runs standalone, i.e. combining it with other types of pass managers through adaptor passes is not supported. There is also no proxy defined between the machine analysis manager and any other type of analysis manager. The machine analysis manager does provide an API to register and query IR analyses, because machine passes generally need IR analysis results.
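
As a rough sketch of that last point (an assumption about the API shape, not the confirmed D67687 interface), a machine pass could query an IR analysis for the IR function its MachineFunction was lowered from, with no proxy in between:

```cpp
// Rough sketch only: the getResult<...>(Function &) overload used below is an
// assumed shape of the machine analysis manager API, not a confirmed one.
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachinePassManager.h"
#include "llvm/IR/Dominators.h"
#include "llvm/IR/PassManager.h"

namespace llvm {

class MyHypotheticalDomTreeUser
    : public PassInfoMixin<MyHypotheticalDomTreeUser> {
public:
  PreservedAnalyses run(MachineFunction &MF,
                        MachineFunctionAnalysisManager &MFAM) {
    // MachineFunction::getFunction() returns a const reference; analysis
    // managers want a non-const IR unit, hence the cast in this sketch.
    auto &F = const_cast<Function &>(MF.getFunction());
    // Assumed API: query an IR analysis for the IR function that this
    // MachineFunction was lowered from.
    DominatorTree &DT = MFAM.getResult<DominatorTreeAnalysis>(F);
    (void)DT; // ... use DT to guide a MIR transformation ...
    return PreservedAnalyses::all();
  }
};

} // namespace llvm
```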

* Goal-3 *
https://reviews.llvm.org/D83608
https://reviews.llvm.org/D83613 (X86 part)

The codegen pipeline is different from the optimizer pipeline in that it is customizable by a target at the granularity of (1) portions of the pipeline and (2) inserting/replacing/disabling individual passes. Currently, the main class for this job is `class TargetPassConfig`.

`class TargetPassConfig` could be broken down into three parts:
(1) a polymorphic class for a target to customize the codegen pipeline
(2) a set of member functions and variables to support debugging features (MIR print/verify, `llc -start-before/-stop-before/-start-after/-stop-after`, etc.)
(3) a legacy immutable pass that declares hooks for targets to customize some target-independent codegen-layer behavior.

A new mixin called `class CodeGenPassBuilder` is introduced to replace `class TargetPassConfig` (1). The naming is intended to be similar to its IR counterpart, `class PassBuilder`. `class CodeGenPassBuilder` is a CRTP mixin instead of a polymorphic class because it involves less layering, gives targets more flexibility to customize, offers inlining opportunities, and fits the overall NPM value-semantics design. There are potential compile-time issues, but I don't think they are a major concern at this moment, and the design is not so intrusive that changing it later would be prohibitive.
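
To make the CRTP shape concrete, here is a minimal sketch; the real class and hooks live in D83608, and the `addIRPasses` hook below is an illustrative stand-in mirroring today's `TargetPassConfig` hooks rather than the exact new interface:

```cpp
// Sketch of the CRTP shape only. SketchCodeGenPassBuilder and addIRPasses
// are illustrative names, not the actual D83608 API.
#include "llvm/IR/PassManager.h"

namespace llvm {

template <typename DerivedT> class SketchCodeGenPassBuilder {
public:
  void buildPipeline(ModulePassManager &MPM) {
    // Statically dispatches to the most-derived hook; no virtual calls.
    derived().addIRPasses(MPM);
    // ... rest of the target-independent codegen pipeline ...
  }

  // Default (target-independent) implementation of the hook.
  void addIRPasses(ModulePassManager &) {}

protected:
  DerivedT &derived() { return static_cast<DerivedT &>(*this); }
};

// A target customizes the pipeline by shadowing hooks in the derived class;
// calls resolve at compile time through the CRTP cast above.
class SketchX86CodeGenPassBuilder final
    : public SketchCodeGenPassBuilder<SketchX86CodeGenPassBuilder> {
public:
  void addIRPasses(ModulePassManager &MPM) {
    // X86-specific IR passes would be added to MPM here, then the defaults.
    SketchCodeGenPassBuilder::addIRPasses(MPM);
  }
};

} // namespace llvm
```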

`class TargetPassConfig` (2) is implemented using PassInstrumentation as much as possible, because PassInstrumentation is already commonly used for debugging features. This keeps `class CodeGenPassBuilder` focused on building the pipeline. PassInstrumentation is also more flexible than inserting debugging passes into the pipeline.
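
For example, start/stop-style gating can be expressed as a pass-skipping instrumentation callback rather than as pipeline surgery. The sketch below uses the `registerShouldRunOptionalPassCallback` hook from current `llvm/IR/PassInstrumentation.h`; the exact hook used by the patches may differ:

```cpp
// Sketch of the idea, not the actual D83612 code: skip optional passes until
// a given "start" pass is reached.
#include "llvm/ADT/Any.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/IR/PassInstrumentation.h"
#include <memory>
#include <string>

static void registerStartBeforeGate(llvm::PassInstrumentationCallbacks &PIC,
                                    llvm::StringRef StartBeforePass) {
  // Shared flag so the stateful callback can flip from "skip" to "run".
  auto Started = std::make_shared<bool>(StartBeforePass.empty());
  PIC.registerShouldRunOptionalPassCallback(
      [Started, Start = StartBeforePass.str()](llvm::StringRef PassID,
                                               llvm::Any) {
        if (PassID == Start)
          *Started = true; // start running from this pass onwards
        return *Started;   // false => the pass manager skips this pass
      });
}
```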

`class TargetPassConfig` (3) is partially ported to `TargetMachine::Options`. The rest, such as `createMachineScheduler`/`createPostMachineScheduler`, is left out for now; it should be implemented in LLVMTargetMachine in the future.

Since the machine pass manager runs standalone and the codegen process includes both an IR pipeline (CodeGenPrepare, EH, etc.) and a MIR pipeline, the new codegen pipeline uses a pair of pass managers: a module pass manager and a machine pass manager.

Like `PassRegistry.def`, there is a `MachinePassRegistry.def` which includes target-independent codegen IR/machine passes. Passes not yet ported to the NPM have a dummy no-op implementation (to help testing). Targets have their own pass registry file; for X86, it is `X86PassRegistry.def`.
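
The shape of such a registry file would follow the existing `PassRegistry.def` convention, roughly like the illustration below; the actual macro names and entries are whatever D83608 defines, and the pass names here are made up:

```cpp
// Illustrative shape only, following the PassRegistry.def convention.
// Both pass names below are hypothetical.
#ifndef MACHINE_FUNCTION_PASS
#define MACHINE_FUNCTION_PASS(NAME, CREATE_PASS)
#endif
MACHINE_FUNCTION_PASS("my-hypothetical-pass", MyHypotheticalMachinePass())
#undef MACHINE_FUNCTION_PASS

// Passes not yet ported get dummy no-op entries so that pipeline text which
// names them can still be parsed while testing.
#ifndef DUMMY_MACHINE_FUNCTION_PASS
#define DUMMY_MACHINE_FUNCTION_PASS(NAME, PASS_NAME)
#endif
DUMMY_MACHINE_FUNCTION_PASS("my-unported-pass", MyUnportedMachinePass)
#undef DUMMY_MACHINE_FUNCTION_PASS
```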

* Goal-4 *
https://reviews.llvm.org/D83610 (TargetMachine API)
https://reviews.llvm.org/D83612 (llc)

`llc` option compatibility is very important for LIT test migration, so `llc -enable-new-pm` accepts almost all options that `llc` accepts.

** Testing **
- Since the `llc` options are compatible, as passes are ported to the NPM and various issues get resolved, we should see more tests passing when `llc -enable-new-pm` is turned on implicitly via a knob similar to `cmake -DENABLE_EXPERIMENTAL_NEW_PASS_MANAGER`.
- A buildbot to make sure there are no regressions when `llc -enable-new-pm` is implicitly on?
- Any ideas in this regard are much appreciated.

** Work in progress **
Have a minimal codegen pipeline that can produce a valid assembly file by porting target-independent passes such as AsmPrinter.cpp, SelectionDAGISel.cpp, etc. Currently, the target-independent codegen layer (SelectionDAGISel, AsmPrinter, etc.) is tightly coupled with the LPM.

** Miscellaneous **
- There is a need to force codegen to run according to the call graph. Currently, this is achieved by adding a dummy CGSCC pass as the container pass of codegen passes. In the NPM, this could be implemented by pre-sorting IR functions in call graph order in the machine pass manager (see the sketch after this list).
- MachinePassRegistry/RegisterRegAlloc/RegisterScheduler. These are for dynamically registering machine passes and are also used for in-tree machine passes. Personally I don't think this is needed, and similar opinions were expressed before (https://reviews.llvm.org/D54203#1291969). We should probably just register them as normal passes.
- Timing machine passes. (StandardInstrumentations handles this already.)
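
For the first item, a minimal sketch of what "pre-sorting IR functions in call graph order" could look like (an assumption about the approach, not code from the patches): collect functions bottom-up so callees are visited before their callers.

```cpp
// Sketch only: build a bottom-up function order from the IR call graph.
#include "llvm/ADT/PostOrderIterator.h"
#include "llvm/Analysis/CallGraph.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Module.h"
#include <vector>

static std::vector<llvm::Function *> bottomUpFunctionOrder(llvm::Module &M) {
  llvm::CallGraph CG(M);
  std::vector<llvm::Function *> Order;
  // Post-order over the call graph yields callees before callers.
  // (Simplified: functions unreachable from the external calling node
  // are skipped here.)
  for (llvm::CallGraphNode *Node : llvm::post_order(&CG))
    if (llvm::Function *F = Node->getFunction())
      if (!F->isDeclaration())
        Order.push_back(F);
  return Order;
}
```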

While we're still working towards using the NPM for the optimizer pipeline by default, we still don't have a machine pass interface and a corresponding machine pass manager using the NPM. The potential benefits of using the NPM aside, this inhibits us from making any progress on deprecating the LPM for the codegen pipeline, which in turn blocks removing the LPM altogether. The purpose of this series of patches is to (1) let pass developers write or port machine passes to a new machine pass interface and be able to test them with `llc`, and (2) let a target have the choice of implementing the codegen pipeline using the NPM (work in progress). Maybe it is obvious already, but I also want to mention that these patches do not intend to force a target to migrate to the NPM right away.

Awesome!

It would be awesome to delete all the LPM infra at some point in the future. But even just deleting all the optimizer pipeline LPM infra would be a big win, and that shouldn’t be tied to codegen.

Four member methods of a machine pass are recognized by the machine pass manager:
(1) `PreservedAnalyses run(MachineFunction &, MachineFunctionAnalysisManager &)`. The majority of machine passes use this.
(2) `Error doInitialization(Module &, MachineFunctionAnalysisManager &)`. Passes like AsmPrinter need a hook to lower/transform global constructs. (Invoked before all passes' `run` methods.)
(3) `Error doFinalization(Module &, MachineFunctionAnalysisManager &)`. Client: PrintMIRPass. This is also for completeness. (Invoked after all passes' `run` methods.)
(4) `Error run(Module &, MachineFunctionAnalysisManager &)`. Clients: MachineOutliner, DebugifyMachineModule. I would call this a machine module pass, which needs module scope. It is like (1) but subject to pass ordering. Currently, a pass defines either this or (1), but not both.

(doInitialization/doFinalization are currently not supported by the NPM optimizer pipeline because there is no real need for them.
http://lists.llvm.org/pipermail/llvm-dev/2018-September/126504.html)

Are doInitialization/doFinalization really necessary? As mentioned in the previous discussion, the things done in doInitialization/doFinalization are usually not logically in the right place.
For example, maybe PrintMIRPass should just be a module pass, like PrintModulePass.

Unlike IR, where a `has-a` relationship exists among module/function/loop/CGSCC units, MIR does not have a `has-a` relationship with any kind of IR unit. It does have a one-to-one relationship with the IR function. So transforming MIR does not change any IR unit at all, and invalidating a MIR analysis result does not invalidate any IR analysis result either.

Based on the above observation, the machine pass manager runs standalone, i.e. combining it with other types of pass managers through adaptor passes is not supported. There is also no proxy defined between the machine analysis manager and any other type of analysis manager. The machine analysis manager does provide an API to register and query IR analyses, because machine passes generally need IR analysis results.

Maybe this is my lack of familiarity with codegen, but why doesn’t a Module contain all its MachineFunctions? It seems like that adaptor would be necessary.

** Testing **

- Since the `llc` options are compatible, as passes are ported to the NPM and various issues get resolved, we should see more tests passing when `llc -enable-new-pm` is turned on implicitly via a knob similar to `cmake -DENABLE_EXPERIMENTAL_NEW_PASS_MANAGER`.
- A buildbot to make sure there are no regressions when `llc -enable-new-pm` is implicitly on?
- Any ideas in this regard are much appreciated.

Manually running tests once in a while might be good enough, not sure if the cost of setting up a bot that maintains some sort of list of tests that have passed in the past is worth it. From my limited experience, tests won’t really tend to regress under NPM as long as you have some tests explicitly testing NPM sprinkled around.

-Yuanfang

From: Arthur Eubanks <aeubanks@google.com>
Sent: Monday, July 13, 2020 12:49 PM
To: Chen, Yuanfang <Yuanfang.Chen@sony.com>
Cc: LLVM Developers' List <llvm-dev@lists.llvm.org>
Subject: Re: [llvm-dev] [RFC] Introducing classes for the codegen driven by new pass manager

  While we're still working towards using the NPM for the optimizer pipeline by default, we still don't have a machine pass interface and a corresponding machine pass manager using the NPM. The potential benefits of using the NPM aside, this inhibits us from making any progress on deprecating the LPM for the codegen pipeline, which in turn blocks removing the LPM altogether. The purpose of this series of patches is to (1) let pass developers write or port machine passes to a new machine pass interface and be able to test them with `llc`, and (2) let a target have the choice of implementing the codegen pipeline using the NPM (work in progress). Maybe it is obvious already, but I also want to mention that these patches do not intend to force a target to migrate to the NPM right away.

Awesome!

It would be awesome to delete all the LPM infra at some point in the future. But even just deleting all the optimizer pipeline LPM infra would be a big win, and that shouldn't be tied to codegen.

True. The opt and codegen pipelines can make independent progress on NPM migration.

  * Goal-1 *
  https://reviews.llvm.org/D67687

  Four member methods of a machine pass are recognized by the machine pass manager:
  (1) `PreservedAnalyses run(MachineFunction &, MachineFunctionAnalysisManager &)`. The majority of machine passes use this.
  (2) `Error doInitialization(Module &, MachineFunctionAnalysisManager &)`. Passes like AsmPrinter need a hook to lower/transform global constructs. (Invoked before all passes' `run` methods.)
  (3) `Error doFinalization(Module &, MachineFunctionAnalysisManager &)`. Client: PrintMIRPass. This is also for completeness. (Invoked after all passes' `run` methods.)
  (4) `Error run(Module &, MachineFunctionAnalysisManager &)`. Clients: MachineOutliner, DebugifyMachineModule. I would call this a machine module pass, which needs module scope. It is like (1) but subject to pass ordering. Currently, a pass defines either this or (1), but not both.

  (doInitialization/doFinalization are currently not supported by the NPM optimizer pipeline because there is no real need for them. http://lists.llvm.org/pipermail/llvm-dev/2018-September/126504.html)

Are doInitialization/doFinalization really necessary? As mentioned in the previous discussion, the things done in doInitialization/doFinalization are usually not logically in the right place.
For example, maybe PrintMIRPass should just be a module pass, like PrintModulePass.

Good point! It is very likely that PrintMIRPass could be implemented with (4) above.
For AsmPrinter's use of `doInitialization`, I'm not 100% sure. It looks like it could also be replaced by (4).
But the difference is that `doInitialization` is invoked before all passes' `run` methods, whereas (4) is invoked in pass order.
I don't know if there are any implicit/subtle dependencies in this regard.

It would be great to have some codegen folks shed light on this.

  * Goal-2 *
  https://reviews.llvm.org/D67687

  Unlike IR, where a `has-a` relationship exists among module/function/loop/CGSCC units, MIR does not have a `has-a` relationship with any kind of IR unit. It does have a one-to-one relationship with the IR function. So transforming MIR does not change any IR unit at all, and invalidating a MIR analysis result does not invalidate any IR analysis result either.

  Based on the above observation, the machine pass manager runs standalone, i.e. combining it with other types of pass managers through adaptor passes is not supported. There is also no proxy defined between the machine analysis manager and any other type of analysis manager. The machine analysis manager does provide an API to register and query IR analyses, because machine passes generally need IR analysis results.

Maybe this is my lack of familiarity with codegen, but why doesn't a Module contain all its MachineFunctions? It seems like that adaptor would be necessary.

Because their data structures are separate. Logically, they are representations of the same/similar thing at different abstraction levels.
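
A small illustration of that separation: in current LLVM, MachineFunctions are owned by MachineModuleInfo and looked up by their IR Function rather than being contained in the Module itself. The helper below is only for illustration:

```cpp
// Count how many IR functions in M already have a MIR counterpart in MMI.
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineModuleInfo.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Module.h"

static unsigned countLoweredFunctions(const llvm::Module &M,
                                      const llvm::MachineModuleInfo &MMI) {
  unsigned N = 0;
  for (const llvm::Function &F : M)
    if (MMI.getMachineFunction(F)) // separate data structure, keyed by F
      ++N;
  return N;
}
```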

  ** Testing **
  - Since the `llc` options are compatible, as passes are ported to the NPM and various issues get resolved, we should see more tests passing when `llc -enable-new-pm` is turned on implicitly via a knob similar to `cmake -DENABLE_EXPERIMENTAL_NEW_PASS_MANAGER`.
  - A buildbot to make sure there are no regressions when `llc -enable-new-pm` is implicitly on?
  - Any ideas in this regard are much appreciated.

Manually running tests once in a while might be good enough, not sure if the cost of setting up a bot that maintains some sort of list of tests that have passed in the past is worth it. From my limited experience, tests won't really tend to regress under NPM as long as you have some tests explicitly testing NPM sprinkled around.

Sounds good. I was afraid that if it takes too long to migrate codegen passes, there may be regressions introduced into IR-to-object tests under the NPM (they exercise much more than a single pass or a few consecutive passes). I think it's fine to consider this in the future.

I looked more closely at some of the remaining check-llvm failures under NPM, and quite a few are due to passes that haven’t been ported to NPM yet. The ones I looked at all share the trait of needing some analysis to provide a TargetMachine, which doesn’t exist in NPM yet. So actually some of your work is required for the optimizer pipeline NPM switch.

Let's get a burn-down list and some directions, and I'll try to help :)

-eric

Sorry, I should have included the bug: https://bugs.llvm.org/show_bug.cgi?id=46651. Just today I added the directories containing the failures, sorted by count. Repro instructions are in the first comment.

I’d just note that not every pass you can run with “opt” is actually part of the optimization pipeline. There are a few important IR-level passes that only run in the codegen pipeline, but are still nameable with opt to run individually for testing purposes. Switching over doesn’t need to block on these passes being migrated. So I’m not sure this method of determining progress towards switching to NPM actually makes sense.

E.g., looking at the first few entries on your list, amdgpu-lower-intrinsics and codegenprepare are codegen only.

However, objc-arc is not – that does really need to be ported.

Then there are things like loop-unswitch: the functionality is ported to the NPM, but as a different 'simple-loop-unswitch' pass. Probably there's no reason to migrate loop-unswitch itself.

Somewhat along those lines, can you explain a bit how MachineFunction passes and other passes can live within the same pass manager? The split between "optimization" and "code generation" is a bit arbitrary, and there are benefits to be had from running everything in the same pass manager, because analyses can be preserved. With GlobalISel, it may even be possible to carry analyses like dominator tree and loop info from the IR world over to the MIR world.

Cheers,
Nicolai

I'd just note that not every pass you can run with "opt" is actually part of the optimization pipeline. There are a few important IR-level passes that only run in the codegen pipeline, but are still nameable with opt to run individually for testing purposes. Switching over doesn't need to block on these passes being migrated. So I'm not sure this method of determining progress towards switching to NPM actually makes sense.

FTR, there are IR-level passes that only run in the codegen pipeline, and are *not* available in opt, but are available in llc; for example the stack protector pass. Just throwing that out there.
--paulr

I would consider these to be bugs. These passes tend to be poorly tested and don't have nice single-pass tests.

-Matt

Not sure about the legacy PM. For the NPM, the IR analyses could be carried over for codegen use.

In codegen with the NPM, I've made all codegen passes (IR or MIR) driven only by `llc`. This is both due to the way the NPM registers passes (on demand and dynamically, instead of via static initialization as in the legacy PM) and to reduce confusion about which tool (`llc` or `opt`) should be used to test codegen IR passes.

I think there's no real distinction between "codegen" IR passes and non-codegen IR passes. I routinely run "codegen only" passes with opt in conjunction with other passes when experimenting. I think losing the ability to run any IR pass with opt would be a functionality regression.

-Matt

Indeed, but there is a distinction in their position in the pipeline. We run the opt and codegen pipelines separately: "codegen" IR passes run with other "codegen" IR passes, and the same goes for regular IR passes.

Do you run "codegen" IR passes together with regular IR passes? If so, do you mind sharing the use cases? I might have missed them.

I suppose there are 3 cases:

1. IR passes that are run in the codegen pipeline, but also run in the regular optimization pipeline. For example, AMDGPU runs SROA and a few other "non-codegen" IR passes as part of codegen. These obviously are both.
2. Reducing testcases. I extract the IR at some point in the codegen pipeline, and want to see what happens if I run other passes at that point.
3. Experiments to see if there is value in moving a codegen IR pass earlier in the pipeline, into the non-codegen IR passes.

-Matt

Thanks Matt. That's really helpful. Seems like I need to rethink the pass registration a little bit.

Indeed, but there is a distinction in their position in the pipeline. We run the opt and codegen pipelines separately,

Why, though? Is there a reason why this inherently makes sense, or is it just a historical accident? At least to me it seems that it would make more sense to run all passes within a single pipeline.

Cheers,
Nicolai

TBH, I don't know the history of this choice, although it makes sense from the debugging/testing point of view. We have the IR/MIR distinction, so it makes sense for each to have its own pipeline (yes, the MIR pipeline is prefixed with an IR pipeline to prepare the module, but that is just an implementation detail), and then we have opt/llc to test the pipelines separately. Combining them into a single pipeline is possible, but AFAIK it does not have many benefits in practice.

One thing I want to mention: I believe that in the current legacy pass manager implementation only one MachineFunction ever exists at a time; it is deleted before the next MachineFunction is created. This is very important for memory usage. I think the MachineOutliner being in the pipeline may create an exception to this. I think the initial version of retpoline used a ModulePass, and that had to be changed to avoid excessive memory usage.

In addition to this, analyses between different MachineFunctions still matter. AMDGPU forces SCC order and has an analysis to track function information after a function has been codegened. This is needed to track the cumulative register usage of kernels.

-Matt