AMDGPU and support for the new pass manager

AMDGPU is currently the last target that injects passes into the optimization pipeline but hasn't been updated to work with the new pass manager.

Support was recently added for the NPM's equivalent of TargetMachine::adjustPassManager() (Bug, Phab). BPF and Hexagon have already implemented their NPM equivalents, but AMDGPU adds a lot of custom passes in adjustPassManager() and hasn't been updated to work with the NPM.

Could the maintainers of AMDGPU port the necessary passes to the new pass manager and add them to AMDGPUTargetMachine::registerPassBuilderCallbacks()? I'm happy to provide any guidance if necessary. Here's an example of porting a pass and adding it to the pipeline.

Some indicative failing tests are under the llvm/test/CodeGen/AMDGPU directory when opt's -enable-new-pm flag is set to true by default. They should also be reproducible by adding a corresponding NPM RUN line, e.g. "opt -S -O1 -mtriple=amdgcn-- ..." → "opt -S -passes='default<O1>' -mtriple=amdgcn-- ...".

I see the following AMDGPU-specific failures:
LLVM :: CodeGen/AMDGPU/amdgpu-inline.ll
LLVM :: CodeGen/AMDGPU/infer-addrpace-pipeline.ll
LLVM :: CodeGen/AMDGPU/internalize.ll
LLVM :: CodeGen/AMDGPU/llvm.amdgcn.wavefrontsize.ll
LLVM :: CodeGen/AMDGPU/opt-pipeline.ll
LLVM :: CodeGen/AMDGPU/propagate-attributes-clone.ll
LLVM :: CodeGen/AMDGPU/propagate-attributes-single-set.ll
LLVM :: CodeGen/AMDGPU/simplify-libcalls.ll
LLVM :: CodeGen/AMDGPU/sroa-before-unroll.ll

LLVM :: Transforms/LoopUnswitch/AMDGPU/divergent-unswitch.ll

(Also, on a closer look, it turns out NVPTX needs updating as well, but NVPTXTargetMachine only adds two passes.)

I’ve ported most of the IR passes and added them to AMDGPU’s opt pipeline.
There are 2 issues remaining:

  1. LegacyDivergenceAnalysis is used in LoopUnswitch to avoid unswitching loops with divergent condition values. I'm not sure what the state of LegacyDivergenceAnalysis vs. DivergenceAnalysis is. (Also, the new PM only has SimpleLoopUnswitch instead of LoopUnswitch.)
    Transforms/LoopUnswitch/AMDGPU/divergent-unswitch.ll.
  2. The AMDGPU backend has its own inliner (set here). Any ideas for how to do custom inliner cost modeling in the new PM pipelines? Allow targets to override an analysis pass that the inliner uses to get an InlineAdvisor?
    CodeGen/AMDGPU/amdgpu-inline.ll

We’re sorting out the custom inliner in https://reviews.llvm.org/D94153.

As for LegacyDivergenceAnalysis and Transforms/LoopUnswitch/AMDGPU/divergent-unswitch.ll, if there’s no response, I’ll just pin the test to the legacy PM, since it’s testing an AMDGPU-specific optimization setting (disabling loop unswitching on divergent loop conditions). If there are noticeable performance regressions related to this due to the NPM switch, the AMDGPU community can temporarily use the legacy PM and then port the relevant analysis passes and fix up loop unswitching for AMDGPU.

Then we should be good to go for turning on the new PM for opt when it’s specified on the CMake command line.

Hi Arthur,

I was looking into porting the IR passes to the NPM more than a year ago. Let me get back to you with more concrete answers.

Many thanks,

Reshabh

1) Unfortunately LegacyDivergenceAnalysis has known bugs (e.g.
https://bugs.llvm.org/show_bug.cgi?id=42741) but DivergenceAnalysis
does not work on irreducible control flow (it comes with a wrapper
that falls back to LegacyDivergenceAnalysis if it detects irreducible
control flow). So neither is perfect. Does the new pass manager
currently support either of them?

I think Sameer has looked into
https://bugs.llvm.org/show_bug.cgi?id=42741 in some detail. I'm not
sure if there are other known bugs.

Jay.

Adding Simon Moll. Also replaced my work ID with my personal ID because Outlook at work doesn't seem to play well with llvm-dev.

> >
> > I've ported most of the IR passes and added them to AMDGPU's opt pipeline.
> > There are 2 issues remaining:
> > 1) LegacyDivergenceAnalysis is used in LoopUnswitch to avoid unswitching loops with divergent condition Values. I'm not sure what the state of the LegacyDivergenceAnalysis vs DivergenceAnalysis is. (Also the new PM only has SimpleLoopUnswitch instead of LoopUnswitch).
> > Transforms/LoopUnswitch/AMDGPU/divergent-unswitch.ll.

Here's what I understood so far from a general eyeballing of the code:

1. Legacy DA is a wrapper around GPU DA. I could not find any direct use of GPU DA.
2. It is likely that AMDGPU no longer "requires" Legacy DA. The advantage of the Legacy DA is that it can handle irreducible regions, but we usually convert them into loops for AMDGPU anyway. I don't know if we have reached a point where we don't care about legacy DA in this respect.
3. StructurizeCFG under the new pass manager simply skips DA, and consequently cannot skip uniform regions. This essentially disables an optimization when moving to the new pass manager.
4. Similarly, loop unswitching is an optimization and available in its simple form with the new pass manager. But I don't know for sure if it *must* skip loops with divergent values.

So we (the AMDGPU folks) need to figure out how much we depend on any DA in the new pass manager. Besides the above optimizations, isn't it required for later target optimizations around SGPR usage? Is it possible to unblock the transition to the new pass manager, and then restore these optimizations later? Also, I am wondering if we can focus on making only the new GPU DA available, subject to #2 above.

Sameer.

To be clear, currently the new PM switch only affects the optimization pipeline, which out of all the uses of Legacy DA only affects LoopUnswitch and StructurizeCFG. The other uses are in the codegen pipeline which isn’t affected.

Any update on this? This is the last known remaining NPM issue assuming https://reviews.llvm.org/D94153 is good to go.

Hi Arthur,

Thanks for following up!

We decided to pin the failing test (divergent-unswitch.ll) to the old pass manager for now, so this is no longer a blocker for flipping the bit in opt.

https://reviews.llvm.org/D95051

The loop-unswitch transform needs divergence analysis to ensure that loops with divergent conditions are not transformed. This is a correctness issue, which means that the lack of a DA makes loop unswitching in the NPM unsafe for AMDGPU or any other target that cares about divergence. For now, I've filed a bug to track this: https://bugs.llvm.org/show_bug.cgi?id=48819

I am currently attempting to make the new divergence analysis available on the new pass manager.

Sameer.