I’ve been experimenting with enabling the new PM by default in our toolchain, using the -DENABLE_EXPERIMENTAL_NEW_PASS_MANAGER=ON option. However, when I do that, I see a lot (498) of failing tests.
I’ve filed PR40724 for this issue and after a bit of investigating I also found out why these tests are failing. It boils down to two issues:
Unlike the legacy pass manager, the new pass manager always makes the target machine available to passes so we have to always instantiate it even when we’re only emitting bitcode, see https://github.com/llvm/llvm-project/blob/master/clang/lib/CodeGen/BackendUtil.cpp#L952. This breaks when some targets tested by Clang don’t match targets built by LLVM. In our case this is more prominent because we disable all targets that we don’t support in our toolchain, but even in the default configuration you’ll still see failures for experimental targets (e.g. RISC-V).
Some Clang tests match the bitcode that’s being emitted, but the emitted sometimes differs between the legacy and new pass manager.
Has anyone already been thinking on how to handle these issues?
For the first one, the obvious solution is to mark all target-specific tests with appropriate REQUIRES: <target>-registered-target conditions. The downside is that we’re going to loose coverage for some of the tests that are being run today with the legacy pass manager even when the target is disabled, but we’re already seem to be using these conditions in many Clang tests so maybe it’s not a big problem.
For the second issue, we could set a new feature config in lit whenever new PM is enabled (.e.g experimental-new-pm) and then either mark the broken tests as unsupported under new PM or provide two different bitcode versions to match against for legacy and new PM. It’s probably easiest to start by disabling the failing tests first and then try and update all of them to make them work under new PM.
Do you have any opinion on this or other ideas/suggestions?
Why does the new PM always make the target machine available? That seems inconsistent with our past directions. We go to all this trouble to make datalayout encapsulate everything IR-level passes need to know about the target (and TLI, I guess), we have the triple, metadata for wchar_t and all this stuff… and then we just let IR passes talk to the Target?
I think it would be a shame to lose the ability to run opt and clang -S -emit-llvm for a target without compiling in the target’s code generator.
Ø Some Clang tests match the bitcode that’s being emitted, but the emitted sometimes differs between the legacy and new pass manager.
I didn’t think all passes were setup to run with the new PM, or if they were, they didn’t necessarily run in the same order as with the old PM. You’d certainly see different IR and break tests.
Why does the new PM always make the target machine available? That seems inconsistent with our past directions. We go to all this trouble to make datalayout encapsulate everything IR-level passes need to know about the target (and TLI, I guess), we have the triple, metadata for wchar_t and all this stuff… and then we just let IR passes talk to the Target?
We don’t let the IR passes talk to the Target, we let the Target provide customized IR passes. Think TTI and such. The only way to wire up several interesting parts of the “normal” pass pipelines reach into the target for customization of the IR pipeline.
We still have layering that enforces this – the IR passes don’t accept the target in any material way because they don’t depend on those libraries.
The new PM’s builder hoists some logic that was previously only available inside the opt tool into the library itself, and that is where the TargetMachine comes from. This works exactly like the opt tool does currently. It does not make that visible to the actual IR passes in any way (and it can’t really – they don’t have access to the fundamental types needed).
I think it would be a shame to lose the ability to run opt and clang -S -emit-llvm for a target without compiling in the target’s code generator.
Currently, identical inputs and command line options to opt and clang will produce different outputs based on whether a target happened to be compiled in at build time. That seems worse to me than producing an error?
If this is really important, we could add support for a “null” target machine and thread that through the entire thing. But it seems to provide little value as it doesn’t change the layering-provided protection, it just allows you to use the tools to produce slightly unrealistic outputs for targets that weren’t configured when the tool was built.
I started adding the UNSUPPORTED: experimental-new-pass-manager annotations and I noticed that many of Clang tests that required these are the ones that doesn’t have any backend target, specifically le32 (used AFAIK only by PNaCl) and spir. This means we wouldn’t be able to test these targets with the new PM unless we introduce the support for the “null” target machine. This may not be such a big problem though since neither of these target is actively maintained and by the time we deprecate the legacy PM they may very well be gone altogether.
I’ve also tried modifying BackendUtil to use the same logic it already uses with the old PM for the new PM, i.e. using the null target machine if the target doesn’t use codegen and it seems to be working fine (all le32 and spir tests are passing). Is this setup officially supported and are IR passes required to handle this case gracefully, or is it just the fact that no IR pass currently talks to the Target and we shouldn’t rely on this?
I started adding the UNSUPPORTED: experimental-new-pass-manager annotations and I noticed that many of Clang tests that required these are the ones that doesn’t have any backend target, specifically le32 (used AFAIK only by PNaCl) and spir. This means we wouldn’t be able to test these targets with the new PM unless we introduce the support for the “null” target machine. This may not be such a big problem though since neither of these target is actively maintained and by the time we deprecate the legacy PM they may very well be gone altogether.
Yeah, I suspect these should be using the actual targets, or not in the tree somewhat regardless of the answer here. But maybe moot.
I’ve also tried modifying BackendUtil to use the same logic it already uses with the old PM for the new PM, i.e. using the null target machine if the target doesn’t use codegen and it seems to be working fine (all le32 and spir tests are passing). Is this setup officially supported and are IR passes required to handle this case gracefully, or is it just the fact that no IR pass currently talks to the Target and we shouldn’t rely on this?
Ooof… I completely failed to document the contract on this.
However, we support a default argument of nullptr for it as well, so I actually think there is no way this isn’t 100% supported. I think this is a fine change to make to Clang (and even to opt if desired). It at least gets this out of the critical path.
I’m still super interested in Reid’s thoughts on the points I laid out about whether this is a good pattern. But I don’t think this is a blocking issue given your find.
Petr Hosek via cfe-dev <cfe-dev@lists.llvm.org> writes:
2. Some Clang tests match the bitcode that's being emitted, but the
emitted sometimes differs between the legacy and new pass manager.
This concerns me greatly. I would expect the output to not be dependent
on the pass manager used. I understand that clang has two separate code
paths to set up pipelines for the old and new pass managers. Are these
expected to be functionally equivalent? If not, why not?
Doesn’t that really only matter if a full pass pipeline gets built, i.e. opt/clang -O2? Most well-behaved clang and opt tests use -disable-llvm-passes or opt -pass=mypass, and they don’t require a TargetMachine to run. Perhaps we could diagnose an error if the TargetMachine would be used to build the pipeline. Basically, move the null check and error to the point of usage.
I feel like this gives us some incremental value of being able to disable some backends for build time or code size reasons while still being able to run IR generation and transformation tests that happen to use that triple.