Codegen slower with new PassManager

fwinter · February 9, 2024, 8:38pm

I’m currently making the move to the new pass manager and have arrived at a working version. However, the codegen takes significantly more time now than before with the legacy pass manager.
My application builds a module, links in CUDA libdevice, runs some passes and calls NVPTX codegen. Here’s the setup that I use with the legacy manager:

// build Mod, link in libdevice
llvm::legacy::PassManager PM;
PM.add( llvm::createInternalizePass( all_but_kernel_name ) );
PM.add( llvm::createNVVMReflectPass( sm ));
PM.add( llvm::createGlobalDCEPass() );
PM.run(*Mod);
// NVPTX codegen is fast

With the new pass manager I’m doing the following, and the subsequent codegen takes significantly longer (more than 10x):

// build Mod, link in libdevice
LoopAnalysisManager LAM;
FunctionAnalysisManager FAM;
CGSCCAnalysisManager CGAM;
ModuleAnalysisManager MAM;

PassBuilder PB(TargetMachine.get());

PB.registerModuleAnalyses(MAM);
PB.registerCGSCCAnalyses(CGAM);
PB.registerFunctionAnalyses(FAM);
PB.registerLoopAnalyses(LAM);
PB.crossRegisterProxies(LAM, FAM, CGAM, MAM);

ModulePassManager MPM = PB.buildPerModuleDefaultPipeline(OptimizationLevel::O2);
MPM.addPass(InternalizePass(all_but_kernel_name));
MPM.addPass(GlobalDCEPass());

MPM.run(*Mod,MAM);
// NVPTX codegen takes much more time.

I timed the MPM.run call and it’s pretty fast. But the codegen takes more time now.
Is this the closest I can get to the previous setup?
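To compare the two phases described above, a small wall-clock timer can be wrapped around the pass run and the codegen call separately. This is a minimal sketch; the helper name elapsedMs is hypothetical and not part of LLVM.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper (not an LLVM API): runs `work` once and returns the
// elapsed wall-clock time in milliseconds, measured with a monotonic clock.
template <typename Fn>
double elapsedMs(Fn &&work) {
  const auto start = std::chrono::steady_clock::now();
  std::forward<Fn>(work)();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Intended use, assuming the MPM, MAM and Mod objects from the post:
//   double passMs = elapsedMs([&] { MPM.run(*Mod, MAM); });
//   double codegenMs = elapsedMs([&] { /* NVPTX codegen call goes here */ });
```

Timing the two calls separately shows whether the extra time is spent in the passes themselves or in codegen working on a module that the passes have made larger.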

aeubanks · February 9, 2024, 8:57pm

You’re running the -O2 pipeline with the new pass manager but not the old one. That can increase code size due to optimizations like inlining, which gives codegen more work to do.

fwinter · February 9, 2024, 9:28pm

I am aware, but I don’t know how to avoid the call to PB.buildPerModuleDefaultPipeline, because this is the one that schedules the NVVM reflect pass right at the beginning. I tried setting the optimization level to O0, but that ran into problems with LLVM 16:

llvm/lib/Passes/PassBuilderPipelines.cpp:1399: llvm::ModulePassManager llvm::PassBuilder::buildPerModuleDefaultPipeline(llvm::OptimizationLevel, bool): Assertion `Level != OptimizationLevel::O0 && "Must request optimizations for the default pipeline!"' failed.

With LLVM 17 this is fine, though. But even with O0, the codegen time doesn’t go down.

How can I make sure NVVM reflect is correctly scheduled without any call to PB.buildPerModuleDefaultPipeline?
(The module I’m building doesn’t have any stack allocations (alloca) or loops, so there is no need for any standard optimizations.)

aeubanks · February 9, 2024, 9:32pm

PB.buildO0DefaultPipeline(OptimizationLevel::O0) should do it

(It was a weird quirk that you had to call buildO0DefaultPipeline() instead of buildPerModuleDefaultPipeline(OptimizationLevel::O0); that was changed recently.)
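A minimal sketch of how that suggestion might look when combined with the setup from the original post. It assumes TM is the NVPTX TargetMachine, so that the target's pipeline-start callbacks (which, as far as I understand, are what schedule NVVM reflect) are registered when the PassBuilder is constructed. The function name runMinimalDevicePipeline and the KernelName parameter are hypothetical, and the predicate stands in for all_but_kernel_name from the post. Building it requires the LLVM headers and libraries.

```cpp
#include <string>

#include "llvm/IR/GlobalValue.h"
#include "llvm/IR/Module.h"
#include "llvm/Passes/PassBuilder.h"
#include "llvm/Target/TargetMachine.h"
#include "llvm/Transforms/IPO/GlobalDCE.h"
#include "llvm/Transforms/IPO/Internalize.h"

// Hypothetical helper: O0 pipeline (for the target's pipeline-start
// callbacks), then internalize everything except the kernel, then GlobalDCE.
void runMinimalDevicePipeline(llvm::Module &Mod, llvm::TargetMachine *TM,
                              std::string KernelName) {
  llvm::LoopAnalysisManager LAM;
  llvm::FunctionAnalysisManager FAM;
  llvm::CGSCCAnalysisManager CGAM;
  llvm::ModuleAnalysisManager MAM;

  // Passing TM lets the target register its pass builder callbacks.
  llvm::PassBuilder PB(TM);
  PB.registerModuleAnalyses(MAM);
  PB.registerCGSCCAnalyses(CGAM);
  PB.registerFunctionAnalyses(FAM);
  PB.registerLoopAnalyses(LAM);
  PB.crossRegisterProxies(LAM, FAM, CGAM, MAM);

  // No -O2 passes: avoids the inlining-driven growth mentioned earlier.
  llvm::ModulePassManager MPM =
      PB.buildO0DefaultPipeline(llvm::OptimizationLevel::O0);

  // Preserve only the kernel; everything else becomes internal.
  MPM.addPass(llvm::InternalizePass(
      [KernelName](const llvm::GlobalValue &GV) {
        return GV.getName() == KernelName;
      }));
  // Drop the now-unreferenced internal libdevice functions.
  MPM.addPass(llvm::GlobalDCEPass());

  MPM.run(Mod, MAM);
}
```

The pass order here follows the new-pass-manager snippet from the original post (default pipeline first, then internalize and GlobalDCE); the legacy setup ran internalize before NVVM reflect.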

fwinter · February 9, 2024, 9:52pm

Yes, that works with LLVM 16. Thanks!
