I’m currently making the move to the new pass manager and have arrived at a working version. However, codegen now takes significantly more time than it did with the legacy pass manager.
My application builds a module, links in CUDA libdevice, runs some passes and calls NVPTX codegen. Here’s the setup that I use with the legacy manager:
// build Mod, link in libdevice
llvm::legacy::PassManager PM;
PM.add(llvm::createInternalizePass(all_but_kernel_name));
PM.add(llvm::createNVVMReflectPass(sm));
PM.add(llvm::createGlobalDCEPass());
PM.run(*Mod);
// NVPTX codegen is fast
With the new pass manager I’m doing the following, and the subsequent codegen takes significantly longer (more than 10x):
// build Mod, link in libdevice
LoopAnalysisManager LAM;
FunctionAnalysisManager FAM;
CGSCCAnalysisManager CGAM;
ModuleAnalysisManager MAM;
PassBuilder PB(TargetMachine.get());
PB.registerModuleAnalyses(MAM);
PB.registerCGSCCAnalyses(CGAM);
PB.registerFunctionAnalyses(FAM);
PB.registerLoopAnalyses(LAM);
PB.crossRegisterProxies(LAM, FAM, CGAM, MAM);
ModulePassManager MPM = PB.buildPerModuleDefaultPipeline(OptimizationLevel::O2);
MPM.addPass(InternalizePass(all_but_kernel_name));
MPM.addPass(GlobalDCEPass());
MPM.run(*Mod, MAM);
// NVPTX codegen takes much more time.
I timed the MPM.run call itself and it’s pretty fast; it’s the NVPTX codegen afterwards that got slower.
Is this the closest I can get to the previous setup?
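In case it helps, here is what I would have expected the most literal translation of the legacy pipeline to look like under the new pass manager — just a sketch, skipping buildPerModuleDefaultPipeline entirely and running only the same three passes. I’m assuming NVVMReflectPass (which lives in the NVPTX target) is visible to my code and that, as a function pass, it needs the module-to-function adaptor; I haven’t confirmed this is the intended way to run it:

```cpp
// Same analysis-manager and PassBuilder registration as above, then:
ModulePassManager MPM;
MPM.addPass(InternalizePass(all_but_kernel_name));
// NVVMReflect is a function pass under the new PM, so wrap it in an
// adaptor to add it to a module pass manager. NVVMReflectPass is
// declared in the NVPTX target; whether it is reachable from user
// code depends on the build (assumption on my part).
MPM.addPass(createModuleToFunctionPassAdaptor(NVVMReflectPass(sm)));
MPM.addPass(GlobalDCEPass());
MPM.run(*Mod, MAM);
```

Is that closer to the legacy behavior, or is there a recommended way to get the reflect pass into a new-PM pipeline?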