How to make LLVM go faster?

Here is some timing information from running the Zig standard library tests:

$ ./zig test …/std/index.zig --enable-timing-info
Name                Start    End      Duration  Percent
Initialize          0.0000   0.0010   0.0010    0.0001
Semantic Analysis   0.0010   0.9968   0.9958    0.1192
Code Generation     0.9968   1.4000   0.4032    0.0483
LLVM Emit Output    1.4000   8.1759   6.7760    0.8112
Build Dependencies  8.1759   8.3341   0.1581    0.0189
LLVM Link           8.3341   8.3530   0.0189    0.0023
Total               0.0000   8.3530   8.3530    1.0000

81% of the time was spent waiting for LLVM to turn a Module into an object file. This is with optimizations off, FastISel, no module verification, etc.
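For anyone who wants to reproduce the Percent column or pick out the dominant phase programmatically, here is a minimal C++ sketch. The figures are copied from the table above; the type and function names (`PhaseTiming`, `percent_of_total`, `slowest_phase`, `zig_std_test_timings`) are made up for illustration and are not part of Zig or LLVM.

```cpp
#include <string>
#include <vector>

// One row of the timing report: phase name plus start/end in seconds.
// (Hypothetical type, not taken from the Zig source.)
struct PhaseTiming {
    std::string name;
    double start;
    double end;
    double duration() const { return end - start; }
};

// Fraction of the total wall-clock time spent in one phase.
double percent_of_total(const PhaseTiming &phase, double total) {
    return total > 0.0 ? phase.duration() / total : 0.0;
}

// Returns the phase with the longest duration, or nullptr if there are none.
const PhaseTiming *slowest_phase(const std::vector<PhaseTiming> &phases) {
    const PhaseTiming *slowest = nullptr;
    for (const PhaseTiming &phase : phases) {
        if (slowest == nullptr || phase.duration() > slowest->duration()) {
            slowest = &phase;
        }
    }
    return slowest;
}

// Start/End values copied from the table above (the "Total" row is omitted).
std::vector<PhaseTiming> zig_std_test_timings() {
    return {
        {"Initialize",         0.0000, 0.0010},
        {"Semantic Analysis",  0.0010, 0.9968},
        {"Code Generation",    0.9968, 1.4000},
        {"LLVM Emit Output",   1.4000, 8.1759},
        {"Build Dependencies", 8.1759, 8.3341},
        {"LLVM Link",          8.3341, 8.3530},
    };
}
```

Calling `slowest_phase(zig_std_test_timings())` picks out the LLVM Emit Output row, and dividing its duration by the 8.3530 s total gives the roughly 81% figure quoted above.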

How can I speed this up? Any tips or things to look into?

Here’s the function that 81% of the time is spent inside:

bool ZigLLVMTargetMachineEmitToFile(LLVMTargetMachineRef targ_machine_ref, LLVMModuleRef module_ref,
        const char *filename, ZigLLVM_EmitOutputType output_type, char **error_message, bool is_debug, bool is_small)
{
    std::error_code EC;
    raw_fd_ostream dest(filename, EC, sys::fs::F_None);
    if (EC) {
        *error_message = strdup((const char *)StringRef(EC.message()).bytes_begin());
        return true;
    }
    TargetMachine* target_machine = reinterpret_cast<TargetMachine*>(targ_machine_ref);
    target_machine->setO0WantsFastISel(true);

    Module* module = unwrap(module_ref);

    PassManagerBuilder *PMBuilder = new(std::nothrow) PassManagerBuilder();
    if (PMBuilder == nullptr) {
        *error_message = strdup("memory allocation failure");
        return true;
    }
    PMBuilder->OptLevel = target_machine->getOptLevel();
    PMBuilder->SizeLevel = is_small ? 2 : 0;

    PMBuilder->DisableTailCalls = is_debug;
    PMBuilder->DisableUnitAtATime = is_debug;
    PMBuilder->DisableUnrollLoops = is_debug;
    PMBuilder->SLPVectorize = !is_debug;
    PMBuilder->LoopVectorize = !is_debug;
    PMBuilder->RerollLoops = !is_debug;
    // Leaving NewGVN as default (off) because when on it caused issue #673
    //PMBuilder->NewGVN = !is_debug;
    PMBuilder->DisableGVNLoadPRE = is_debug;
    PMBuilder->VerifyInput = assertions_on;
    PMBuilder->VerifyOutput = assertions_on;
    PMBuilder->MergeFunctions = !is_debug;
    PMBuilder->PrepareForLTO = false;
    PMBuilder->PrepareForThinLTO = false;
    PMBuilder->PerformThinLTO = false;

    TargetLibraryInfoImpl tlii(Triple(module->getTargetTriple()));
    PMBuilder->LibraryInfo = &tlii;

    if (is_debug) {
        PMBuilder->Inliner = createAlwaysInlinerLegacyPass(false);
    } else {
        target_machine->adjustPassManager(*PMBuilder);

        PMBuilder->addExtension(PassManagerBuilder::EP_EarlyAsPossible, addDiscriminatorsPass);
        PMBuilder->Inliner = createFunctionInliningPass(PMBuilder->OptLevel, PMBuilder->SizeLevel, false);
    }

    addCoroutinePassesToExtensionPoints(*PMBuilder);

    // Set up the per-function pass manager.
    legacy::FunctionPassManager FPM = legacy::FunctionPassManager(module);
    auto tliwp = new(std::nothrow) TargetLibraryInfoWrapperPass(tlii);
    FPM.add(tliwp);
    FPM.add(createTargetTransformInfoWrapperPass(target_machine->getTargetIRAnalysis()));
    if (assertions_on) {
        FPM.add(createVerifierPass());
    }
    PMBuilder->populateFunctionPassManager(FPM);

    // Set up the per-module pass manager.
    legacy::PassManager MPM;
    MPM.add(createTargetTransformInfoWrapperPass(target_machine->getTargetIRAnalysis()));
    PMBuilder->populateModulePassManager(MPM);

    // Set output pass.
    TargetMachine::CodeGenFileType ft;
    if (output_type != ZigLLVM_EmitLLVMIr) {
        switch (output_type) {
            case ZigLLVM_EmitAssembly:
                ft = TargetMachine::CGFT_AssemblyFile;
                break;
            case ZigLLVM_EmitBinary:
                ft = TargetMachine::CGFT_ObjectFile;
                break;
            default:
                abort();
        }

        if (target_machine->addPassesToEmitFile(MPM, dest, ft)) {
            *error_message = strdup("TargetMachine can't emit a file of this type");
            return true;
        }
    }

    // run per function optimization passes
    FPM.doInitialization();
    for (Function &F : *module)
        if (!F.isDeclaration())
            FPM.run(F);
    FPM.doFinalization();

    MPM.run(*module);

    if (output_type == ZigLLVM_EmitLLVMIr) {
        if (LLVMPrintModuleToFile(module_ref, filename, error_message)) {
            return true;
        }
    }

    return false;
}

First step is probably setting TimePassesIsEnabled to true and looking at the output. It’s hard to say where the time is going without any numbers. -Eli
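To illustrate the kind of per-pass breakdown this suggestion is after, here is a small self-contained C++ sketch of scoped timing. It is not LLVM code and does not use LLVM's own timers: `PassTimes` and `ScopedPassTimer` are hypothetical names, and the pass labels in the usage note are placeholders.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Hypothetical registry of accumulated seconds per pass name (not an LLVM type).
class PassTimes {
public:
    void add(const std::string &pass, double seconds) { totals_[pass] += seconds; }

    // Accumulated seconds for one pass, or 0.0 if it never ran.
    double total(const std::string &pass) const {
        auto it = totals_.find(pass);
        return it == totals_.end() ? 0.0 : it->second;
    }

    // Name of the pass with the largest accumulated time ("" if none recorded).
    std::string hottest() const {
        std::string best;
        double best_seconds = -1.0;
        for (const auto &entry : totals_) {
            if (entry.second > best_seconds) {
                best = entry.first;
                best_seconds = entry.second;
            }
        }
        return best;
    }

private:
    std::map<std::string, double> totals_;
};

// RAII timer: measures from construction to destruction and records the
// elapsed wall-clock time under the given pass name.
class ScopedPassTimer {
public:
    ScopedPassTimer(PassTimes &times, std::string pass)
        : times_(times), pass_(std::move(pass)),
          start_(std::chrono::steady_clock::now()) {}

    ~ScopedPassTimer() {
        std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start_;
        times_.add(pass_, elapsed.count());
    }

private:
    PassTimes &times_;
    std::string pass_;
    std::chrono::steady_clock::time_point start_;
};
```

Wrapping each unit of work in a block such as `{ ScopedPassTimer t(times, "some pass"); /* work */ }` and then asking `times.hottest()` gives the same sort of answer a pass-timing report provides: which named step dominates the run.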

Thanks, that was a really helpful suggestion. If you’re curious, here are some of the high-cost areas:

The X86 assembly printer is badly named. I think it’s a leftover from before LLVM had an integrated assembler, when it was where the assembly would have been printed. Now it is where MachineInstrs are converted to MCInsts and either printed or turned into binary.

It might be interesting to know whether the percentages are basically the same for non-x86.