Where is opt spending its time?

I am trying to improve my application’s compile-time performance.

On a given workload, compiling some code takes 68 seconds. If I disable LLVM code generation (i.e., I still generate IR instructions but skip the LLVM optimization and instruction selection steps), compile time drops to 3 seconds. If I also write out the LLVM IR (just to prove that I am generating it), compile time is 4 seconds. So more than 90% of the time is spent in LLVM code generation.
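A per-phase breakdown like this can also be collected inside the application itself. The following is a minimal sketch that assumes wall-clock timing with std::chrono is good enough. `PhaseTimes`, `Scope`, and the phase names are hypothetical and not part of LLVM.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Hypothetical helper: accumulates wall-clock seconds per named phase.
class PhaseTimes {
public:
    // RAII guard: adds the time between construction and destruction
    // to the named phase.
    class Scope {
    public:
        Scope(PhaseTimes &owner, std::string phase)
            : owner_(owner), phase_(std::move(phase)),
              start_(std::chrono::steady_clock::now()) {}
        ~Scope() {
            auto elapsed = std::chrono::steady_clock::now() - start_;
            owner_.totals_[phase_] +=
                std::chrono::duration<double>(elapsed).count();
        }
        Scope(const Scope &) = delete;
        Scope &operator=(const Scope &) = delete;

    private:
        PhaseTimes &owner_;
        std::string phase_;
        std::chrono::steady_clock::time_point start_;
    };

    // Seconds accumulated for a phase (0 if it was never timed).
    double seconds(const std::string &phase) const {
        auto it = totals_.find(phase);
        return it == totals_.end() ? 0.0 : it->second;
    }

    // Fraction of all measured time that was spent in the given phase.
    double share(const std::string &phase) const {
        double total = 0.0;
        for (const auto &entry : totals_) {
            assert(entry.second >= 0.0); // steady_clock never runs backwards
            total += entry.second;
        }
        return total > 0.0 ? seconds(phase) / total : 0.0;
    }

private:
    std::map<std::string, double> totals_;
};
```

For example, wrap IR generation in `PhaseTimes::Scope s(times, "irgen");` and the call that runs LLVM optimization and code generation in a second scope named `"codegen"`; `times.share("codegen")` then reports the fraction of measured time spent in LLVM.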

To try to determine if there’s anything I can do, I ran:

time /tools/llvm/3.7.1dbg/bin/opt -O1 -filetype=obj -o opt.o my_ir.ll -time-passes

and I get:

What activity accounts for the unaccounted-for time?

If you're on Linux, consider using a proper CPU profiler such as
perf(1). It's really easy to use; on x86-64, building in the
RelWithDebInfo configuration with -fno-omit-frame-pointer has given
me excellent results.

Good luck.

Hi,

I’m having the same issue. You can speed up the JIT by disabling code generation optimizations when creating the execution engine:
.setOptLevel(llvm::CodeGenOpt::None)
and by trying fast instruction selection (TargetOptions::EnableFastISel) via:
.setTargetOptions

But even with the above applied, my profiler (on a release build, of course) still shows 86% of the time being spent in JIT code generation.
It’s also odd that, when I look at individual functions in the profile, malloc and free take up 80% of the total time, and 40% of that comes from a SmallVectorImpl resize in the pass manager.
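A SmallVector keeps a few elements in inline storage and only allocates from the heap once it outgrows that buffer, so a profile dominated by a resize plus malloc/free suggests that vectors are being grown past their inline size (or created and destroyed) very often. The toy below is a hypothetical illustration of that small-buffer idea, not LLVM's implementation; `ToySmallVector` and `heapAllocations()` are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical toy, not llvm::SmallVector: holds up to N ints inline and
// switches to malloc'd storage once it grows past N.
template <std::size_t N>
class ToySmallVector {
    static_assert(N > 0, "inline capacity must be non-zero");

public:
    ToySmallVector() : data_(inline_), size_(0), capacity_(N) {}
    ~ToySmallVector() {
        if (data_ != inline_) std::free(data_);
    }
    ToySmallVector(const ToySmallVector &) = delete;
    ToySmallVector &operator=(const ToySmallVector &) = delete;

    void push_back(int value) {
        if (size_ == capacity_) grow(capacity_ * 2);
        data_[size_++] = value;
    }

    // Growing past the current capacity forces a heap reallocation;
    // new elements are zero-filled.
    void resize(std::size_t newSize) {
        if (newSize > capacity_)
            grow(newSize > capacity_ * 2 ? newSize : capacity_ * 2);
        for (std::size_t i = size_; i < newSize; ++i) data_[i] = 0;
        size_ = newSize;
    }

    int operator[](std::size_t i) const {
        assert(i < size_);
        return data_[i];
    }
    std::size_t size() const { return size_; }
    bool isSmall() const { return data_ == inline_; }
    std::size_t heapAllocations() const { return heapAllocations_; }

private:
    // Moves the elements into a fresh heap buffer (one malloc, maybe one free).
    void grow(std::size_t newCapacity) {
        int *fresh = static_cast<int *>(std::malloc(newCapacity * sizeof(int)));
        if (!fresh) throw std::bad_alloc();
        for (std::size_t i = 0; i < size_; ++i) fresh[i] = data_[i];
        if (data_ != inline_) std::free(data_);
        data_ = fresh;
        capacity_ = newCapacity;
        ++heapAllocations_;
    }

    int inline_[N];
    int *data_;
    std::size_t size_;
    std::size_t capacity_;
    std::size_t heapAllocations_ = 0;
};
```

With `ToySmallVector<4>`, the first four `push_back` calls never touch the heap; the fifth triggers one allocation, and a later `resize(100)` triggers another.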

The modules generally contain around three small functions, so code generation should be fast.
For my project, fast JIT compilation matters more than the speed of the generated code, since the statements are “simple”. I do run a pass manager over the functions to optimize the IR.

So what is generally the best approach when you need fast code generation? Specifically, how do you minimize the time spent going from IR to native code?

Are you running the IR verifier?

You mention the SmallVectorImpl resize. This might be the same issue I found in the IR verifier.

The verifier has a check for address space casts that runs even if your IR contains no address space casts (which I suspect is the usual case). The check applies to pointers embedded within data tables. My IR has a lot of read-only data in tables, with bitcast constant expressions casting between pointer types, and this puts a load on the verifier far beyond what its data structure is designed to hold.
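To see why such a check can cost time even when the IR contains no address space casts, here is a hypothetical model (not the LLVM verifier's code): to rule casts out, the check has to visit every constant reachable from a table's initializer. `Constant`, `WalkStats`, and `walkConstants` are made-up names. The work, and the size of the visited set and worklist, grow with the number of distinct constants in the tables, no matter how many casts are actually found.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical model of a constant: a flag for "is an address space cast"
// plus the constants it refers to (e.g. a table refers to its elements,
// and a bitcast refers to the global it casts).
struct Constant {
    bool isAddrSpaceCast = false;
    std::vector<const Constant *> operands;
};

struct WalkStats {
    std::size_t visited = 0;        // distinct constants examined
    std::size_t addrSpaceCasts = 0; // casts the check would validate
};

// Worklist walk with a visited set: each distinct constant is examined once,
// but every reachable constant must be examined to prove there are no casts.
WalkStats walkConstants(const Constant *root) {
    assert(root != nullptr);
    WalkStats stats;
    std::unordered_set<const Constant *> seen{root};
    std::vector<const Constant *> worklist{root};
    while (!worklist.empty()) {
        const Constant *c = worklist.back();
        worklist.pop_back();
        ++stats.visited;
        if (c->isAddrSpaceCast) ++stats.addrSpaceCasts;
        for (const Constant *op : c->operands)
            if (seen.insert(op).second) worklist.push_back(op);
    }
    return stats;
}
```

For a table holding two distinct bitcasts of the same global (one of them listed twice), the walk visits four constants (the table, both bitcasts, and the global) and finds zero address space casts.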

A typical example: code generation for a large IR file takes 23 seconds, but if you enable the verifier, it takes over a minute.

I really ought to file this bug properly.