Slow IR compilation/JIT, profiling points to LLVM?

I'm having issues with my compiler: JIT execution of my LLVM IR is
rather slow. It's accounting for the vast majority of my full
compilation time, and I'm trying to figure out why, since it's becoming
an impediment. (Note: by slow I mean about 3s of time for only about 2K
of my front-end code, which becomes 65K lines of LLVM IR.)

Using valgrind I see some functions which seem out of place and which
account for a surprising share of the time.

5.72%; 635,008 Calls; llvm::PMTopLevelManager::findAnalysisPass(void const*) <cycle 4>
4.54%; 3,722,489 Calls; llvm::PMDataManager::findAnalysisPass(void const*, bool)'2
4.11%; 4,604,499 Calls; bool llvm::DenseMapBase<>::LookupBucketFor<>(void const* const&, llvm::detail::DenseMapPair<> const*&) const

Also of interest, given the high call count:

1.32%; 6,915,882 Calls; llvm::FoldingSetNodeID::AddInteger(unsigned int) <cycle 4>

The call counts seem quite high given the size of the code. Also, the
`findAnalysisPass` function just seems out of place, but I don't
understand LLVM's architecture well enough to be sure.

Could this be pointing to something I'm doing wrong in my LLVM setup, or
is it just slow?

I'm reasonably certain I'm compiling LLVM in optimized mode, but for
reference, this is my build line:

cmake .. -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_EH=ON
-DLLVM_ENABLE_RTTI=ON -DLLVM_REQUIRES_RTTI=ON -DLLVM_ENABLE_CXX1Y=ON
-DLLVM_LINK_LLVM_DYLIB=ON -DLLVM_ENABLE_FFI=ON
-DLLVM_ENABLE_ASSERTIONS=ON -DLLVM_OPTIMIZED_TABLEGEN=ON
-DCMAKE_INSTALL_PREFIX="/opt/llvm/install"

The overall time split, from valgrind, between shared libraries in my
code is:

80.48% libLLVM-6.0.so
8.83% libc-2.23.so
2.34% libleaf_lang.so (my front-end)

Hi,

Could you share how you compile IR and which version of JIT you use (Orc, MCJIT)?
Could it be that you are using interpreter instead of actual JIT?

Cheers,
Alex.

I'll post the fragments of code below related to setup/exec. I
double-checked the object file generation and that appears to be a lot
better now, at around only 2% of total time (time in ld is very high,
but I guess that's not LLVM's responsibility). This has changed
recently, I think; it used to be higher. Nonetheless, let's look at the
JIT (or not-JIT) for now.

I use the execution engine:

std::string errStr;
llvm::EngineBuilder builder{ unique_ptr<llvm::Module>(module) };
llvm::ExecutionEngine * ee = builder.
    setErrorStr( &errStr ).
    setEngineKind( llvm::EngineKind::JIT ).
    setTargetOptions( topts ).
    create();

ee->finalizeObject();
auto main = ee->FindFunctionNamed( "main" );
STATE_CHECK( main, "missing-main" );

std::vector<llvm::GenericValue> args(2);
args[0].IntVal = llvm::APInt( platform::target->abi_int_size, 0 );
args[1].PointerVal = nullptr;
llvm::GenericValue gv = ee->runFunction( main, args );
auto ret = int(gv.IntVal.getSExtValue());
delete ee;
return ret;
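In case it's relevant, I've also considered calling `main` through its raw address rather than via `runFunction`/`GenericValue`. This is only a sketch of what I think that would look like, not code I currently run (`ee` and `STATE_CHECK` are as above):

```cpp
// Sketch only: fetch the raw entry point from the execution engine
// instead of using runFunction/GenericValue. getFunctionAddress
// compiles the function if it hasn't been compiled yet.
typedef int (*MainFn)(int, char **);
uint64_t addr = ee->getFunctionAddress("main");
STATE_CHECK(addr != 0, "missing-main");
MainFn fn = reinterpret_cast<MainFn>(addr);
int ret = fn(0, nullptr);
```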

In case relevant, this is how I setup the module/context:

llvm::InitializeNativeTarget();
llvm_context.reset( new llvm::LLVMContext );
module = new llvm::Module( "test", *llvm_context );
module->setTargetTriple( platform::target->triple );
del_module = true;

std::string err_str;
auto triple_str = llvm::Triple::normalize(platform::target->triple);
llvm::Target const * target = llvm::TargetRegistry::lookupTarget(
    triple_str, err_str );
STATE_CHECK( target, err_str );
auto targetMachine = target->createTargetMachine(triple_str,
    "generic", "", llvm::TargetOptions{},
    llvm::Optional<llvm::Reloc::Model>());

module->setDataLayout( targetMachine->createDataLayout() );

Versions:
LLVM 6.0.0
GCC 5.4.0
(K)Ubuntu 16.04

Hi,

Is there another way to get LLVM to check the correctness of my IR
without the assertions? That's what I'm assuming I need the flag for
(it's been a long time since I experimented with it).

If there is no way I guess I'll have to produce two versions of LLVM. I
still commonly get type errors in my LLVM IR.
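One thing I may experiment with is running the IR verifier explicitly; as far as I can tell it works independently of LLVM's assertions. A sketch, using my `module` from above:

```cpp
// Sketch: explicit IR verification, which should be available even in
// a build without assertions. Requires llvm/IR/Verifier.h.
std::string verify_err;
llvm::raw_string_ostream os(verify_err);
if (llvm::verifyModule(*module, &os)) {
    // verifyModule returns true when the module is broken;
    // the stream then holds the diagnostics.
    std::cerr << os.str() << std::endl;
}
```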

Hi,

> You're building LLVM with assertions enabled
> (-DLLVM_ENABLE_ASSERTIONS=ON).
> Some of those are fairly expensive...
>

> Is there another way to get LLVM to check the correctness of my IR
> without the assertions? That's what I'm assuming I need the flag for
> (it's been a long time since I experimented with it).

I don't think so.

> If there is no way I guess I'll have to produce two versions of LLVM. I
> still commonly get type errors in my LLVM IR.

That's what I do (using Orc to JIT parts of SQL queries). By default I
have the debug build of postgres linked against debug LLVM w/
assertions, and the optimized build against an optimized LLVM wo/
assertions (albeit with symbols).
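The second tree only needs that one flag flipped relative to the configure line you posted; something like this (the install prefix here is just a placeholder):

```shell
# Hypothetical second build tree: same flags as before, assertions off.
cmake .. -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=OFF \
      -DCMAKE_INSTALL_PREFIX="/opt/llvm/install-noassert"
```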

Greetings,

Andres Freund

I've tried a build with assertions off and it has only a relatively
minor impact (a 10% drop in time spent in LLVM). The instruction count
for `findAnalysisPass` is gone, but things like `AddInteger` are
still called millions of times.

Given the number of IR instructions (at most 65K, since it's a 65K-line
IR file), the millions of calls to the Add and FindBucket functions
seem wrong.

Hi,

> That's what I do (using Orc to JIT parts of SQL queries). By default I
> have the debug build of postgres linked against debug LLVM w/
> assertions, and the optimized build against an optimized LLVM wo/
> assertions (albeit with symbols).

> I've tried a build with assertions off and it has only a relatively
> minor impact (a 10% drop in time spent in LLVM). The instruction count
> for `findAnalysisPass` is gone, but things like `AddInteger` are
> still called millions of times.

The findAnalysisPass symbols in the profile were what made me think of
assertions, because I frequently see them at the top when using LLVM w/
assertions.

> Given the number of IR instructions (at most 65K, since it's a 65K-line
> IR file), the millions of calls to the Add and FindBucket functions
> seem wrong.

I suggest posting an example IR file, so others can look at why things
are slow. Without a reproducer that's going to be hard.

Have you looked at tweaking the optimization pipeline / checking which
parts of the pipeline are slow? You can use something like:

opt -O3 -time-passes -o /dev/null /path/to/bitcode/file.bc
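If the time turns out to be in code generation rather than in opt's passes, llc accepts the same flag (the path is a placeholder):

```shell
llc -O2 -time-passes -o /dev/null /path/to/bitcode/file.bc
```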

Greetings,

Andres Freund