Dynamically loading native code generated from LLVM IR

Hi,

I'm building LLVM IR. I'd like to compile this IR to native code (I don't want JIT) and immediately load and execute it. So far, I have the following:

1) I can emit the IR to native assembly/object file doing the same thing llc does (using TargetMachine::addPassesToEmitFile).
2) I can dynamically load a precompiled .so file (using llvm::sys::DynamicLibrary::getPermanentLibrary), get a function pointer from that file, and execute.

I can't dynamically load the .o file I produce in step 1 because it's a relocatable object file, not a shared library. If I could produce a .so file in step 1, my problem would be solved. llc has a "-relocation-model=pic" option, but the file produced with that did not dynamically load. I got lost in clang's options when trying to find where the "-shared" and "-fPIC" options come into play.

So, my question is: Which API should I look at to emit dynamically loadable native code from LLVM IR?

I would also like to emit code to an in-memory stream instead of a file because everything happens at runtime, but that's a secondary concern.
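For reference, below is a trimmed-down sketch of what my step 1 currently looks like. This is only an outline: the exact signatures of lookupTarget, createTargetMachine, and addPassesToEmitFile differ between LLVM versions, error handling is omitted, TripleStr is the target triple string, and Mod is the llvm::Module being compiled.

  // Sketch: emit native object code for a Module, roughly what llc does.
  // Writing into a raw_string_ostream gives the in-memory variant; a
  // raw_fd_ostream would produce an on-disk .o instead.
  std::string Error;
  const llvm::Target *TheTarget =
      llvm::TargetRegistry::lookupTarget(TripleStr, Error);
  llvm::TargetMachine *TM = TheTarget->createTargetMachine(
      TripleStr, /*CPU=*/"", /*Features=*/"", llvm::TargetOptions());

  std::string Buffer;                          // object code ends up here
  llvm::raw_string_ostream OS(Buffer);
  llvm::formatted_raw_ostream FOS(OS);

  llvm::PassManager PM;
  TM->addPassesToEmitFile(PM, FOS, llvm::TargetMachine::CGFT_ObjectFile);
  PM.run(*Mod);
  FOS.flush();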

Thanks in advance.

-Baris Aktemur

Hi Baris,

If I could produce a .so file in step 1, my problem would be solved. llc has a "-relocation-model=pic" option, but the file produced with that did not dynamically load.

That relocation-model=pic option is usually necessary for a linker to
be able to produce a .so file (it changes how variables are addressed
so that more decisions can be deferred until the .so is loaded), but
it won't produce a .so by itself.
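If you're driving the backend through the C++ API rather than llc, the
equivalent knob is the Reloc::Model argument when you create the
TargetMachine, roughly as below (a sketch only; the signature varies by
version):

  // Sketch: the API equivalent of "llc -relocation-model=pic".  This only
  // affects code generation; a linker is still needed to make a .so.
  llvm::TargetMachine *TM = TheTarget->createTargetMachine(
      TripleStr, "", "", llvm::TargetOptions(), llvm::Reloc::PIC_);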

So, my question is: Which API should I look at to emit dynamically loadable native code from LLVM IR?

The JIT sounds like it does almost exactly what you want. LLVM's JIT
isn't a classical lightweight, dynamic one like you'd see for
JavaScript or Java. All it really does is produce a native .o file in
memory, take care of the relocations for you and then jump into it (or
provide you with a function-pointer). Is there any other reason you
want to avoid it?
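The usual pattern looks something like this (just a sketch; the function
name "entry" and its signature are made up):

  // Sketch: get a callable function pointer from the ExecutionEngine.
  llvm::Function *F = Mod->getFunction("entry");
  void *Addr = EE->getPointerToFunction(F);
  int (*Entry)(int) = (int (*)(int))(intptr_t)Addr;
  int Result = Entry(42);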

Otherwise, only a full linker is capable of turning a .o into
something loadable with "dlopen" or its equivalents. That's a lot of
work, though I suppose you could fork a process to the system's linker
if there is one.
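Something along these lines, for instance (only a sketch: the paths, the
symbol name, and the exact linker invocation are made up, and real code
would check every step for errors):

  // Sketch: let the system toolchain turn the .o into a .so, then load it.
  std::system("cc -shared -fPIC -o /tmp/gen.so /tmp/gen.o");
  std::string ErrMsg;
  llvm::sys::DynamicLibrary DL =
      llvm::sys::DynamicLibrary::getPermanentLibrary("/tmp/gen.so", &ErrMsg);
  void *Sym = DL.getAddressOfSymbol("entry");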

Tim.

Tim Northover wrote:

Hi Baris,

If I could produce a .so file in step 1, my problem would be solved. llc has a "-relocation-model=pic" option, but the file produced with that did not dynamically load.

That relocation-model=pic option is usually necessary for a linker to
be able to produce a .so file (it changes how variables are addressed
so that more decisions can be deferred until the .so is loaded), but
it won't produce a .so by itself.

So, my question is: Which API should I look at to emit dynamically loadable native code from LLVM IR?

The JIT sounds like it does almost exactly what you want. LLVM's JIT
isn't a classical lightweight, dynamic one like you'd see for
JavaScript or Java. All it really does is produce a native .o file in
memory, take care of the relocations for you and then jump into it (or
provide you with a function-pointer).

OK ... I have a similar issue. What happens if the code located in the .o file has references to external libs?
How can I link the in-memory .o file against the external libs ... just to get a fully executable code segment in memory?

Best Regards

Armin Steinhoff

Hi Armin,

OK ... I have a similar issue. What happens if the code located in the .o
file has references to external libs?

If you use dlopen or getPermanentLibrary to open those libraries then
the JIT's relocation processing will normally find the referenced
symbols correctly. (With caveats for remote targets and so on, of
course).
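i.e. something like this before you ask the JIT for the function pointer
(a sketch; "libfoo.so" is just an example name):

  // Sketch: make libfoo's symbols visible to the JIT's symbol resolution.
  std::string ErrMsg;
  if (llvm::sys::DynamicLibrary::LoadLibraryPermanently("libfoo.so", &ErrMsg)) {
    // handle the error reported in ErrMsg
  }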

That solution assumes your external libraries are shared, of course.
If they're static then I'm not sure there's a solution at present.
There's no reason in principle why the JIT couldn't load prebuilt
objects or archives, but it doesn't even seem to support multiple LLVM
modules at the moment, so infrastructure is probably missing.

Tim.

Dear Tim,

The JIT sounds like it does almost exactly what you want. LLVM's JIT
isn't a classical lightweight, dynamic one like you'd see for
JavaScript or Java. All it really does is produce a native .o file in
memory, take care of the relocations for you and then jump into it (or
provide you with a function-pointer). Is there any other reason you
want to avoid it?

Based on the experiments I ran, the JIT version runs significantly slower than the code compiled to native. According to your explanation, this shouldn't happen, so I wonder why I'm seeing this performance difference.

Thank you.

-Baris Aktemur

Dear Tim,

The JIT sounds like it does almost exactly what you want. LLVM's JIT
isn't a classical lightweight, dynamic one like you'd see for
JavaScript or Java. All it really does is produce a native .o file in
memory, take care of the relocations for you and then jump into it (or
provide you with a function-pointer). Is there any other reason you
want to avoid it?

Based on the experiments I ran, the JIT version runs significantly slower than the code compiled to native. According to your explanation, this shouldn't happen, so I wonder why I'm seeing this performance difference.

Did you compile the native version with any optimizations enabled?

Dear Tim,

The JIT sounds like it does almost exactly what you want. LLVM's JIT
isn't a classical lightweight, dynamic one like you'd see for
JavaScript or Java. All it really does is produce a native .o file in
memory, take care of the relocations for you and then jump into it (or
provide you with a function-pointer). Is there any other reason you
want to avoid it?

Based on the experiments I ran, the JIT version runs significantly slower than the code compiled to native. According to your explanation, this shouldn't happen, so I wonder why I'm seeing this performance difference.

Did you compile the native version with any optimizations enabled?

Yes. When I dump the IR, I get the same output as "clang -O3". Are the back-end optimizations enabled separately?

Dear Tim,

The JIT sounds like it does almost exactly what you want. LLVM's JIT
isn't a classical lightweight, dynamic one like you'd see for
JavaScript or Java. All it really does is produce a native .o file in
memory, take care of the relocations for you and then jump into it (or
provide you with a function-pointer). Is there any other reason you
want to avoid it?

Based on the experiments I ran, the JIT version runs significantly slower than the code compiled to native. According to your explanation, this shouldn't happen, so I wonder why I'm seeing this performance difference.

Did you compile the native version with any optimizations enabled?

Yes. When I dump the IR, I get the same output as "clang -O3". Are the back-end optimizations enabled separately?

Yes, but it's the code generation model that I suspect is the issue here. Specifically, you're likely using SelectionDAGISel for static compilation and FastISel for JIT compilation. The latter generates code very quickly, as the name implies, but the quality of that code is generally pretty terrible compared to the static compiler.

Dear Tim,

The JIT sounds like it does almost exactly what you want. LLVM's JIT
isn't a classical lightweight, dynamic one like you'd see for
JavaScript or Java. All it really does is produce a native .o file in
memory, take care of the relocations for you and then jump into it (or
provide you with a function-pointer). Is there any other reason you
want to avoid it?

Based on the experiments I ran, the JIT version runs significantly slower than the code compiled to native. According to your explanation, this shouldn't happen, so I wonder why I'm seeing this performance difference.

Did you compile the native version with any optimizations enabled?

Yes. When I dump the IR, I get the same output as "clang -O3". Are the back-end optimizations enabled separately?

Sorry, I misunderstood the question.
I compiled the native version with optimizations enabled, using "clang -shared -fPIC -O3".

In the version that uses the JIT, I build the IR, then run the same passes over it that "opt -O3" runs to obtain optimized IR. After those passes run, I call ExecutionEngine::getPointerToFunction().
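Concretely, the optimization setup is roughly the following (a sketch of what I do; Mod is the module I build, and details are omitted):

  // Sketch: build approximately the "opt -O3" pipeline and run it.
  llvm::PassManagerBuilder Builder;
  Builder.OptLevel = 3;

  llvm::FunctionPassManager FPM(Mod);
  llvm::PassManager MPM;
  Builder.populateFunctionPassManager(FPM);
  Builder.populateModulePassManager(MPM);

  FPM.doInitialization();
  for (llvm::Module::iterator F = Mod->begin(), E = Mod->end(); F != E; ++F)
    FPM.run(*F);
  FPM.doFinalization();
  MPM.run(*Mod);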

I'm not sure I understand your use case, but MCJIT (as opposed to the legacy JIT) does almost exactly what you're asking for. It generates an in-memory object file image (using addPassesToEmitMC) and then loads and links it for execution.
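A minimal sketch of setting that up, using the EngineBuilder API as it looks around 3.2 (depending on your version you may also need to install a suitable memory manager via setJITMemoryManager, and "entry" is a made-up function name):

  // Sketch: create an MCJIT-backed ExecutionEngine and get a function pointer.
  std::string Err;
  llvm::ExecutionEngine *EE = llvm::EngineBuilder(Mod)
                                  .setErrorStr(&Err)
                                  .setUseMCJIT(true)
                                  .setOptLevel(llvm::CodeGenOpt::Aggressive)
                                  .create();
  if (!EE) {
    // creation failed; the reason is in Err
  }
  void *FPtr = EE->getPointerToFunction(Mod->getFunction("entry"));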

If there's some particular detail you don't like in the way this is happening, you might be able to generate a file as you have and then use the RuntimeDyld interface to load it. The llvm-rtdyld tool does something like this.

-Andy

Kaylor,

do you have some well-documented example code which shows how to use MCJIT?
That would help a lot ... the semantics of many of the API calls are not intuitively understandable.

Best Regards

--Armin

Kaylor, Andrew wrote:

Take a look at the MCJIT unit tests under unittests/ExecutionEngine/MCJIT

The MCJITTestBase class handles most of the interactions with the LLVM API you're referring to.

Good luck,
Dan

Daniel,

I didn't find an MCJIT directory under unittests/ExecutionEngine ... there is only a directory called JIT.
Is that the directory you mean?

Many thanks

--Armin

Malea, Daniel wrote:

It's definitely there. It was added recently, in r165246, so you
probably have an older version.

Amara

Amara,

yes, it's in the SVN repository!

Thanks a lot !

Regards

--Armin

Amara Emerson wrote:

Dear Jim,

Dear Tim,

The JIT sounds like it does almost exactly what you want. LLVM's JIT
isn't a classical lightweight, dynamic one like you'd see for
JavaScript or Java. All it really does is produce a native .o file in
memory, take care of the relocations for you and then jump into it (or
provide you with a function-pointer). Is there any other reason you
want to avoid it?

Based on the experiments I ran, the JIT version runs significantly slower than the code compiled to native. According to your explanation, this shouldn't happen, so I wonder why I'm seeing this performance difference.

Did you compile the native version with any optimizations enabled?

Yes. When I dump the IR, I get the same output as "clang -O3". Are the back-end optimizations enabled separately?

Yes, but it's the code generation model that I suspect is the issue here. Specifically, you're likely using SelectionDAGISel for static compilation and FastISel for JIT compilation. The latter generates code very quickly, as the name implies, but the quality of that code is generally pretty terrible compared to the static compiler.

Is there an option that I can pass to the (MC)JITer to force it to use SelectionDAGISel?

I'm also curious which passes/algorithms are used when I set the MCJIT option to true and the opt level to Aggressive, e.g.:

  engineBuilder.setUseMCJIT(true);
  engineBuilder.setOptLevel(llvm::CodeGenOpt::Aggressive);

I adapted lli.cpp to use MCJIT in my code. I get better performance now -- close to statically compiled native code, but still not exactly the same (about 10% slower).

Thank you.

-Baris Aktemur

Hi Baris,

The FastISel selector only gets used if the optimization level is set to none. There's a hidden command line option that will disable it even then (see LLVMTargetMachine.cpp), but I don't think there's a way to get to it if you aren't using the standard command line option handling.
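(If you really want to poke at it, one hack is to feed the option string through cl::ParseCommandLineOptions from your own code, as sketched below. The option name here is from memory, so double-check LLVMTargetMachine.cpp before relying on it.)

  // Sketch/hack: set the hidden backend option from embedding code.
  const char *Args[] = { "myjit", "-fast-isel=false" };
  llvm::cl::ParseCommandLineOptions(2, const_cast<char **>(Args));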

Anyway, my best guess as to the remaining performance difference you're seeing is that it has to do with the code model being used, though this is only a guess. MCJIT is limited in the code models it supports (due to its limited relocation handling). I'm not sure how much impact this has, but I expect it has some performance implications. On x86 ELF-based systems it will only work with the static code model; I don't know about other architecture/object-format combinations.
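If you want to experiment, EngineBuilder does let you request a code model explicitly; whether the JIT actually supports the model you ask for depends on the target and object format (sketch):

  // Sketch: request a particular code model for the JIT-compiled code.
  engineBuilder.setCodeModel(llvm::CodeModel::JITDefault);  // or Small, Large, ...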

-Andy