MCJIT versus getLazyBitcodeModule?

I'm having a problem with MCJIT (in LLVM 3.3 and 3.4): it isn't resolving mangled symbols from a precompiled bitcode module the same way the old JIT did. It's possible that it's just my misunderstanding. Maybe somebody can spot my problem, or identify it as an MCJIT bug.

Here's my situation, in a nutshell:

* I am assembling IR and JITing in my app. The IR may potentially make calls to a large body of code that I precompile to bitcode using "clang++ -S --emit-llvm", then create a .cpp file containing the bitcode, which is compiled into my app.

* Before JITing the dynamic code, my app initializes the Module like this:

    llvm::MemoryBuffer* buf =
        llvm::MemoryBuffer::getMemBuffer (llvm::StringRef(bitcode, bitcode_size), name);
    llvm::Module *m = llvm::getLazyBitcodeModule (buf, context(), err);

  where bitcode is a big char array holding the precompiled bitcode. The idea is to
  "seed" the module with that precompiled bitcode so that any calls I inserted into the IR
  will work properly.

* When I JIT, I just refer to functions in the bitcode like "foo", if that's what I called it in the original .cpp file that was turned into bitcode.

* Traditionally, I have created a JIT execution engine like this:

    m_llvm_exec = llvm::ExecutionEngine::createJIT (module(), err,
                                    jitmm(), llvm::CodeGenOpt::Default,
                                    /*AllocateGVsWithCode*/ false);

All of this has worked fine; it's a system that's seen heavy production use for a couple of years now.

Now I'm trying to make this codebase work with MCJIT, and I've run into some trouble. Here's how I'm setting up the ExecutionEngine for the MCJIT case:

    m_llvm_exec = llvm::EngineBuilder(module())
                            .setEngineKind(llvm::EngineKind::JIT)
                            .setErrorStr(err)
                            .setJITMemoryManager(jitmm())
                            .setOptLevel(llvm::CodeGenOpt::Default)
                            .setUseMCJIT(USE_MCJIT)
                            .create();

USE_MCJIT is 1 when I'm building the code to use MCJIT. I'm initializing the buffer and seeding it with the precompiled bitcode in the same way as always, as outlined above.

The basic problem is that it's not finding the symbols in that bitcode. I get an error message back like this:

  Program used external function '_foo' which could not be resolved!

So it seems to be an issue of whether or not the underscore prefix is included when looking up the function in the module, and the old JIT and MCJIT disagree.

Furthermore, if I change the creation of the module from using llvm::getLazyBitcodeModule to this:

    llvm::Module *m = llvm::ParseBitcodeFile (buf, context(), err);

it works just fine. But of course I'd really like to deserialize this bitcode lazily: it contains a ton of functions potentially called by my IR, yet any given bit of code that I'm JITing uses only a tiny subset. The lazy option greatly reduces JIT overhead (10-20x!), so it's considered fairly critical for our app.

So, in short:

   old JIT + ParseBitcodeFile = works
   old JIT + getLazyBitcodeModule = works
   MCJIT + ParseBitcodeFile = works
   MCJIT + getLazyBitcodeModule = BROKEN

Does anybody have advice? Thanks in advance for any help.

Hi Larry,

I'm pretty sure MCJIT won't do what you need without some changes to the way you're doing things.

When MCJIT compiles a Module, it compiles the entire Module and tries to resolve any and all undefined symbols. I'm not familiar with getLazyBitcodeModule, but at a glance (and cross-referencing your comments below) it seems that it tries to add GlobalValues to a Module as they are needed. MCJIT doesn't let you modify Modules once it has compiled them, so that's not going to work. Even if we built some scheme into MCJIT to materialize things before it compiled a Module, it would end up materializing everything, so that wouldn't help you.

You have a few options.

1. You can continue to load the pre-existing bitcode with getLazyBitcodeModule then emit your dynamic code into a separate Module which gets linked against the "lazy" Module before it is handed off to MCJIT.

2. You can use MCJIT's object caching mechanism to load a fully pre-compiled version of your bitcode. Again you'd need to have your dynamic code in a separate Module, but in this case MCJIT would take care of the linking. If you know the target architecture ahead of time you can install the cached object with your application. If not, you'd need to take the large compilation hit once. After that it should be fairly fast. The downside is that you'd potentially have a lot more code loaded into memory than you needed.

3. You can break the pre-compiled code into smaller chunks and compile them into an archive file. MCJIT recently added the ability to link against archive files. This would give you control over the granularity at which pieces of your pre-compiled code get loaded while also giving you the speed of the cached object file solution. The trade-off is that for this solution you do need to know the target architecture ahead of time.

Hope this helps.

-Andy

This is sounding rather like getLazyBitcodeModule is simply incompatible with MCJIT. Can anybody confirm that this is definitely the case? Is it by design, by omission, or a bug?

Re your options #1 and #2 -- sorry for the newbie questions, but can you point me to docs or code examples showing how the linking or object caching should be done? If I do either of these rather than seeding my bitcode into the same module where I'm dynamically assembling my IR, does that mean it will be unable to inline those functions?

It's possible that my best option is to give up on getLazyBitcodeModule when using MCJIT and revert to ParseBitcodeFile, but to lower the compilation overhead by carefully dividing my precompiled code: keep in the bitcode just the parts that get the most bang-for-buck from inlining (much less to compile), and move the rest into my app, where the symbols can still be resolved but the functions can't be inlined.

  -- lg

I would say that the incompatibility is by design. Not that anyone specifically wanted the incompatibility, but rather it's a known artifact of the MCJIT design.

You can find an example of MCJIT's object caching in the post "Object Caching with the Kaleidoscope Example Program" on The LLVM Project Blog.

The two blog entries before that may also be of use to you: http://blog.llvm.org/2013_07_01_archive.html
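The gist of the ObjectCache interface, from memory, is something like this (a rough sketch of the 3.4-era API; the class name, member, and ownership details here are just illustrative, and the blog post has the real thing):

    #include "llvm/ExecutionEngine/ObjectCache.h"
    #include "llvm/Support/MemoryBuffer.h"

    class SimpleObjectCache : public llvm::ObjectCache {
      llvm::MemoryBuffer* PrecompiledObj; // your precompiled runtime object
    public:
      SimpleObjectCache(llvm::MemoryBuffer* Obj) : PrecompiledObj(Obj) {}

      // MCJIT calls this after compiling a Module; a real cache would
      // save Obj somewhere, keyed by M->getModuleIdentifier().
      virtual void notifyObjectCompiled(const llvm::Module* M,
                                        const llvm::MemoryBuffer* Obj) {}

      // MCJIT calls this before compiling a Module; returning a buffer
      // makes it load that object instead of compiling.
      virtual llvm::MemoryBuffer* getObject(const llvm::Module* M) {
        return llvm::MemoryBuffer::getMemBufferCopy(
            PrecompiledObj->getBuffer(),
            PrecompiledObj->getBufferIdentifier());
      }
    };

You hand it to the engine with engine->setObjectCache(&cache) before asking MCJIT to compile anything.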

I don't know where you can find an example of the Module linking I described, but I think llvm::Linker is the class to look at.
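Off the top of my head, the linking step would look something like this (untested sketch; dynamicModule and lazyModule are placeholder names for your generated Module and the one you got from getLazyBitcodeModule):

    #include "llvm/Linker.h" // llvm/Linker/Linker.h in newer trees

    // Pull the runtime bitcode into the module holding the dynamic
    // code, then hand dynamicModule (and only it) to MCJIT.
    std::string errMsg;
    if (llvm::Linker::LinkModules(dynamicModule, lazyModule,
                                  llvm::Linker::DestroySource, &errMsg)) {
      // Linking failed; errMsg has the details.
    }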

-Andy

Thanks for the pointers.

Am I correct in assuming that putting the precompiled bitcode into a second module and linking (or using the object caches) would result in ordinary function calls, but would not allow the functions to be inlined?

  -- lg

Actually, I think linking just pulls referenced functions and variables into your module, so they could be inlined in that case.

On the other hand, if I'm correct about how Module linking works, you may run into problems if multiple Modules which link in the same function are passed to the same instance of MCJIT. There ought to be a way around this whereby you could first ask MCJIT if it has a symbol: if it does, link against that (which wouldn't allow inlining), and if not, link against the big bitcode Module. Unfortunately I think you'd have to build your own support for that.

Also, I should say that there is every possibility that I am misunderstanding how Module linking works. I haven't done anything with that in a while and I think that part of the code has been updated since then.

-Andy

Hi Larry,

Inlining from remote modules with MCJIT is not so easy, but it is
possible (at least it works for me). I've been working on this problem
for two days (from an end-user perspective; I'm not an LLVM developer
:)). Since it may help you (and other people), I'll explain what I
have done (my mail is maybe too long for the mailing list, sorry!).

So, basically, inlining from other modules (runtime module included)
is possible in MCJIT. The solution is maybe a little bit ugly... To
explain what I do and my problems: I'm involved in the development of
vmkit (a library to build virtual machines). I have to inline runtime
functions defined in C++ to achieve good performance (for example the
type checker for j3, the Java virtual machine developed with vmkit). I
think your problem is not so far from mine (I also reload my own
bitcode when I start vmkit).

So, here is the picture (I can also send you my LLVM pass or other
relevant code if you need them). It can help as a starting point. I
wrote the inlining pass today, so it's maybe still buggy :).

Basically, I have two kinds of modules: a module that contains the
runtime functions (defined in C++) and the other modules that contain
the functions I have to JIT-compile. To simplify, let's say I have
only one module to JIT. In the jit-module, I want to call functions
defined in the runtime-module. I thus have three problems to solve:
* The verifier does not like it when you call a function defined in
the runtime module directly from the jit module (it prevents external
references to other modules), so I have to avoid this as much as
possible.
* The JITed module has to find the LLVM IR of the runtime functions
for inlining.
* When a function is not inlined, you have to provide the address of
the function to MCJIT (I use dlsym for that purpose).

What I do:
- MCJIT only manages the jit-module (the runtime-module is not
associated with MCJIT through addModule).
- When I have to call a runtime function from the jit-module, I define
an external reference to the function in the jit-module, something
like:

llvm::Function* orig = runtimeModule->getFunction("my-function");
llvm::Function* copy = llvm::cast<llvm::Function>(
    jitModule->getOrInsertFunction(orig->getName(),
                                   orig->getFunctionType()));

This step is not mandatory, as you will see later (but I have not
tested a direct use of remote references).

- Then I use an LLVM pass (a FunctionPass). For each function, I
visit each call site. If the call site targets a function that does
not have a definition (i.e., a runtime function), I look up the
original llvm::Function*. I use something like this:

  bool FunctionInliner::runOnFunction(llvm::Function& function) {
    bool Changed = false;

    for (llvm::Function::iterator bit = function.begin();
         bit != function.end(); bit++) {
      llvm::BasicBlock* bb = bit;

      for (llvm::BasicBlock::iterator it = bb->begin(); it != bb->end();) {
        llvm::Instruction* insn = it++;

        if (insn->getOpcode() != llvm::Instruction::Call &&
            insn->getOpcode() != llvm::Instruction::Invoke) {
          continue;
        }

        llvm::CallSite call(insn);
        llvm::Function* callee = call.getCalledFunction();

        if (!callee)
          continue;

        if (callee->isDeclaration()) { /* maybe a foreign function? */
          llvm::Function* original =
              runtimeModule->getFunction(callee->getName());
          if (original) {
            /* if you use getLazyBitcodeModule, don't forget to
               materialize the original here with: */
            original->Materialize();

At this step, you can directly inline the code, if you want to
inline systematically:
           llvm::InlineFunctionInfo ifi(0);
           bool isInlined = llvm::InlineFunction(call, ifi, false);
           Changed |= isInlined;

Or, if you don't want to always inline, you can guard the inlining
with the inline cost analysis:

   llvm::InlineCostAnalysis costAnalysis;
   llvm::InlineCost cost =
       costAnalysis.getInlineCost(call, 42); /* 42 is the threshold */
   if (cost.isAlways() || (!cost.isNever() && cost)) {
     /* inlining goes here */
   }

After this step, you have a problem: the inlined function can itself
contain calls to runtime functions. So, at this point, it's ugly
because I have a function that potentially contains external
references... What I do is simply re-explore the code with:

    if (isInlined) {
      it = bb->begin();
      continue;
    }

and for each callee, if its defining module is not the jitModule, I
replace the call with a local call. Something like this:

        if (callee->getParent() != function.getParent()) {
          llvm::Function* local = llvm::cast<llvm::Function>(
              function.getParent()->getOrInsertFunction(
                  callee->getName(), callee->getFunctionType()));
          callee->replaceAllUsesWith(local);
          Changed = true;
        }

After this step, you will have a module that contains only local
references and that contains your preferred runtime code inlined.

- Now you have to solve the last problem: finding the symbols from
the runtimeModule when they are not inlined (global values or
functions). In my case, I have defined my own SectionMemoryManager:

  class CompilationUnit : public llvm::SectionMemoryManager {
  public:
    uint64_t getSymbolAddress(const std::string& Name) {
      /* + 1 on MacOS (skip the leading '_'), + 0 on Linux */
      return (uint64_t)dlsym(SELF_HANDLE, Name.c_str() + 1);
    }
  };

which is called by MCJIT to resolve external symbols when the JITed
module is loaded in memory (you have to use
EngineBuilder::setMCJITMemoryManager).
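
The wiring, for reference (a sketch; jitModule and err are my
placeholders for your module and error string):

    CompilationUnit* mm = new CompilationUnit();
    llvm::ExecutionEngine* ee = llvm::EngineBuilder(jitModule)
                                    .setEngineKind(llvm::EngineKind::JIT)
                                    .setMCJITMemoryManager(mm)
                                    .setErrorStr(&err)
                                    .setUseMCJIT(true)
                                    .create();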

If, like me, you also want to inline functions from JITed modules,
it's a little bit more tricky, because llvm::Function* original =
runtimeModule->getFunction(callee->getName()); is no longer enough. I
have defined my own symbol table (a hash map) that associates function
identifiers with a structure containing both the original LLVM
function of the callee and its address in memory (also used in the
SectionMemoryManager).

Good luck :)
Gaël

Hi Andrew,

The solution with the linker works perfectly, but it means that you
have to recompile everything when you provide the linked module to
MCJIT. That takes too much time if the runtime is large (I have also
tested this solution :)).

See you,
Gaël

Hi Gael, I tried converting to your approach, but I had some issues making sure that all symbols accessed by the jit modules have entries in the dynamic symbol table.

To be specific, my current approach is to use MCJIT (with an ObjectCache) to JIT the runtime module and then let MCJIT handle linking any references from the jit'd modules; I just experimented with what I think you're doing: compiling my runtime, directly linking it into the rest of the compiler, and then tying references in the jit modules to entities in the compiler.

I got it working for the case of “standard” functions and globals, but had some trouble with other types of symbols. I don’t know the right terminology for these things, but I couldn’t get methods defined in headers (ex: a no-op virtual destructor) to work properly. I guess that’s not too hard to work around by either putting it into a cpp file or maybe with some objcopy magic, but then I ran into the issue of string constants. Again, my knowledge of the terminology isn’t great, but it looks like those don’t get symbols in the object file but they get their own sections, and since I have multiple source files that I llvm-link together, the constants get renamed in the LLVM IR and have no relation to the section names. Maybe there’s a workaround by compiling all my runtime sources as a single file so no renaming happens, and then some hackery to get the section names exported, but I guess I’m feeling a little doubtful about it.

Have you tried using an ObjectCache and pre-jitting [I still have a hard time using that term with a straight face] the runtime module? My runtime isn’t that large (about 4kloc), but the numbers I’m getting are that it takes about 2ms for the getLazyBitcodeModule call, and about 4ms to load the stdlib through the ObjectCache. I’m not sure how these numbers scale with the size of the runtime, but it feels like if the ObjectCache loading is too expensive then loading the bitcode might be as well? Another idea is that you could load+jit the bitcode the first time that you want to inline something, since the inlining+subsequent optimizations you probably want to do are themselves expensive and could mask the jit’ing time.

Anyway, my current plan is to stick with jit'ing the runtime module but cut down the amount of stuff included in it, since I'm finding that most of my runtime methods end up dispatching on type, and patchpoint-ing at runtime seems to be more effective than inlining ahead of time.

Kevin

Hi Kevin,

I haven't tested ObjectCache yet, but I faced exactly the same issue
with hidden symbols :) As a solution, I run a small module pass on
each runtime module (i.e., each .bc file), which modifies the
linkages. I run the pass before compiling the .bc files into .o. I
thus have these rules in my compilation process:

file.cc --> file-raw.bc --> file.bc --> file.o

file-raw.bc: file.cc => clang++ -emit-llvm
file.bc: file-raw.bc => opt with my pass
file.o: file.bc => llc
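
Concretely, the three rules correspond to commands like these (the
plugin and pass names are just illustrative; adapt to your build):

    clang++ -emit-llvm -c file.cc -o file-raw.bc
    opt -load my-prepare-code-plugin.so -adapt-linkage file-raw.bc -o file.bc
    llc -filetype=obj file.bc -o file.o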

For hidden functions, it's easy: I replace linkonce_odr functions
with weak_odr functions. The semantics are exactly the same, except
that the symbol is visible to dlsym in the resulting binary. For
strings, it's a little bit more complicated, because you can have
collisions between names from different modules. So I rename the
strings in my pass to ensure that each name is unique, and I replace
the InternalLinkage with an ExternalLinkage. It's far from perfect,
because it slows down dlsym (the time to find a symbol is
proportional to the number of external symbols).
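
The core of the pass is just a loop over the globals, something like
this (simplified from my real pass; makeUniqueName stands in for
however you generate unique names):

    bool adaptLinkage(llvm::Module& m) {
      bool changed = false;

      // linkonce_odr -> weak_odr: same semantics, but the symbol
      // stays visible, so dlsym can find it in the final binary.
      for (llvm::Module::iterator f = m.begin(); f != m.end(); ++f) {
        if (f->getLinkage() == llvm::GlobalValue::LinkOnceODRLinkage) {
          f->setLinkage(llvm::GlobalValue::WeakODRLinkage);
          changed = true;
        }
      }

      // internal strings -> renamed external globals, to avoid name
      // collisions between modules.
      for (llvm::Module::global_iterator g = m.global_begin();
           g != m.global_end(); ++g) {
        if (g->hasInternalLinkage()) {
          g->setName(makeUniqueName(g->getName()));
          g->setLinkage(llvm::GlobalValue::ExternalLinkage);
          changed = true;
        }
      }
      return changed;
    }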

If you need the code of the pass, you can find it in my branch of vmkit:
http://llvm.org/svn/llvm-project/vmkit/branches/mcjit
in lib/vmkit-prepare-code/adapt-linkage.cc

Otherwise, I made a mistake in my previous mail: we cannot use
llvm::InlineCostAnalysis as is (and thus we cannot reuse the
heuristics that compute the cost of inlining). The inline cost
analyzer has to explore the whole call graph, and that's not so easy
when functions are defined in multiple modules (and I don't want to
explore the whole graph for each JITted function!). So, for the
moment, I only inline functions marked AlwaysInline. I don't know
what I will do about this problem...

Gaël

Oh that’s a good point, making any changes in bitcode is a lot easier than once it’s gone down to elf.

Taking a brief look at InlineCost.cpp, it doesn't seem like InlineCostAnalysis actually uses anything about the call graph. The only thing it needs is a TargetTransformInfo, which it gets from runOnSCC(); it seems to actually work OK for me to hackily put it into a separate PassManager and run it on an empty module, which initializes the local state appropriately.
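
i.e., something like this (3.4-era API; it's definitely a hack, and the PassManager owns the pass, so it has to outlive any use of ica):

    #include "llvm/Analysis/InlineCost.h"
    #include "llvm/PassManager.h"

    llvm::PassManager pm;
    llvm::InlineCostAnalysis* ica = new llvm::InlineCostAnalysis();
    pm.add(ica);
    llvm::Module dummy("dummy", llvm::getGlobalContext());
    pm.run(dummy); // runOnSCC fires, initializing the TargetTransformInfo

    // Later, per call site (the threshold value is whatever you like):
    llvm::InlineCost cost = ica->getInlineCost(call, 225);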

Great, I will try that :)
Gaël