Newbie JITter

Hi,
    I'm experimenting with using LLVM to generate dynamic FFI bridges in VisualWorks Smalltalk. LLVM is an amazing thing! I'm going from dynamically generated assembler source to machine code, and I have that all working, copied from the llc tool and the JIT example. I have two questions:

1. What optimization passes, if any, should I run on the module before I pass it to the ExecutionEngine?
2. Do I need to retain the Module/ExistingModuleProvider once I've built the ExecutionEngine and have a Function object, or is there some simpler way to store/call the native code block?

An unrelated question: I want to dump the native assembler form of the code generated by the JIT. Currently I dump it using some code I adapted from llc, but that's not actually showing what the JIT is generating. How can I dump the JIT generated code, as native assembler? It's a debugging requirement only.

My test code is as follows, much of which is to do with generating the native assembler output.

#include "llvm/Module.h"
#include "llvm/Assembly/Parser.h"
#include "llvm/Analysis/Verifier.h"
#include "llvm/ModuleProvider.h"
#include "llvm/ExecutionEngine/JIT.h"
#include "llvm/System/Signals.h"
#include "llvm/ExecutionEngine/GenericValue.h"
#include "llvm/PassManager.h"
#include "llvm/CodeGen/Passes.h"
#include "llvm/Target/TargetData.h"
#include "llvm/Target/TargetMachine.h"
#include "llvm/Target/TargetMachineRegistry.h"
#include <cassert>
#include <iostream>
#include <string>
#include <vector>
using namespace llvm;

int main() {
  sys::PrintStackTraceOnErrorSignal();
  
  const char* assembler =
"@.LC0 = internal constant [19 x i8] c\"cooking with gas!\\0A\\00\" ; [19 x i8]*\n"
"\n"
"declare i32 @puts(i8 *) ; i32(i8 *)* \n"
"\n"
"define i32 @main() { ; i32()* \n"
" ; Convert [19 x i8 ]* to i8 *...\n"
" %cast210 = getelementptr [19 x i8]* @.LC0, i64 0, i64 0 ; i8 *\n"
" call i32 @puts(i8 * %cast210) ; i32\n"
" ret i32 0\n"
"};\n";
  
  int exitCode = 0;
  try {
    ParseError ParseErr;
    Module* M = ParseAssemblyString(assembler, new Module("test"), &ParseErr);
    if (!M) {
      std::cerr << "parse error: " << ParseErr.getMessage() << "\n";
      return 1;
    }
    
    std::cout << "\nWe just constructed this LLVM module:\n\n" << *M;
    
    std::string MArchErr;
    const TargetMachineRegistry::entry* MArch = TargetMachineRegistry::getClosestStaticTargetForModule(*M, MArchErr);
    if (MArch == 0) {
      std::cerr << "error auto-selecting target for module '" << MArchErr << "'.\n";
      return 1;
    }
    
    TargetMachine* target = MArch->CtorFn(*M, "");
    assert(target && "Could not allocate target machine!");
    
    FunctionPassManager Passes(new ExistingModuleProvider(M));
    Passes.add(new TargetData(*target->getTargetData()));
    
    Passes.add(createVerifierPass());
    
    switch (target->addPassesToEmitFile(Passes, std::cout, TargetMachine::AssemblyFile, false)) {
      default:
        assert(0 && "Invalid file model!");
        return 1;
      case FileModel::MachOFile:
      case FileModel::ElfFile:
      case FileModel::Error:
        std::cerr << "target does not support generation of this file type!\n";
        return 1;
      case FileModel::AsmFile:
        break;
    }
    
    MachineCodeEmitter *MCE = 0;
    if (target->addPassesToEmitFileFinish(Passes, MCE, false)) {
      std::cerr << "target does not support generation of this file type!\n";
      return 1;
    }
    
    std::cout << "\nWhich has this machine code form:";
    
    Passes.doInitialization();
    
    for (Module::iterator I = M->begin(), E = M->end(); I != E; ++I)
      if (!I->isDeclaration())
        Passes.run(*I);
    
    Passes.doFinalization();
    
    ExistingModuleProvider* MP = new ExistingModuleProvider(M);
    ExecutionEngine* EE = ExecutionEngine::create(MP, false);
    
    Function *MainFunction = M->getFunction("main");
    if (!MainFunction) {
      std::cerr << "'main' function not found in module.\n";
      return -1;
    }
    
    std::cout << "\n\nRunning main: " << std::flush;
    
    // Call the function with no arguments:
    std::vector<GenericValue> noargs;
    GenericValue gv = EE->runFunction(MainFunction, noargs);
    
    // Import result of execution:
    std::cout << "Result: " << gv.IntVal.toStringUnsigned(10) << "\n";
    
  } catch (const std::string& msg) {
    std::cerr << "exception: " << msg << "\n";
    exitCode = 1;
  } catch (...) {
    std::cerr << "exception: Unexpected unknown exception occurred.\n";
    exitCode = 1;
  }
  
  return exitCode;
}

Thanks,

Antony Blakey

Hi,
    I'm experimenting with using LLVM to generate dynamic FFI bridges
in VisualWorks Smalltalk. LLVM is an amazing thing! I'm going from
dynamically generated assembler source to machine code, and I have
that all working, copied from the llc tool and the JIT example. I
have two questions:

1. What optimization passes, if any, should I run on the module
before I pass it to the ExecutionEngine.

The default JIT driver, lli, runs everything. Code generation is almost the same whether you do dynamic compilation or static compilation (except for the relocation model / code model used). You can pick and choose which passes to run if compile time is a concern. You can use

2. Do I need to retain the Module/ExistingModuleProvider, once I've
built the ExecutionEngine and have a Function object. Or is there
some simpler way to store/call the native code block?

I don't think it's safe to free Module early if you are using lazy compilation. If that's disabled, I suppose it's safe to delete Module once all references are resolved and relocated. Chris, can you confirm?

Don't understand your last question.

An unrelated question: I want to dump the native assembler form of
the code generated by the JIT. Currently I dump it using some code I
adapted from llc, but that's not actually showing what the JIT is
generating. How can I dump the JIT generated code, as native
assembler? It's a debugging requirement only.

I think there is disassembly capability built-in. See JITEmitter.cpp

Evan

Thanks Evan.

1. What optimization passes, if any, should I run on the module
before I pass it to the ExecutionEngine.

The default JIT driver, lli, runs everything.

My reading of the lli source indicates that it's not explicitly doing any opt passes - is that happening implicitly in the ExecutionEngine? I can see that I should probably copy the opt source to get all of the optimisations, before passing the module to the ExecutionEngine. It's difficult to know because I currently can't get a disassembly of the JITed code.

I don't think it's safe to free Module early if you are using lazy
compilation. If that's disabled, I suppose it's safe to delete Module
once all references are resolved and relocated. Chris, can you confirm?

Don't understand your last question.

I solved this - I hadn't noticed the comments about ownership transfers in the documentation. Also I didn't realize that I should have a single long-lived ExecutionEngine and then dynamically add my modules to it, and I didn't understand the significance of ExecutionEngine::getPointerToFunction, which solved my last question.
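For anyone reading this later, here is a minimal sketch of the pattern described above: one long-lived engine that code is added to over time, and a raw code address fetched by lookup and called through a typed function pointer. It deliberately uses no LLVM headers so that it runs standalone. ToyEngine, bridge_add, makeEngine and callBridge are hypothetical names invented for illustration, and the "JIT output" is just an ordinary compiled function. In real code the address would come from ExecutionEngine::getPointerToFunction, which likewise returns an untyped void*.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for code the JIT would have produced. Here it is an ordinary
// compiled function so the sketch runs without LLVM (assumption).
static int bridge_add(int a, int b) { return a + b; }

// Hypothetical long-lived registry playing the role of the single
// ExecutionEngine: it hands out untyped code addresses by name.
class ToyEngine {
public:
  void addFunction(const std::string &name, void *address) {
    table_[name] = address;
  }

  // Same shape as the real lookup: an untyped pointer, or null if unknown.
  void *getPointerToFunction(const std::string &name) const {
    std::map<std::string, void *>::const_iterator it = table_.find(name);
    return it == table_.end() ? nullptr : it->second;
  }

private:
  std::map<std::string, void *> table_;
};

// The caller has to know the native signature; the engine cannot check it.
typedef int (*BinaryIntFn)(int, int);

// Look the function up once, cast the address, and call it directly.
int callBridge(const ToyEngine &engine, const std::string &name, int a, int b) {
  void *raw = engine.getPointerToFunction(name);
  assert(raw && "function was never added to the engine");
  BinaryIntFn fn = reinterpret_cast<BinaryIntFn>(raw);
  return fn(a, b);
}

// Build the engine once and keep it for the lifetime of the process.
ToyEngine makeEngine() {
  ToyEngine engine;
  engine.addFunction("bridge_add", reinterpret_cast<void *>(&bridge_add));
  return engine;
}
```

The cast is the only unchecked step, so for an FFI bridge the typed pointer would probably be cached next to the signature it was generated for, rather than looked up on every call.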

An unrelated question: I want to dump the native assembler form of
the code generated by the JIT. Currently I dump it using some code I
adapted from llc, but that's not actually showing what the JIT is
generating. How can I dump the JIT generated code, as native
assembler? It's a debugging requirement only.

I think there is disassembly capability built-in. See JITEmitter.cpp

That's only for x86, and only if you have udis86. I was using (indirectly) an AsmEmitter that is added by TargetMachine::addPassesToEmitFile, and now that I have a better understanding of what's going on, I realise there isn't really any facility to do what I need.

Antony Blakey

Thanks Evan.

1. What optimization passes, if any, should I run on the module
before I pass it to the ExecutionEngine.

The default JIT driver, lli, runs everything.

My reading of the lli source indicates that it's not explicitly doing any opt passes - is that happening implicitly in the ExecutionEngine? I can see that I should probably copy the opt source to get all of the optimisations, before passing the module to the ExecutionEngine. It's difficult to know because I currently can't get a disassembly of the JITed code.

Right. Optimization is done on llvm bitcode and is separate from the codegen process.

I don't think it's safe to free Module early if you are using lazy
compilation. If that's disabled, I suppose it's safe to delete Module
once all references are resolved and relocated. Chris, can you confirm?

Don't understand your last question.

I solved this - I hadn't noticed the comments about ownership transfers in the documentation. Also I didn't realize that I should have a single long-lived ExecutionEngine and then dynamically add my modules to it, and I didn't understand the significance of ExecutionEngine::getPointerToFunction, which solved my last question.

An unrelated question: I want to dump the native assembler form of
the code generated by the JIT. Currently I dump it using some code I
adapted from llc, but that's not actually showing what the JIT is
generating. How can I dump the JIT generated code, as native
assembler? It's a debugging requirement only.

I think there is disassembly capability built-in. See JITEmitter.cpp

That's only for x86, and only if you have udis86. I was using (indirectly) an AsmEmitter that is added by TargetMachine::addPassesToEmitFile, and now that I have a better understanding of what's going on, I realise there isn't really any facility to do what I need.

There is nothing readily available, but it's not terribly hard to get, IMO. Take a look at LLVMTargetMachine.cpp: addPassesToEmitFile is the normal static compilation path, while addPassesToEmitMachineCode is the JIT path. So it's possible for you to add a call to addAssemblyEmitter in the JIT path, to print out the assembly before the machine code emitter pass runs.
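To make the ordering idea concrete, here is a toy model of it, not LLVM code: a pipeline in which an optional printer pass is registered immediately before the emitter pass, so the listing reflects exactly what the emitter is about to encode. ToyInstr, ToyPipeline, buildJitPipeline and compileWithListing are hypothetical names, and the byte values are placeholders; the real change would live in the JIT path of LLVMTargetMachine.cpp.

```cpp
#include <cassert>
#include <functional>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Toy machine instruction: a printable form plus a one-byte encoding.
struct ToyInstr {
  std::string text;
  unsigned char byte;
};
typedef std::vector<ToyInstr> ToyFunction;

// Minimal pass pipeline: passes run in the order they were added.
struct ToyPipeline {
  typedef std::function<void(const ToyFunction &)> Pass;
  std::vector<Pass> passes;

  void add(Pass pass) { passes.push_back(std::move(pass)); }

  void run(const ToyFunction &fn) const {
    for (const Pass &pass : passes) pass(fn);
  }
};

// Build the "JIT" pipeline. With debugPrint set, a printer pass is added
// just before the emitter. asmOut and code must outlive the pipeline.
ToyPipeline buildJitPipeline(std::ostream &asmOut,
                             std::vector<unsigned char> &code,
                             bool debugPrint) {
  ToyPipeline pipeline;
  if (debugPrint) {
    pipeline.add([&asmOut](const ToyFunction &fn) {
      for (const ToyInstr &instr : fn) asmOut << instr.text << "\n";
    });
  }
  pipeline.add([&code](const ToyFunction &fn) {
    for (const ToyInstr &instr : fn) code.push_back(instr.byte);
  });
  return pipeline;
}

// Convenience wrapper: emit into `code` and return the assembly listing.
std::string compileWithListing(const ToyFunction &fn,
                               std::vector<unsigned char> &code) {
  assert(code.empty() && "expects an empty output buffer");
  std::ostringstream listing;
  ToyPipeline pipeline = buildJitPipeline(listing, code, true);
  pipeline.run(fn);
  return listing.str();
}
```

Because the printer sits directly in front of the emitter, nothing can rewrite the instructions between the dump and the encoding, which is the property a separate llc-style dump cannot guarantee.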

Evan

Antony,

That's only for x86, and only if you have udis86. I was using
(indirectly) an AsmEmitter that is added by
TargetMachine::addPassesToEmitFile, and now that I have a better
understanding of what's going on, I realise there isn't really any
facility to do what I need.

There is a special command-line switch in lli called -print-machineinstrs.