Hi all,
I'm currently working on a compiler backend for a C-like scripting language that emits LLVM code. The generated code will be loaded into LLVM's JIT at runtime and will call into a C++ library, including calling virtual methods on C++ objects. Translating our AST to an llvm::Module is fairly straightforward; the difficulty lies in generating the right LLVM code to call into the C++ code. We see two main issues: name mangling and virtual functions.
For example, let's say this scripting language contains a function foo:
void foo()
{
    int x = builtin_func();
}
We would like builtin_func() to generate code that calls a C++ function. One option we see is to write C++ code containing stub functions, such as:
// Declared in the C++ library's headers
int SomeCXXFunction();

extern "C"
int call_builtin_func()
{
    return SomeCXXFunction();
}
or, for something involving method calls:
// Object and ArgType are declared in the C++ library's headers
extern "C"
void call_func(Object* obj, ArgType* arg)
{
    obj->func(arg); // func is a virtual method
}
and then compile this file to bitcode with llvm-g++. At runtime, the JIT would load both our scripting code and this stub bitcode and link everything together with LTO. This should remove the need to hand-craft mangled symbol names when generating LLVM code for the scripting language, and it would automatically give us LLVM type information for the C++ classes and types in the library. No doubt some build-system trickery will also be needed to glue everything together with the appropriate symbol names.
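To make the mangling problem concrete: with toolchains that follow the Itanium C++ ABI (g++ and llvm-g++ on Linux and Mac OS X), a free function int SomeCXXFunction() is emitted under the symbol _Z15SomeCXXFunctionv, while an extern "C" stub keeps its plain name, e.g. call_builtin_func. The sketch below uses abi::__cxa_demangle from <cxxabi.h> to map the mangled form back to a readable one. The demangle helper is a hypothetical name, and MSVC uses a different mangling scheme, so this only applies to Itanium-ABI toolchains.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <cxxabi.h>  // Itanium C++ ABI runtime support (GCC/Clang, not MSVC)
#include <string>

// Returns the human-readable form of an Itanium-ABI mangled name, or the
// input unchanged if it does not look mangled (extern "C" names never are).
std::string demangle(const char* symbol) {
    assert(symbol != nullptr);
    // Itanium-mangled function names start with "_Z"; plain C names do not.
    if (std::strncmp(symbol, "_Z", 2) != 0) {
        return symbol;
    }
    int status = 0;
    char* out = abi::__cxa_demangle(symbol, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return symbol;  // not a valid mangled name after all
    }
    std::string result(out);
    std::free(out);  // __cxa_demangle allocates its result with malloc
    return result;
}
```

For example, demangle("_Z15SomeCXXFunctionv") yields "SomeCXXFunction()", while demangle("call_builtin_func") returns the name unchanged. That unmangled name is what the generated LLVM code would declare and call.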
Besides name mangling, calling virtual functions presents a problem. It seems that llvm-g++ would be needed to generate the vtable layouts for the C++ classes so that our generated LLVM code can load the appropriate function pointer to make a call. But what is the right way to patch everything together so that our compiler can emit LLVM code that calls these virtual C++ methods?
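Note that the call_func stub above already sidesteps the vtable question: llvm-g++ emits the vtable lookup inside the stub, so generated code only passes an opaque Object*. When a class has many virtual methods, one variant (sketched here as an assumption, not a conventional LLVM API) is to export a single plain-C table of function pointers whose slot order the script compiler defines. Each entry forwards to an ordinary virtual call. Shape, Square, ShapeOps, as_shape and shape_ops are all hypothetical names.

```cpp
#include <cassert>

// Hypothetical library hierarchy with several virtual methods.
class Shape {
public:
    virtual ~Shape() = default;
    virtual double area() const = 0;
    virtual void scale(double factor) = 0;
};

class Square : public Shape {
public:
    explicit Square(double side) : side_(side) {}
    double area() const override { return side_ * side_; }
    void scale(double factor) override { side_ *= factor; }

private:
    double side_;
};

// Plain-C dispatch table whose layout the script compiler chooses:
// slot 0 is area, slot 1 is scale. Objects are passed as opaque pointers.
struct ShapeOps {
    double (*area)(void* self);
    void (*scale)(void* self, double factor);
};

// Recovers the C++ object from a handle that was created by converting
// a Shape* (not a derived pointer) to void*.
static Shape* as_shape(void* self) {
    assert(self != nullptr && "null handle passed from script code");
    return static_cast<Shape*>(self);
}

// Each entry makes an ordinary virtual call, so only the C++ compiler
// needs to know the real vtable layout.
extern "C" const ShapeOps* shape_ops() {
    static const ShapeOps ops = {
        [](void* self) { return as_shape(self)->area(); },
        [](void* self, double factor) { as_shape(self)->scale(factor); },
    };
    return &ops;
}
```

Generated code would call shape_ops() once, then load slot 0 or 1 and pass the object handle as an opaque pointer. The handle must be a Shape* converted to void* so that the cast back in as_shape is well-defined.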
So my question is: is there a conventional technique for interfacing custom-generated (non-llvm-g++) LLVM code with C++ code? And as a side question, how do the module's target triple and data layout play into this? I assume they must match the C++ ABI. llvm-config can report the target triple, but what about the native C++ data layout? This seems like a use case that would come up often, but I haven't been able to find any discussion or documentation about it.
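On the data layout side, one way to avoid guessing is to let the C++ compiler report the sizes and offsets it actually used. The script compiler can then check the struct types it derives from the module's data layout against them before emitting code. This is a sketch under assumptions: Vec, VecLayout, vec_layout and vec_layout_matches are hypothetical. It only covers struct layout, not calling-convention details such as how structs are passed by value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical library struct whose fields script code accesses directly.
struct Vec {
    std::int32_t count;
    double weight;
    void* data;
};
static_assert(std::is_standard_layout<Vec>::value, "offsetof needs standard layout");

// Plain-C description of the layout the C++ compiler actually chose.
struct VecLayout {
    std::size_t size;
    std::size_t align;
    std::size_t count_offset;
    std::size_t weight_offset;
    std::size_t data_offset;
};

// Reports the C++ compiler's view of Vec.
extern "C" VecLayout vec_layout() {
    return VecLayout{sizeof(Vec), alignof(Vec), offsetof(Vec, count),
                     offsetof(Vec, weight), offsetof(Vec, data)};
}

// Returns 1 if the layout the script compiler computed for its own LLVM
// struct type agrees with the C++ compiler's layout, 0 otherwise.
extern "C" int vec_layout_matches(const VecLayout* expected) {
    assert(expected != nullptr);
    const VecLayout actual = vec_layout();
    return expected->size == actual.size && expected->align == actual.align &&
           expected->count_offset == actual.count_offset &&
           expected->weight_offset == actual.weight_offset &&
           expected->data_offset == actual.data_offset;
}
```

If vec_layout_matches returns 0, that would suggest the module's data layout does not match the ABI the C++ library was compiled for.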
Thanks!
Austin