LLVM languages cooperation

Hi all,

I am an LLVM newbie, thinking about using LLVM as the platform for a toy language.

In this respect, I was wondering whether LLVM could be used to easily weave together code written in different languages. For instance, let's assume I have a library written in C, some components written in C++ and some components written in OCaml (we also assume an OCaml backend for LLVM). All this code gets compiled to LLVM bytecode.

Is there a way, given this bytecode, to access functions and invoke them uniformly?

This may sound rather fuzzy, but thinking of .NET could help: .NET allows code to be written in any supported language, and the objects and functions written in these languages become available to all supported languages. This is made possible because there is a common object model built on top of the .NET VM and bridged to all supported languages. I know LLVM does not define an object model, but maybe a common function call model?

TIA,

-- Sébastien

Sébastien,

LLVM does define a common function call and struct/array/pointer model. You
could use that to define a language-interoperability scheme, but I think it
would require some special support from the front-ends to translate
functions exposed to a different language to conform with the scheme, or to
generate wrappers for them. For example, calling a Fortran function
(call-by-reference) from C (call-by-value) would not work automatically --
something has to generate wrappers to allow the call. The scheme would also
have to define an object model and an exception model.

FYI, a student in my research group is working on an OCaml-to-LLVM
front-end, using the OCaml compilers from Inria. Another student is
starting on an MSIL-to-LLVM translator and porting a CLI VM.

--Vikram

Another way of putting this is that LLVM _allows_ the code to
interoperate, but, like a microprocessor, does not establish any
conventions that make interoperability happen automatically. In Vikram's
example, if you are interfacing C to a language with pass-by-reference
parameters, either the programmer can be required to prototype those
parameters as taking "a pointer to" the argument type (making the
'reference' explicit in C), or the compiler could generate the interface
code automatically, given a declaration like 'extern "Fortran"', in the
style of the linkage specifications that C++ provides. Also, as Vikram
mentioned, the exception model and runtime model interactions would have
to be specified.
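
The two options above can be sketched in C. This is only an illustration under stated assumptions: scale_add_byref is a hypothetical stand-in for a routine from a call-by-reference language, written in C here so the example is self-contained (a real Fortran binding would also involve compiler-specific symbol naming), and scale_add is the kind of by-value wrapper a front-end could generate.

```c
#include <assert.h>

/* Hypothetical stand-in for a routine from a call-by-reference language:
   every parameter arrives as an address, and the result is written
   through the last one. */
void scale_add_byref(const double *a, const double *x, const double *y,
                     double *result)
{
    *result = (*a) * (*x) + (*y);
}

/* Wrapper of the kind a front-end could generate: it presents an
   ordinary by-value C interface and takes the addresses itself. */
double scale_add(double a, double x, double y)
{
    double result;
    scale_add_byref(&a, &x, &y, &result);
    return result;
}

/* Exercises both options and checks that they agree. */
void check_scale_add(void)
{
    /* Option 1: the programmer makes the references explicit. */
    double a = 2.0, x = 3.0, y = 1.0, direct;
    scale_add_byref(&a, &x, &y, &direct);
    assert(direct == 7.0);

    /* Option 2: the wrapper hides the references. */
    assert(scale_add(2.0, 3.0, 1.0) == direct);
}
```

Either way, both sides end up agreeing on a pointer-based signature; the only question is whether the programmer or the compiler writes the glue.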

MSIL, in contrast, requires languages to match a particular object model
and runtime system. This is great for interoperability, as it almost
happens for free, but at a potential cost in expressiveness (e.g., things
get messy supporting multiple inheritance). This really stems from
differing goals, though LLVM can certainly be used to compile MSIL-like
systems, so it's just a matter of specifying the rules and making the
front-ends stick to them. :)

-Chris

Hello Chris,

Chris Lattner wrote:

This really stems from
differing goals, though LLVM can certainly be used to compile MSIL-like
systems, so it's just a matter of specifying the rules and making the
front-ends stick to them. :)

OK, this is clear. Do you plan any "foreign function interface" or something similar that would ease, if not automate, communication across languages that have an LLVM backend? It may look slightly out of the scope of the LLVM project, but wouldn't it be interesting to provide some guidelines for LLVM backend developers to allow inter-language communication?

-- Sébastien

I hadn't planned on it, but I can certainly describe what the C/C++
front-end emits. Basically, the rules are as follows:

1. C++ references are turned into pointers
2. If a function returns an aggregate value (i.e., a structure or array), it
   is changed to return void, and an extra (first) argument is added that
   is a pointer to the aggregate type. The callee then fills in the
   aggregate argument instead of returning the aggregate.
3. If structures are passed by value, they are split apart, and all elements
   of the structure are passed as separate scalars.
4. Otherwise, all arguments are passed left-to-right in a very
   straightforward fashion.
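
As a rough picture of rules 1 to 3, here is a hand-written C sketch that puts a source-level function next to the shape it would take after the rewriting described in the list. The names translate_lowered and increment_lowered are hypothetical, and the "lowered" versions are ordinary C written by hand to mirror the rules, not actual front-end output.

```c
#include <assert.h>

struct point { int x; int y; };

/* Source-level form: takes a struct by value and returns an aggregate. */
struct point translate(struct point p, int dx, int dy)
{
    struct point r = { p.x + dx, p.y + dy };
    return r;
}

/* Hand-written picture of the rewritten form:
   - rule 2: the aggregate return becomes void, plus a leading pointer
     to storage that the callee fills in;
   - rule 3: the by-value struct argument is split into its scalar fields;
   - rule 4: the remaining arguments follow left-to-right. */
void translate_lowered(struct point *ret, int p_x, int p_y, int dx, int dy)
{
    ret->x = p_x + dx;
    ret->y = p_y + dy;
}

/* Rule 1: a C++ parameter of type `int &` is passed as `int *`. */
void increment_lowered(int *value)
{
    *value += 1;
}

/* Both forms compute the same result; only the calling shape differs. */
void check_lowering(void)
{
    struct point src = { 1, 2 };
    struct point a = translate(src, 10, 20);

    struct point b;
    translate_lowered(&b, src.x, src.y, 10, 20);
    assert(a.x == b.x && a.y == b.y);

    int v = 41;
    increment_lowered(&v);
    assert(v == 42);
}
```

A front-end for another language that wants to call such a function would need to produce the second shape: allocate the result storage itself, pass its address first, and pass each struct field separately.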

I think that these few rules represent the major non-obvious portions of
the interface that the C/C++ front-end generates. If you have any
questions about other corner cases, I can certainly answer them, or you
can try out examples with the front-end.

-Chris