Hi Basile,
Great to see you here as well! 
The OCaml developers are becoming increasingly upset with the OCaml community
picking holes in their implementation so I'd rather keep this discussion off
the caml-list. I have quite strong personal views on this, of course, and
would love to discuss them but not here.
I believe this will be of wider interest so I think it is ideal for the LLVM
list.
As some might probably know, the LLVM compiler http://llvm.org/ has (at
least in its latest SVN snapshot) a binding for Ocaml. This means that
one could code in Ocaml some stuff (eg a JIT-ing compiler) which uses
(and links with) LLVM libraries.
http://lists.cs.uiuc.edu/pipermail/llvmdev/2007-November/011481.html
http://lists.cs.uiuc.edu/pipermail/llvmdev/2007-November/011507.html
AFAIK, the current OCaml bindings do not yet support JIT but you can easily
write a static compiler using them. As I'm sure you noticed, I've been
working on this for a couple of days and the results are already incredibly
impressive!
However, to generate code with LLVM for Ocamlopt, this is not enough,
since while LLVM does have hooks to support garbage collection
Garbage Collection with LLVM (LLVM documentation)
I don't know of any actual hooks to fit the needs of Ocamlopt's
garbage collector (which AFAIK requires some specific frame descriptors in
the code, in some hashtables, whose details are tricky and known to very
few implementors, perhaps only Xavier Leroy & Damien Doligez).
So is there any code to fit the Ocaml GC requirements into LLVM
abilities, i.e. to use LLVM to generate (e.g. JIT) code which respects Ocaml
GC requirements?
Of course, I do know that there are some typing issues and theoretical
points which I deliberately ignore here. I'm supposing that anyone wanting
to use LLVM for Ocaml knows that he is seeking trouble.
And Metaocaml is (unfortunately) nearly dead: the future (in ocaml 3.11 or
3.12) dynamic library support is not a full replacement, even if one
might generate Ocaml code and compile & dlopen it in a future version of
Ocaml.
OCaml's GC has many wonderful properties. However, it also has some
disadvantages:
. Strings and arrays of any type are limited to only 16MB on 32-bit platforms.
. Integers are limited to 31 or 63 bits unless you use the much slower boxed
machine-precision integers, which makes it difficult to write efficient
bitwise functions.
. Only certain types are unboxed (float arrays and all-float records but not
char arrays, all-float tuples or complex arrays), e.g. I must manually unbox
complex numbers in arrays to work around the ~5x performance hit that this
causes in FFT routines.
. Insufficient run-time type information to provide safe marshalling and
introspection.
. Single threaded.
. Upstream is controlled by INRIA and cannot be contributed to by the
community.
. Restrictive license.
. Undocumented.
. Very complicated => unmaintainable according to the maintainers.
. Apparently LLVM cannot generate exceptions compatible with OCaml's run-time.
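
For anyone who has not run into the integer and unboxing points, here is a
minimal sketch of what they look like in practice. It assumes OCaml 4.03 or
later (for Sys.int_size), and the cvec type and its helper functions are
hypothetical names I made up for illustration, not part of any library. The
idea is only to show the manual workaround: complex numbers are stored as
interleaved real and imaginary parts in a float array, which OCaml keeps flat,
instead of in a Complex.t array, whose elements are boxed.

```ocaml
(* Native ints lose one bit of the machine word to the GC's tag,
   so this is 31 on 32-bit platforms and 63 on 64-bit ones. *)
let int_bits = Sys.int_size

(* Full-width integers exist, but only as boxed Int32/Int64 values. *)
let full_width : int64 = Int64.max_int

(* Hypothetical manually unboxed complex vector: element k lives at
   index 2k (real part) and 2k+1 (imaginary part) of a float array. *)
type cvec = float array

let cvec_make n : cvec = Array.make (2 * n) 0.0

let cvec_length (v : cvec) = Array.length v / 2

let cvec_get (v : cvec) k =
  { Complex.re = v.(2 * k); im = v.(2 * k + 1) }

let cvec_set (v : cvec) k (z : Complex.t) =
  v.(2 * k) <- z.Complex.re;
  v.(2 * k + 1) <- z.Complex.im

(* Pointwise complex multiply a := a * b, written directly on the
   float components so that no Complex.t record is allocated. *)
let cvec_mul_inplace (a : cvec) (b : cvec) =
  for k = 0 to cvec_length a - 1 do
    let ar = a.(2 * k) and ai = a.(2 * k + 1) in
    let br = b.(2 * k) and bi = b.(2 * k + 1) in
    a.(2 * k) <- ar *. br -. ai *. bi;
    a.(2 * k + 1) <- ar *. bi +. ai *. br
  done
```

The cost of this workaround is that every routine touching the data has to
know about the interleaved layout, which is exactly the kind of baggage I
would like a new run-time to remove.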
I want to build a better future for the OCaml community but without the
requirement to adopt OCaml's baggage: wherever OCaml might be improved upon,
I am interested in doing so. If you wish to remain run-time compatible then
reusing OCaml's existing run-time is an obvious choice. However, I think
there is a lot to be gained by not reusing it. In this context, LLVM already
offers alternatives for things like exception handling.
My experience of this stems largely from using MLton and F#. The run-time
affects the performance of heavily-allocating code, which means symbolic code
rather than numerical code. However, MLton is several times faster than OCaml
for symbolic code and F# can be several times faster than OCaml for numerical
code. So I think there is a lot of merit in keeping the practically-useful
and now very popular OCaml language (i.e. making a compatible front-end) but
drawing upon the designs of MLton and F# rather than OCaml.
MLton uses whole-program optimizations to provide elegant abstractions with no
run-time overhead and F# leverages arbitrary unboxing and the CLR code
generator to obtain excellent performance on numerical computations.
I would like to work towards these goals incrementally but I would like to
create something of practical value sooner rather than later and start
garnering a userbase.
Finally, I see no reason why the resulting run-time shouldn't be of wider
interest to anyone wanting to implement a compiler for a functional
programming language. Objectively, I think most people would much rather have
something slower but documented.