Ocaml(opt) & llvm

Hello All (to Ocaml List & CC LLVM list)

As some might probably know, the LLVM compiler http://llvm.org/ has (at least in its latest SVN snapshot) a binding for Ocaml. This means that one could code in Ocaml some stuff (eg a JIT-ing compiler) which uses (and links with) LLVM libraries.
http://lists.cs.uiuc.edu/pipermail/llvmdev/2007-November/011481.html
http://lists.cs.uiuc.edu/pipermail/llvmdev/2007-November/011507.html

However, to generate code with LLVM for Ocamlopt, this is not enough, since while LLVM does have hooks to support garbage collection
http://www.llvm.org/docs/GarbageCollection.html
I don't know of any actual hooks to fit into the needs of Ocamlopt garbage colector (which AFAIK require some specific frame descriptors in the code, in some hashtables, which details are tricky and known to very few implementors, perhaps only Xavier Leroy & Damien Doligez).

So is there any code to fit the Ocaml GC requirements into LLVM abilities, ie to use LLVM to generate (eg JIT) code which respect Ocaml GC requirements.

Of course, I do know that there are some typing issues and theoritical points which I deliberately ignore here. I'm supposing the guy wanting to LLVM for Ocaml is knowing that he seeks trouble.

And Metaocaml is (unfortunately) nearly dead: future (in ocaml 3.11 or 3.12) dynamic libraries ability is not a full replacement! Even if one might generate Ocaml code and compile & dlopen it in a future version of Ocaml.

Thanks for reading.

As some might probably know, the LLVM compiler http://llvm.org/ has (at least in its latest SVN snapshot) a binding for Ocaml. This means that one could code in Ocaml some stuff (eg a JIT-ing compiler) which uses (and links with) LLVM libraries.

Yep! There are no bindings for the JIT (just for codegen), but it has a fairly narrow interface, so adding them would be fairly straightforward.

So is there any code to fit the Ocaml GC requirements into LLVM abilities, ie to use LLVM to generate (eg JIT) code which respect Ocaml GC requirements.

Included in the pending LLVM garbage collection code generation changeset is an Ocaml frametable emitter. See here:

     http://lists.cs.uiuc.edu/pipermail/llvmdev/2007-November/011473.html

In a JIT context--which I am not presently pursing--it would also be possible to generate these tables in memory and spool the records into an active runtime. This is precisely what the ocaml startup code does and it would be straightforward to invoke that machinery at runtime.

It might be exciting to have an Ocaml with "exec" (surely it would allow new classes of programs), but static compilation seems clearly superior for existing programs, so my focus is there for now.

Of course, I do know that there are some typing issues and theoritical points which I deliberately ignore here. I'm supposing the guy wanting to LLVM for Ocaml is knowing that he seeks trouble.

The ocaml type system is easily represented in LLVM. The only real mismatches I'm aware of are:

  1. The ocaml exception model is quite unique; emulating it seems unlikely. DWARF exceptions are a suitable but incompatible replacement.
  2. ocaml's placement of symbols in global variables, described here <ocamlopt places symbols interior to statically allocated 'value's · Issue #4414 · ocaml/ocaml · GitHub >.

The latter was easily resolved on the ocaml side, but with a trivially incompatible change. Perhaps it will be incorporated for 3.11.

— Gordon

It might be exciting to have an Ocaml with "exec" (surely it would
allow new classes of programs), but static compilation seems clearly
superior for existing programs, so my focus is there for now.

There are various different approaches to this, of course, but having tried
the Lisp and MetaOCaml approaches I think the best way is probably to exploit
LLVM's JIT facilities directly. The problem is simply that any abstractions
impose significant performance overheads that are large enough to undermine
the point of having an "exec" in the first place.

So I would not recommend putting too much effort into an "exec" for OCaml. I
think it would be more productive to port OCaml's regular expression engine
to use LLVM's JIT facilities directly, for example.

  1. The ocaml exception model is quite unique; emulating it seems
unlikely. DWARF exceptions are a suitable but incompatible replacement.

Do you mean that LLVM cannot generate exception constructs in the format
reqired by OCaml's run-time?

That's correct.

— Gordon

Hi Basile,

Great to see you here as well! :slight_smile:

The OCaml developers are becoming increasing upset with the OCaml community
picking holes in their implementation so I'd rather keep this discussion off
the caml-list. I have quite strong personal views on this, of course, and
would love to discuss them but not here.

I believe this will be of wider interest so I think it is ideal for the LLVM
list.

As some might probably know, the LLVM compiler http://llvm.org/ has (at
least in its latest SVN snapshot) a binding for Ocaml. This means that
one could code in Ocaml some stuff (eg a JIT-ing compiler) which uses
(and links with) LLVM libraries.
http://lists.cs.uiuc.edu/pipermail/llvmdev/2007-November/011481.html
http://lists.cs.uiuc.edu/pipermail/llvmdev/2007-November/011507.html

AFAIK, the current OCaml bindings do not yet support JIT but you can easily
write a static compiler using them. As I'm sure you noticed, I've been
working on this for a couple of days and the results are already incredibly
impressive!

However, to generate code with LLVM for Ocamlopt, this is not enough,
since while LLVM does have hooks to support garbage collection
Garbage Collection with LLVM — LLVM 16.0.0git documentation
I don't know of any actual hooks to fit into the needs of Ocamlopt
garbage colector (which AFAIK require some specific frame descriptors in
the code, in some hashtables, which details are tricky and known to very
few implementors, perhaps only Xavier Leroy & Damien Doligez).

So is there any code to fit the Ocaml GC requirements into LLVM
abilities, ie to use LLVM to generate (eg JIT) code which respect Ocaml
GC requirements.

Of course, I do know that there are some typing issues and theoritical
points which I deliberately ignore here. I'm supposing the guy wanting
to LLVM for Ocaml is knowing that he seeks trouble.

And Metaocaml is (unfortunately) nearly dead: future (in ocaml 3.11 or
3.12) dynamic libraries ability is not a full replacement! Even if one
might generate Ocaml code and compile & dlopen it in a future version of
Ocaml.

OCaml's GC has many wonderful properties. However, it also has some
disadvantages:

. Strings and arrays of any type are limited to only 16Mb on 32-bit platforms.

. Integers are limited to 31- or 63-bits, or much slower boxed
machine-precision integers, making it difficult to write efficient bitwise
functions.

. Only certain types are unboxed (float arrays and all-float records but not
char arrays, all-float tuples or complex arrays), e.g. I must manually unbox
complex numbers in arrays to work around the ~5x performance hit that this
causes in FFT routines.

. Insufficient run-time type information to provide safe marshalling and
introspection.

. Single threaded.

. Upstream is controlled by INRIA and cannot be contributed to by the
community.

. Restrictive license.

. Undocumented.

. Very complicated => unmaintainable according to the maintainers.

. Apparently LLVM cannot generate exceptions compatible with OCaml's run-time.

I want to build a better future for the OCaml community but without the
requirement to adopt OCaml's baggage: wherever OCaml might be improved upon,
I am interested in doing so. If you wish to remain run-time compatible then
reusing OCaml's existing run-time is an obvious choice. However, I think
there is a lot to be gained by not reusing it. In this context, LLVM already
offers alternatives for things like exception handling.

My experience of this stems largely from using MLton and F#. The run-time
affects the performance of heavily-allocating code, which means symbolic code
and not numerical code but MLton is several times faster than OCaml for
symbolic code and F# can be several times faster than OCaml for numerical
code. So I think there is a lot of merit in keeping the practically-useful
and now very popular OCaml language (i.e. make a compatible front-end) but
drawing upon the designs of MLton and F# rather than OCaml.

MLton uses whole-program optimizations to provide elegant abstractions with no
run-time overhead and F# leverages arbitrary unboxing and the CLR code
generator to obtain excellent performance on numerical computations.

I would like to work towards these goals incrementally but I would like to
create something of practical value sooner rather than later and start
garnering a userbase.

Finally, I see no reason why the resulting run-time shouldn't be of wider
interest to anyone wanting to implement a compiler for a functional
programming language. Objectively, I think most people would much rather have
something slower but documented.

Can you explain how the ocaml exception model works?

-Chris

  1. The ocaml exception model is quite unique; emulating it seems unlikely. DWARF exceptions are a suitable but incompatible replacement.

Can you explain how the ocaml exception model works?

Sure. With ocamlopt’s model, try has cost only slightly higher than a function call and raise has cost only slightly higher than return. I’m not concerned with the bytecode interpreter and compiler, ocamlc and ocamlrun.

As Daniel Berlin pointed out on IRC, the language model is trivial. It has just three exception-handling primitives:

matching ::= patternexpr ( ‘|patternexpr )*

raise expr
try expr with matching
exception id ( tuple-type-expr )?

exception declares an exception type. For the purposes of this discussion, exception values are just heap pointers. raise and with only accept expressions of an exception type.

raise unwinds the stack, submitting expr to the first matching pattern in the nearest enclosing try-with. The runtime provides a catch-all at startup which terminates the program. The matching expression is sufficient to implement finally (x → expr) and catch-all (x → expr; raise x). Each finally expression requires its own try-with expression.

The codegen for raise is simple. It just reads a saved return address from the caml_exception_pointer global and returns through several stack frames in one go. The expression raise expr is compiled as such:

; Store the exception value in the return register.

$r1 = …

; Change the stack pointer.

load $sp ← 0(caml_exception_pointer)

; The stack now contains:

; … rest of stack …
; return address
; old caml_exception_pointer
; Restore caml_exception_pointer.
pop 0(caml_exception_pointer)

; Jump straight to the nearest landing pad.

ret

The try-with expression is where the trickery lies. The expression try body with pattern1catch1 | patternNcatchN is compiled to:

; ‘Call’ a label within the function.

call try_block

; This is the landing pad. The exception is in the return register.

; Try to match a handler.

$r2 = $1 matches pattern1

branch to handler1_block if $r2

$r2 = $1 matches patternN

branch to handlerN_block if $r2

; If no handler matched, re-raise the exception.

load $sp ← 0(caml_exception_pointer)

pop 0(caml_exception_pointer)

ret

handler1_block:

$r1 = catch1

jump to after_block

handlerN_block:

$r1 = catchN
jump to after_block

try_block:

; Save caml_exception_pointer on the stack.

push 0(caml_exception_pointer)

; The top of the stack is now what raise expects:

; … rest of stack …

; return address

; old caml_exception_pointer

; Save $sp into caml_exception_poointer.

store $sp → 0(caml_exception_pointer)

; Generate try body here.

$r1 = body ; n.b.: use increased stack offsets to access locals.

; Restore the saved value of caml_exception_pointer.

pop 0(caml_exception_pointer)

; Resume to the normal flow of execution by undoing the stack

; adjustments made by call and falling through. A normal ‘ret’

; would jump into the exception handler, which we don’t want!

pop

after_block:

; Either the exception was handled, or none was raised.

— Gordon

As Daniel Berlin pointed out on IRC, the language model is trivial. It has just three exception-handling primitives:
raise expr
try expr with matching
exception id ( tuple-type-expr )?

ok.

The codegen for raise is simple. It just reads a saved return address from the caml_exception_pointer global and returns through several stack frames in one go. The expression raise expr is compiled as such:

; Store the exception value in the return register.
$r1 = ...

; Change the stack pointer.
load $sp <- 0(caml_exception_pointer)

; The stack now contains:
; ... rest of stack ...
; return address
; old caml_exception_pointer
; Restore caml_exception_pointer.
pop 0(caml_exception_pointer)

; Jump straight to the nearest landing pad.
ret

Nice.

The try-with expression is where the trickery lies. The expression try body with pattern1 -> catch1 | patternN -> catchN is compiled to:

This seems pretty straight-forward to normal landing pads. The only significant difference is how it dynamically registers the callback and how it continues unwinding.

If you were interested in adding this to llvm as some new EH intrinsics, I wouldn't be opposed to that. It's basically a form of sjlj exceptions. It only works if values are not held in registers across throws though, which is kinda lame ...

-Chris

The codegen for raise is simple. It just reads a saved return address from the caml_exception_pointer global and returns through several stack frames in one go.

Nice.

Yup.

The try-with expression is where the trickery lies. The expression try body with pattern1 -> catch1 | patternN -> catchN is compiled to:

This seems pretty straight-forward to normal landing pads.

Certainly the primitives are more or less identical to LLVM's. There is a strict stack of handlers, though, and I'm not sure how easy that is to reconstruct.

The only significant difference is how it dynamically registers the callback and how it continues unwinding.

I thought the implications with respect to frame object displacements seemed rather invasive. Within the try block, frame offsets are progressively increased with each level of nesting.

If you were interested in adding this to llvm as some new EH intrinsics, I wouldn't be opposed to that.

Okay. I imagine the SJLJ EH would be a good place to start poking around if I were to give it a shot?

It's basically a form of sjlj exceptions.

Right.

It only works if values are not held in registers across throws though, which is kinda lame ...

Though I'm primarily interested in this model only from an interoperability perspective, reloading the register file for a throw seems a comparatively small price to pay compared to, say, symbolically unwinding the stack. :slight_smile: More importantly, the common case through code does not require a register file save/restore.

— Gordon

The issue is in the non-throw case. Consider a function like this:

  int x = ...
  try {
    x++;
    foo();

    use (x);

  } catch (...) {
    print x;
  }

Because the 'throw' doesn't restore the callee-save registers as the stack is unwound, the compiler can't put X in a register across the x++ and use of x in the try block.

-Chris

Okay; didn't see that interaction. I'll scrutinize the codegen more closely for this stuff before proceeding. I didn't apply any register pressure in my example cases.

— Gordon

Actually, this is difficult to reproduce in Ocaml codes because Ocaml code is in a restricted SSA form:

(1) Mutable variables exist only in the heap: let x = 43 ref is literally shorthand for let x = { mutable contents = 43 }, which allocates a single-cell object on the heap. Arrays and strings (also imperative) are also restricted to the heap.
(2) Loop index variables from the for expression cannot escape.

Still, it could be significant after LLVM transformations.

-- Gordon

Gordon Henriksen wrote:

(1) Mutable variables exist only in the heap: let x = 43 ref is literally shorthand for let x = { mutable contents = 43 }, which allocates a single-cell object on the heap.

This is not always true. When the reference is only dereferenced or
assigned locally, it is treated as a local mutable variable. Example:

let f () =
   let x = ref 0 in
   for i = 1 to 10 do incr x done;
   !x

Of course, if the reference is passed to another function, it will
indeed be heap allocated.

-- Alain