I'm having another play with LLVM using the OCaml bindings for a forthcoming
OCaml Journal article and I have a couple of remarks:
Firstly, I noticed that the execution engine is very slow, taking milliseconds to call a JIT-compiled function. Is this an inherent overhead, am I calling it incorrectly, or is this something that can be optimized in the OCaml bindings?
The high-level calling convention using GenericValue is going to be very slow relative to a native function call. This is true in C++, but even more so in OCaml, which must cons up a bunch of objects on the heap for each call. To get the best performance, you want to avoid fine-grained calls into JIT'd code, e.g. by iterating over inputs inside the JIT instead of outside.
If you want to improve performance of the GenericValue-based interface, I'd suggest trying to minimize the number and overhead of allocations in your OCaml code, then look at the bindings themselves:
- If GenericValues can't be reused, add bindings to allow mutating them. Reuse the same 'n' instances for each call into JIT code. Yucky imperative data structures to the rescue.
- Write bindings for a heap-allocated GenericValue array and wrap that in a custom block instead of heap-allocating each GenericValue individually. Of course such an array must be mutable. More imperative data structures!
- Try using placement new to initialize GenericValues inside of OCaml blocks instead of new'ing them up on the C++ heap as is presently done. This would be outside the bounds of standard C++, so it could fail. It would also require circumventing the C bindings, since they cannot expose the C++ GenericValue class as a struct.
- Use OCaml variants for inputs (type generic_value = Pointer of 'a | Int of bits * value | ...) and convert those to a stack-based SmallVector<GenericValue>. This will avoid finalizers on the OCaml blocks. This doesn't work symmetrically for outputs, though. Likewise, it involves going around the C bindings.
But realize that a GenericValue-based interface will always be slow relative to a native call. If you have a specific performance goal, though, you may be able to eliminate 'enough' overhead for your needs without much work. All of the above are relatively simple (should be doable in a day, modulo patch review).
For the very best performance, you really want to call the JIT'd function directly—e.g.,
let nf = native_function name m
where native_function has type string -> Llvm.llmodule -> 'a and nf has some functional type, like int -> int -> int.
However, this is subject to the quirks and complexities of the OCaml FFI (e.g., overflow arguments passed in a global array on x86, totally nonstandard calling convention).
- If you know in advance the signature of the functions you're going to call, you can write shims in C (similar to those in llvm_ocaml.c) that add relatively little overhead. These wouldn't really be of any use to anyone else, though.
- If not, you can generate the shims at runtime using LLVM (even inline them into the callee), but you will have to reimplement OCaml's FFI macros for unwrapping values and tracking stack roots. This would take considerably more effort to implement (esp. portably), but would be a substantial improvement to the bindings if the helpers were incorporated therein.
Secondly, I happened to notice that JIT compiled code executed on the fly does not read from the stdin of the host OCaml program although it can write to stdout. Is this a bug?
This has nothing to do with LLVM.
— Gordon