C Compiler written in OCaml, Pointers Wanted

Hello,

For a course project, I am required to write a compiler for some language of
my choice, and this compiler has to be implemented in a functional language.
I have chosen to create a *JIT* compiler for C source, and to implement my
compiler in OCaml using LLVM for the back-end. I have experience using LLVM
in C++ (I wrote a MATLAB JIT compiler not long ago), however, I am a bit
puzzled as to how to go about some things, and would appreciate some
pointers:

1. When writing my MATLAB JIT in C++, I created bindings to native C++
functions to serve as my runtime library. This hardly seems practical in
OCaml. I would ideally want to write my runtime library in C (this will
contain functions such as malloc, free, puts, strlen, etc.), and link it
with the code I compile somehow. Is there any way for LLVM to link with code
in pre-compiled C object files? Please note that this is for a JIT compiler,
I need to be able to do this at run-time, I will not be generating an
executable file.

2. One thing I don't know how to go about is memory allocation. As I just
said, this will be a C *JIT* compiler. This means that my running compiled
code will have to co-exist with OCaml. How do I go about implementing malloc
in this context? Does LLVM provide some memory allocation implementation
that will work with a JIT?

3. Do the OCaml LLVM bindings even allow using LLVM in JIT mode?

And of course, if anyone has experience writing a JIT using LLVM, or using
the OCaml LLVM bindings, any advice you may have will be greatly
appreciated.

Hello,

For a course project, I am required to write a compiler for some language of
my choice, and this compiler has to be implemented in a functional language.
I have chosen to create a *JIT* compiler for C source, and to implement my
compiler in OCaml using LLVM for the back-end. I have experience using LLVM
in C++ (I wrote a MATLAB JIT compiler not long ago), however, I am a bit
puzzled as to how to go about some things, and would appreciate some
pointers:

Here:
0xdeadbeef
0x00007ffff7de9f2f

Couldn't resist. :)

1. When writing my MATLAB JIT in C++, I created bindings to native C++
functions to serve as my runtime library. This hardly seems practical in
OCaml. I would ideally want to write my runtime library in C (this will
contain functions such as malloc, free, puts, strlen, etc.), and link it
with the code I compile somehow. Is there any way for LLVM to link with code
in pre-compiled C object files? Please note that this is for a JIT compiler,
I need to be able to do this at run-time, I will not be generating an
executable file.

Is writing the runtime part of the assignment? In any case, to get
yourself up and running, if you have declarations of the runtime
functions in your module with external linkage, the JIT will dlsym
them if it can't find a definition for them in your module. So if you
have malloc, strlen, etc. linked into your binary (which I'm guessing
you would, since I'm sure LLVM links in libc), it should be able to
call those.
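
The lookup described above can be approximated in plain C. This is only a sketch of the general mechanism (asking the dynamic linker for a symbol that is already loaded into the process), not LLVM's actual code; the helper names `resolve_in_process` and `strlen_by_name` are made up for illustration.

```c
#define _GNU_SOURCE /* exposes RTLD_DEFAULT on glibc */
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: look a symbol up by name among everything already
   loaded into the current process (the executable plus its shared libraries). */
static void *resolve_in_process(const char *name) {
    assert(name != NULL);
    return dlsym(RTLD_DEFAULT, name);
}

/* Signature of strlen, needed to call through the untyped address. */
typedef size_t (*strlen_fn)(const char *);

/* Resolve strlen by name and call it; returns (size_t)-1 if it is not found. */
static size_t strlen_by_name(const char *s) {
    strlen_fn fn = (strlen_fn)resolve_in_process("strlen");
    return fn ? fn(s) : (size_t)-1;
}
```

If the process has libc loaded, `strlen_by_name("hello")` should find and call the real `strlen`. On older glibc versions you may need to link with `-ldl`.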

2. One thing I don't know how to go about is memory allocation. As I just
said, this will be a C *JIT* compiler. This means that my running compiled
code will have to co-exist with OCaml. How do I go about implementing malloc
in this context? Does LLVM provide some memory allocation implementation
that will work with a JIT?

I would guess that since you can call LLVM from OCaml, you can use any
memory allocation strategy that you might normally use in a C/C++
program, i.e., mmap some address space and go nuts. :)
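
As a rough illustration of the "mmap some address space" idea, here is a minimal bump allocator over an anonymous mapping. It is a sketch under simplifying assumptions (no free list, fixed 16-byte alignment, not thread-safe), and the `arena` type and function names are hypothetical.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical arena: one mmap'ed region handed out front to back. */
typedef struct {
    unsigned char *base; /* start of the mapping */
    size_t size;         /* total bytes mapped */
    size_t used;         /* bytes handed out so far */
} arena;

/* Map `size` bytes of zeroed, private memory. Returns 0 on success, -1 on failure. */
static int arena_init(arena *a, size_t size) {
    assert(a != NULL);
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    a->base = p;
    a->size = size;
    a->used = 0;
    return 0;
}

/* Hand out `n` bytes rounded up to a multiple of 16; NULL if the arena is full. */
static void *arena_alloc(arena *a, size_t n) {
    size_t aligned = (n + 15) & ~(size_t)15;
    if (aligned < n || aligned > a->size - a->used) return NULL;
    void *p = a->base + a->used;
    a->used += aligned;
    return p;
}

/* Release the whole region at once; individual blocks are never freed. */
static void arena_destroy(arena *a) {
    munmap(a->base, a->size);
    a->base = NULL;
    a->size = a->used = 0;
}
```

A real `malloc` replacement would also need `free` and reuse of released blocks, which is why simply calling libc's `malloc` is usually less work.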

3. Do the OCaml LLVM bindings even allow using LLVM in JIT mode?

No idea.

Reid

Hello,

For a course project, I am required to write a compiler for some language of
my choice, and this compiler has to be implemented in a functional language.
I have chosen to create a *JIT* compiler for C source, and to implement my
compiler in OCaml using LLVM for the back-end. I have experience using LLVM
in C++ (I wrote a MATLAB JIT compiler not long ago), however, I am a bit
puzzled as to how to go about some things, and would appreciate some
pointers:

1. When writing my MATLAB JIT in C++, I created bindings to native C++
functions to serve as my runtime library. This hardly seems practical in
OCaml. I would ideally want to write my runtime library in C (this will
contain functions such as malloc, free, puts, strlen, etc.), and link it
with the code I compile somehow. Is there any way for LLVM to link with code
in pre-compiled C object files?

I think so. You may need to modify LLVM's Makefile to do so.
Here is how the Makefile is configured: http://llvm.org/docs/MakefileGuide.html

Please note that this is for a JIT compiler,
I need to be able to do this at run-time, I will not be generating an
executable file.

2. One thing I don't know how to go about is memory allocation. As I just
said, this will be a C *JIT* compiler. This means that my running compiled
code will have to co-exist with OCaml. How do I go about implementing malloc
in this context? Does LLVM provide some memory allocation implementation
that will work with a JIT?

3. Do the OCaml LLVM bindings even allow using LLVM in JIT mode?

I think LLVM OCaml bindings do not support JIT too much.
All the exposed C++ interfaces that are possibly related to the JIT
are in /bindings/ocaml/executionengine/llvm_executionengine.ml.
But you may be able to expose more JIT interfaces to OCaml.

>> Is writing the runtime part of the assignment? In any case, to get yourself up and running, if you have declarations of the runtime functions in your module with external linkage, the JIT will dlsym them if it can't find a definition for them in your module. So if you have malloc, strlen, etc. linked into your binary (which I'm guessing you would, since I'm sure LLVM links in libc), it should be able to call those.

I don't think I have to write the runtime myself. Whether or not the binary links with libc would depend on whether OCaml does, though. What I really want to know is whether LLVM can load pre-compiled object files for use with a JIT. Although, if there were a way to force it to use a specific .so file to resolve symbols, that would be good too.
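
For reference, resolving a symbol from one specific shared library at run-time looks roughly like this at the POSIX level. This sketch shows only the `dlopen`/`dlsym` mechanism, not an LLVM API, and the helper name `resolve_from_library` is hypothetical.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: load the shared library at `path` and look up `symbol`
   in it. A NULL `path` asks dlopen for the main program and the libraries
   loaded with it. Returns NULL if the library or the symbol cannot be found. */
static void *resolve_from_library(const char *path, const char *symbol) {
    assert(symbol != NULL);
    void *handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL) return NULL;
    /* The handle is deliberately never passed to dlclose, so the returned
       address stays valid for as long as the process runs. */
    return dlsym(handle, symbol);
}
```

For example, `resolve_from_library("./libruntime.so", "my_puts")` would return the address of `my_puts` if a library with that (made-up) name and symbol exists, and NULL otherwise.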

>> I would guess that since you can call LLVM from OCaml, you can use any memory allocation strategy that you might normally use in a C/C++ program, i.e., mmap some address space and go nuts. :)

That kind of goes back to the previous point though... How do I resolve the mmap function? I'm guessing though, if I can actually resolve libc's malloc, that saves me the trouble of having to implement my own memory allocation scheme.

- Max

Hello,

For a course project, I am required to write a compiler for some language
of my choice, and this compiler has to be implemented in a functional
language. I have chosen to create a *JIT* compiler for C source, and to
implement my compiler in OCaml using LLVM for the back-end. I have
experience using LLVM in C++ (I wrote a MATLAB JIT compiler not long ago),
however, I am a bit puzzled as to how to go about some things, and would
appreciate some pointers:

First up, my HLVM project should answer all of your questions:

  http://www.ffconsultancy.com/ocaml/hlvm/

Specifically, this is a JIT compiler written almost entirely in OCaml (2kLOC)
that includes a self-generating run-time with multicore-capable garbage
collector that operates using malloc and free directly.

1. When writing my MATLAB JIT in C++, I created bindings to native C++
functions to serve as my runtime library. This hardly seems practical in
OCaml.

You would expose a C interface and bind OCaml to that with C stubs using
OCaml's internal C macros.

I would ideally want to write my runtime library in C (this will
contain functions such as malloc, free, puts, strlen, etc.),

You'll probably find it easier just to invoke these directly from your
generated code. In other words, your OCaml code will call LLVM to generate
native code and then call LLVM's JIT to execute that native code and then
your native code is on its own (no calling back to or via OCaml code).

and link it
with the code I compile somehow. Is there any way for LLVM to link with
code in pre-compiled C object files? Please note that this is for a JIT
compiler, I need to be able to do this at run-time, I will not be
generating an executable file.

You can call libc stuff directly. Other stuff you probably want to dlopen.

2. One thing I don't know how to go about is memory allocation. As I just
said, this will be a C *JIT* compiler. This means that my running compiled
code will have to co-exist with OCaml. How do I go about implementing
malloc in this context? Does LLVM provide some memory allocation
implementation that will work with a JIT?

No need. You just call malloc and free from your generated code.

3. Do the OCaml LLVM bindings even allow using LLVM in JIT mode?

Yes, of course. This is a core part of several LLVM-based projects written in
OCaml.

And of course, if anyone has experience writing a JIT using LLVM, or using
the OCaml LLVM bindings, any advice you may have will be greatly
appreciated.

I have been using LLVM from OCaml for well over a year. It works really well:
LLVM is a great library and OCaml is an awesome language for compiler
writing.

In particular, you do *not* need to modify LLVM or OCaml at all. They both
work perfectly in harmony out of the box.

Can you elaborate on this?

Several major projects are using OCaml's LLVM bindings to execute non-trivial
code via JIT.

I think LLVM OCaml bindings do not support JIT too much.

Can you elaborate on this?

I meant that the OCaml bindings let OCaml call existing C++ LLVM routines,
such as creating an execution engine, JIT-compiling a function with the
existing JIT or interpreter, and evaluating a function,
  as http://llvm.org/docs/tutorial/OCamlLangImpl4.html shows.
But the bindings have not exposed the LLVM interfaces for designing a new JIT,
  like http://llvm.org/docs/WritingAnLLVMBackend.html#jitSupport.
I did not find such bindings in
bindings/ocaml/executionengine/llvm_executionengine.ml.
Please correct me if I am wrong.

Several major projects are using OCaml's LLVM bindings to execute non-trivial
code via JIT.

Could you please point out what these projects are? I am very interested in
looking into these projects to see if they exposed any more LLVM interfaces,
and how they did this.

OCaml bindings for optimizations have not exposed the LLVM interfaces to
let OCaml define a new optimization pass yet. I was planning to design
an OCaml LLVM pass, so it would help a lot to look at how JIT bindings are used.

Thanks a lot.

Jianzhou

>> I think LLVM OCaml bindings do not support JIT too much.
>
> Can you elaborate on this?

I meant that the OCaml bindings let OCaml call existing C++ LLVM routines,
such as creating an execution engine, JIT-compiling a function with the
existing JIT or interpreter, and evaluating a function,
  as http://llvm.org/docs/tutorial/OCamlLangImpl4.html shows.
But the bindings have not exposed the LLVM interfaces for designing a new JIT,
  like http://llvm.org/docs/WritingAnLLVMBackend.html#jitSupport.
I did not find such bindings in
bindings/ocaml/executionengine/llvm_executionengine.ml.
Please correct me if I am wrong.

Your statements are correct but, given that you can write a complete compiler
in OCaml using LLVM's JIT compilation, I think it is OTT to say that
the "OCaml bindings do not support JIT too much".

> Several major projects are using OCaml's LLVM bindings to execute
> non-trivial code via JIT.

Could you please point out what these projects are?

You'll have to ask Erick Tryzelaar, James Woodyatt and Nyx what they're up
to. :)

I am very interested in looking into these projects to see if they exposed
any more LLVM interfaces, and how they did this.

I doubt they exposed any more of LLVM's internals.

OCaml bindings for optimizations have not exposed the LLVM interfaces to
let OCaml define a new optimization pass yet. I was planning to design
an OCaml LLVM pass, so it would help a lot to look at how JIT bindings are
used.

I would strongly advise against that. The impedance mismatch between OCaml and
C++ is so large that you will spend virtually all of your time addressing the
incidental complexity of writing and maintaining low-level bindings instead
of solving real problems. Moreover, the bindings will be so slow (due to
copying) and error-prone that you will have lost the core benefits of using
OCaml in the first place.

If you do decide to go this route you might consider writing very loose and
more language agnostic bindings, e.g. via XML-RPC rather than the ABIs.

I have a hobby project that I work on in whatever spare time I have left over from my boring day job. It's not an open source project.

You can see my previous OCaml weirdness at <http://bitbucket.org/jhw/>.