LLVM OCaml Tutorial

I’m currently going through the LLVM OCaml bindings tutorial in preparation for using LLVM in my own project. While the tutorial is very helpful, it was somewhat hard to start due to the fact that I plan on using ocamllex and ocamlyacc and the tutorial hand-rolls its own lexer and parser.

I have managed to adapt almost all of the tutorial code into using ocamllex and ocamlyacc (the only thing I’m missing is the ability to assign precedence to user-defined binary and unary operators), and was wondering if there would be an interest in adding my code as supplemental material.
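For readers wondering why user-defined precedence is awkward with ocamlyacc: precedence declarations there are fixed when the grammar is compiled, while the tutorial language lets a program introduce operators at run time. One possible workaround, shown here only as a minimal sketch and not as the code from this post, is to have the grammar return a flat chain of operands and operators and resolve precedence afterwards with a mutable table. The names `precedence`, `climb`, and `resolve`, and the starting precedence values, are illustrative assumptions.

```ocaml
(* Sketch: resolve operator precedence after parsing.
   Assumption: the grammar yields a flat chain "operand (op operand)*". *)

type expr =
  | Num of float
  | Binary of char * expr * expr

(* Mutable table, so an operator definition can add entries at run time. *)
let precedence : (char, int) Hashtbl.t = Hashtbl.create 8

let () =
  List.iter (fun (op, p) -> Hashtbl.replace precedence op p)
    ['<', 10; '+', 20; '-', 20; '*', 40]

let prec_of op =
  match Hashtbl.find_opt precedence op with
  | Some p -> p
  | None -> failwith (Printf.sprintf "unknown operator %c" op)

(* Precedence climbing: fold [rest] into [lhs], consuming only operators
   whose precedence is at least [min_prec]. Returns the tree built so far
   and the unconsumed tail. All operators are treated as left-associative. *)
let rec climb lhs rest min_prec =
  match rest with
  | (op, rhs) :: tail when prec_of op >= min_prec ->
      let p = prec_of op in
      (* Let tighter-binding operators to the right claim [rhs] first. *)
      let rhs, tail = climb rhs tail (p + 1) in
      climb (Binary (op, lhs, rhs)) tail min_prec
  | _ -> (lhs, rest)

(* Entry point: first operand plus the list of (operator, operand) pairs. *)
let resolve first rest = fst (climb first rest 0)
```

With this shape, handling a user-defined operator amounts to a `Hashtbl.replace precedence op p` call before `resolve` runs on later expressions; the grammar itself never needs to know the precedence.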

Sincerely,
Chris Wailes

Yes! I thought it was a bit of a shame that so much precious LLVM tutorial
space was devoted to lexing and parsing without using appropriate tools.

You may also want to write an even smaller parser using camlp4...

I would consider doing this, but one of the main reasons I couldn’t use the main tutorial was because the Fedora 10 camlp4 package was messed up and the preprocessors weren’t linked correctly (I don’t think Fedora likes OCaml much). Anyway, I think it might be best to not include camlp4 macros in the parser, simply to reduce the number of tools that someone needs to know to understand the files. If they know camlp4 they can always add support for it in their own projects or as an exercise in understanding the tutorial code.

  • Chris Wailes

I'm currently going through the LLVM OCaml bindings tutorial in preparation
for using LLVM in my own project. While the tutorial is very helpful, it
was somewhat hard to start due to the fact that I plan on using ocamllex and
ocamlyacc and the tutorial hand-rolls its own lexer and parser.

I have managed to adapt almost all of the tutorial code into using ocamllex
and ocamlyacc (the only thing I'm missing is the ability to assign
precedence to user-defined binary and unary operators), and was wondering
if there would be an interest in adding my code as supplemental material.

Great! I'd be happy to add this, though I think as an addendum
instead of a replacement unless the community feels strongly about it.
Since most of the llvm documentation is for c++, I wanted the ocaml
developers to be able to read the ocaml tutorial then the c++ tutorial
and it be nearly one-to-one. I thought camlp4 did a much better job of
matching the c++ tutorial than ocamllex/ocamlyacc. That said, a lot of
people use them and it'd be handy to see how to use them with llvm as
well.

You may also want to write an even smaller parser using camlp4...

I'm happy to apply any patches :) Even better if you wanted to extend
the tutorial to support things like garbage collection.

I would consider doing this, but one of the main reasons I couldn't use the
main tutorial was because the Fedora 10 camlp4 package was messed up and the
pre processors weren't linked correctly (I don't think Fedora likes OCaml
much).

What was the problem? I just tested it out on fedora 10 and it worked
after I installed the ocaml, ocaml-camlp4, and ocaml-camlp4-devel
rpms.

Anyway, I think it might be best to not include camlp4 macros in the
parser, simply to reduce the number of tools that someone needs to know to
understand the files. If they know camlp4 they can always add support for
it in their own projects or as an exercise in understanding the tutorial
code.

Well you still run into that with ocamllex/ocamlyacc :) Maybe I'm just
used to the stream syntax extension, but I figured it was fair game
since it's part of the standard install. It's not my fault the fedora
project decided to make it not part of the standard ocaml install :)

I’m happy to apply any patches :) Even better if you wanted to extend
the tutorial to support things like garbage collection.

As I go along in my project and explore these features I might be willing to produce code for additional chapters but I don’t think I have time to dedicate to it now.

What was the problem? I just tested it out on fedora 10 and it worked
after I installed the ocaml, ocaml-camlp4, and ocaml-camlp4-devel
rpms.

When I tried it the pre-processor executable had not been created and installed. When the compiler tried to pass it through, it couldn’t find anything. It has been a little bit since I tried, and there may have been updates, so the issue may have been fixed.

Well you still run into that with ocamllex/ocamlyacc :) Maybe I’m just
used to the stream syntax extension, but I figured it was fair game
since it’s part of the standard install. It’s not my fault the fedora
project decided to make it not part of the standard ocaml install :)

That is perfectly understandable. My problem was that I had never seen the stream syntax before. While I’m no OCaml master, I have used it for a while and so it might be that other people coming to the tutorial may have the same experience. Secondly, it seems odd to be writing custom parsers and lexers after tools like Bison and Flex (and ocamllex and ocamlyacc) have been around for some time. Is there any particular reason the C++ tutorial is using hand coded parsers and lexers? Is there a technical reason?

  • Chris Wailes

That is perfectly understandable. My problem was that I had never seen the
stream syntax before. While I'm no OCaml master, I have used it for a
while and so it might be that other people coming to the tutorial may have
the same experience.

This page may be of help:

  http://www.ffconsultancy.com/ocaml/benefits/parsing.html

It describes recursive descent parsing using the camlp4 stream parser
extension and lex/yacc.
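To give a feel for recursive descent without requiring the camlp4 stream extension, here is a minimal sketch in plain OCaml that pattern-matches on a token list instead of a stream. It is not taken from the linked page or from the tutorial; the `token` and `expr` types and the small grammar are assumptions chosen to resemble the tutorial's primary expressions.

```ocaml
(* Sketch: recursive descent over a token list, no syntax extension.
   Each function returns the parsed value and the remaining tokens. *)

type token = Number of float | Ident of string | Kwd of char

type expr =
  | Num of float
  | Var of string
  | Call of string * expr list

(* primary ::= number | ident | ident '(' args | '(' primary ')' *)
let rec parse_primary = function
  | Number n :: rest -> (Num n, rest)
  | Ident f :: Kwd '(' :: rest ->
      let args, rest = parse_args rest in
      (Call (f, args), rest)
  | Ident x :: rest -> (Var x, rest)
  | Kwd '(' :: rest ->
      (match parse_primary rest with
       | e, Kwd ')' :: rest -> (e, rest)
       | _ -> failwith "expected ')'")
  | _ -> failwith "unexpected token"

(* args ::= ')' | arg_list *)
and parse_args = function
  | Kwd ')' :: rest -> ([], rest)
  | tokens -> parse_arg_list tokens

(* arg_list ::= primary ')' | primary ',' arg_list *)
and parse_arg_list tokens =
  let e, rest = parse_primary tokens in
  match rest with
  | Kwd ',' :: rest ->
      let es, rest = parse_arg_list rest in
      (e :: es, rest)
  | Kwd ')' :: rest -> ([e], rest)
  | _ -> failwith "expected ',' or ')'"
```

Each grammar rule becomes one function and each alternative becomes one match case, which is the same structure the stream parser extension expresses with its own syntax.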

You may also be interested in this LLVM-based example in OCaml that uses
camlp4 for parsing directly:

  http://groups.google.com/group/fa.caml/msg/5aee553df34548e2

Secondly, it seems odd to be writing custom parsers
and lexers after tools like Bison and Flex (and ocamllex and ocamlyacc)
have been around for some time. Is there any particular reason the C++
tutorial is using hand coded parsers and lexers? Is there a technical
reason?

I found flex and bison really tedious to use from C++. That was a long time
ago and there are probably better alternatives now though. Still much worse
than anything with first-class lexical closures and algebraic datatypes, of
course. :)

Same here. I also have issues with them due to some restrictions in
the syntax type they use. Personally I use Boost.Spirit2x (and the
first project I made with llvm was a little functional language that
was taught at my school a decade ago in the intro compiler course) as
it is pure C++ (no nasty pre-compiling of other stuff needed) and it
is a PEG parser (meaning that, unlike EBNF and so forth, it is
completely unambiguous in all cases), and the latest versions are just
about faster than I could hand-code it.

I still have that project lying around, I could turn it into a
tutorial as well (although I think the language definition I used is
copyrighted, I might want to change it to use the same syntax as the
existing tutorials, they are *very* close in syntax anyway, may not
even need to change the tree Spirit2x generates). Instead of doing
things by using classes and using virtual dispatch, I used standard
visitors that nicely get mostly compiled out in release, so it is
quite fast, and I say it is also a great deal easier to read and
understand than the existing tutorials. However, if I did make a
tutorial using Boost.Spirit2x and the visitor pattern, it would have
the requirement of needing Boost (which I personally think any C++
programmer should have anyway, some of it is rather useless but there
are some absolute gems in Boost).

Would you all accept a tutorial of the language the current tutorial
uses, but using Boost.Spirit2x as the parser (*vastly* reduced and
easier to read parsing code, would be but a tiny section), using
visitors (much easier to read through in my opinion, as well as being
faster and more in-line with what a compiler should be doing for
proper practices), and other such niceties, in exchange for needing
Boost for the tutorial (would of course be a section on how to set
Boost up as the tutorial would only be using header-only Boost
libraries anyway, would be very simple)?

Oh, and yes, Boost.Spirit2x basically adds such closures and such due
to heavy use of Boost.Phoenix.