languages, semantic trees, LLVM interfaces

Hello LLVM fathers,

1. "languages, semantic trees"

   What do you think: ideally, do language implementations based on
   LLVM need an internal semantic tree, or should they rather try to
   use LLVM directly during or after syntax parsing?

   For languages like C++ the expected answer is "of course we need
   an internal semantic tree between parsing and LLVM!"
   But I am still wondering what your strategic plans concerning
   this issue are.

2. For other (not C/C++) languages there should be some interface
   solution in order to use LLVM from their parsers (e.g. ocamlyacc).
   So, what about that?

Thank you.

Valery,

For any language with relatively sophisticated syntax and semantic
rules, you will probably need a higher-level representation like an
Abstract Syntax Tree in order to do type-checking and other kinds of
checking. For OCaml, for example, the front-end is quite sophisticated
and complex and the LLVM representation would not be suitable for
supporting all the checking and translation. It would also be difficult
to do these tasks directly within the parser if you parsed directly into
LLVM. So a higher-level representation is necessary.
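
As a rough illustration of the kind of checking that is easier on a tree than on a low-level representation, here is a minimal sketch in Python (chosen only for brevity). The node classes, `CheckError`, and `type_of` are hypothetical names invented for this example; they are not part of LLVM or of any real front-end.

```python
from dataclasses import dataclass


class CheckError(Exception):
    """Raised when an expression is ill-typed (hypothetical name)."""


@dataclass
class IntLit:
    value: int


@dataclass
class BoolLit:
    value: bool


@dataclass
class Add:
    left: object
    right: object


@dataclass
class If:
    cond: object
    then: object
    other: object


def type_of(expr):
    """Return "int" or "bool" for a well-typed expression tree."""
    if isinstance(expr, IntLit):
        return "int"
    if isinstance(expr, BoolLit):
        return "bool"
    if isinstance(expr, Add):
        # Both operands of an addition must be integers.
        if type_of(expr.left) != "int" or type_of(expr.right) != "int":
            raise CheckError("operands of + must be int")
        return "int"
    if isinstance(expr, If):
        # The condition must be boolean and both branches must agree.
        if type_of(expr.cond) != "bool":
            raise CheckError("condition must be bool")
        then_type = type_of(expr.then)
        if then_type != type_of(expr.other):
            raise CheckError("branches must have the same type")
        return then_type
    raise CheckError(f"unknown node: {expr!r}")
```

Only after a pass like this succeeds would the front-end lower the tree to LLVM; the source-level types and structure it relies on are no longer visible once the program has been flattened into instructions.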

After the front-end tasks are done, another interesting issue is what
happens with languages compiled to a well-defined bytecode language
(e.g., JVM or MSIL). For such cases, our aim is to allow a fast,
simple, runtime translation from bytecode to LLVM, and then to do all
machine-dependent code generation and optimization, including runtime
optimization, on LLVM. In fact, these "back-end" mechanisms would be
language-independent, i.e., they would be common for different bytecode
languages. Of course, the JVM or the CLR virtual machines will still be
needed to implement the runtime systems, but they would run on top of
LLVM and use the common back-end mechanisms for code generation. (In
fact, nearly all of the JVM or CLR system would be machine-independent
this way.) This is a long-term goal; Alkis Evlogimenos is a new student
in my group and is just starting work in this general direction.
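
To make the bytecode-to-LLVM idea a little more concrete, here is a minimal sketch in Python of translating a stack-based bytecode into register-style LLVM assembly. Everything in it is an assumption made for illustration: the tiny opcode set (`load_arg`, `push`, `add`, `sub`, `mul`, `ret`) is invented and is not JVM or MSIL, the function `translate` is hypothetical, and the emitted text follows the syntax of current LLVM textual IR rather than any particular historical version. It handles straight-line integer code only; real bytecode with control flow would need considerably more.

```python
def translate(name, n_args, code):
    """Translate a tiny, invented stack bytecode into LLVM-style text.

    The operand stack is simulated at translation time, so each stack
    slot becomes a named virtual register in the output.
    """
    stack = []      # names of the values currently on the operand stack
    lines = []      # emitted instructions
    counter = 0     # used to generate fresh temporary names

    for op, *operands in code:
        if op == "load_arg":
            stack.append(f"%arg{operands[0]}")
        elif op == "push":
            stack.append(str(operands[0]))
        elif op in ("add", "sub", "mul"):
            rhs = stack.pop()
            lhs = stack.pop()
            tmp = f"%t{counter}"
            counter += 1
            lines.append(f"  {tmp} = {op} i32 {lhs}, {rhs}")
            stack.append(tmp)
        elif op == "ret":
            lines.append(f"  ret i32 {stack.pop()}")
        else:
            raise ValueError(f"unknown opcode: {op}")

    params = ", ".join(f"i32 %arg{i}" for i in range(n_args))
    header = f"define i32 @{name}({params}) {{"
    return "\n".join([header, "entry:", *lines, "}"])


# (arg0 + arg1) * 2 expressed in the invented bytecode.
example = translate(
    "f", 2,
    [("load_arg", 0), ("load_arg", 1), ("add",), ("push", 2), ("mul",), ("ret",)],
)
```

The point of the sketch is that the translation itself is a simple linear pass; all machine-dependent work is left to whatever consumes the LLVM output.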

--Vikram
http://www.cs.uiuc.edu/~vadve

Hello Vikram,

Saturday, September 6, 2003, 9:10:45 PM, you wrote:

> For any language with relatively sophisticated syntax and semantic
> rules, you will probably need a higher-level representation like an
> Abstract Syntax Tree in order to do type-checking and other kinds of
> checking.

OK, concerning AST -- I see. Thank you.

> For OCaml, for example, the front-end is quite sophisticated
> and complex and the LLVM representation would not be suitable for
> supporting all the checking and translation.

But I just meant ocamlyacc, i.e. the OCaml clone of yacc.
My question was: what kind of interface is one expected to use
from within OCaml, Common Lisp, Haskell and other nice language
implementations? What is the expected way of interfacing with
LLVM for those non-C language implementations? Shared libraries
with a plain C interface, or what?

Kind regards,

> > For OCaml, for example, the front-end is quite sophisticated
> > and complex and the LLVM representation would not be suitable for
> > supporting all the checking and translation.
>
> But I just meant ocamlyacc, i.e. the OCaml clone of yacc.
> My question was: what kind of interface is one expected to use
> from within OCaml, Common Lisp, Haskell and other nice language
> implementations? What is the expected way of interfacing with
> LLVM for those non-C language implementations? Shared libraries
> with a plain C interface, or what?

Valery,

This is a good question and I don't think we have a good answer yet. We
are already facing this issue for JVM and OCaml front-ends which we hope
to develop in the next few months. If we have a good solution, we'll
definitely let this list know.

--Vikram

> > For OCaml, for example, the front-end is quite sophisticated
> > and complex and the LLVM representation would not be suitable for
> > supporting all the checking and translation.
>
> But I just meant ocamlyacc, i.e. the OCaml clone of yacc.
> My question was: what kind of interface is one expected to use
> from within OCaml, Common Lisp, Haskell and other nice language
> implementations? What is the expected way of interfacing with
> LLVM for those non-C language implementations? Shared libraries
> with a plain C interface, or what?

There are currently two options:

1. If you can, linking your front-end to the LLVM libraries and using the
   C++ API is certainly the simplest and most stable way to do it.
2. Otherwise, you can build some form of LLVM representation in your
   front-end, then output LLVM "assembly" language. This is not as nice
   as option #1, because you have to reinvent a new representation for
   LLVM, which, although not difficult, seems like a waste of time. :-)
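
Option #2 can be sketched as follows: the front-end keeps its own small in-memory representation of LLVM code and prints it as text. This is a minimal illustration in Python, not a description of how any existing front-end does it. The `Instr` and `Function` classes are hypothetical, the representation covers only a single basic block, and the printed syntax follows current LLVM textual IR, which is an assumption on my part.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    """One instruction in the front-end's own (hypothetical) representation."""
    result: str       # e.g. "%sum", or "" when the instruction yields no value
    opcode: str       # e.g. "add", "ret"
    type_: str        # e.g. "i32"
    operands: list    # operand names or literals, already as strings

    def render(self):
        text = f"{self.opcode} {self.type_} {', '.join(self.operands)}"
        return f"  {self.result} = {text}" if self.result else f"  {text}"


@dataclass
class Function:
    """A function with a single basic block labelled "entry"."""
    name: str
    ret_type: str
    params: list      # list of (type, name) pairs
    body: list        # list of Instr

    def render(self):
        params = ", ".join(f"{ty} {nm}" for ty, nm in self.params)
        lines = [f"define {self.ret_type} @{self.name}({params}) {{", "entry:"]
        lines += [instr.render() for instr in self.body]
        lines.append("}")
        return "\n".join(lines)


# Build "add two integers" in the front-end's representation and print it.
add_fn = Function(
    name="add",
    ret_type="i32",
    params=[("i32", "%a"), ("i32", "%b")],
    body=[
        Instr("%sum", "add", "i32", ["%a", "%b"]),
        Instr("", "ret", "i32", ["%sum"]),
    ],
)
assembly_text = add_fn.render()
```

The resulting text would be written to a file and handed to the LLVM tools (for example `llvm-as`) as a separate step. The duplication is visible even at this size: `Instr` and `Function` mirror structures that the LLVM libraries already provide to anyone who can use option #1.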

FWIW, the C/C++ front-end currently uses option #2. The reason for this
is that the C/C++ front-end is built into the GCC infrastructure, which is
not very friendly to C++. Also, this prevents the LLVM infrastructure
itself from having to be GPL'd.

-Chris