Simplifying a front-end project

For my introductory Compiler Construction class, I have been giving the students a project to write a simple compiler for a toy, single-inheritance object-oriented language. We give them a set of classes implementing an AST for the language and a type checker as well. The students write (1) a scanner and parser to build the AST; (2) a translator from AST to LLVM; and (3) a couple of basic optimization passes on LLVM IR.

For the translator in step (2), I've so far had them generate LLVM IR in memory using the LLVM APIs. They find it a *lot* of work to learn the LLVM APIs, which doesn't teach them much about compiler concepts per se. To simplify this project, I am considering changing the project so they "print out" LLVM assembly directly instead of building up the IR in memory and writing it out.

Question: Is there a convenient way to print out the LLVM assembly correctly without having to build the IR first? Getting the syntax right for all the types, declarations, functions, instructions, etc., is also non-trivial, and the less error-prone we can make it for them, the better.

Otherwise, is there any other way they can write their translator that is easier than learning the LLVM APIs for building up the IR?

Any suggestions appreciated.

--Vikram
Associate Professor, Computer Science
University of Illinois at Urbana-Champaign
http://llvm.org/~vadve

Vikram S. Adve wrote:

For the translator in step (2), I've so far had them generate LLVM IR in memory using the LLVM APIs. They find it a *lot* of work to learn the LLVM APIs, which doesn't teach them much about compiler concepts per se. To simplify this project, I am considering changing the project so they "print out" LLVM assembly directly instead of building up the IR in memory and writing it out.
  
I've found the LLVM API to be pretty easy to work with, at least by life's generally poor standards. I don't know UIUC's program at all, but I would guess there are 2-3 main problems your students are having:
1. The API is huge, and the autogenerated documentation is badly organized, especially for beginners.
2. The API assumes that users are familiar with various C++ idioms, particularly iterators.
3. The API assumes that users are familiar with C++.

The first point can be addressed with some "beginner's documentation" which highlights the important classes and their core operations. The second point can be helped with a lot of directed examples; I know there's some of this on the wiki already. The third point is somewhat harder to solve.

Question: Is there a convenient way to print out the LLVM assembly correctly without having to build the IR first? Getting the syntax right for all the types, declarations, functions, instructions, etc., is also non-trivial, and the less error-prone we can make it for them, the better.
  
Well, you could certainly do step 2 (the translator) by having the students write out LLVM assembly in text format (.ll), then assembling that (llvm-as) and, if required, loading the assembled bitcode. But that won't really help you in step 3 unless you want them to write a text-to-text pass, which would be hugely painful for all but very specific and trivial transformations. At some point, if they're going to use LLVM, they have to learn the API.

...well, unless you teach it in some other language with actively-maintained LLVM bindings (e.g. ocaml).
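For illustration only, here is a minimal sketch of the text-emission route mentioned above: a translator that walks a tiny expression AST and builds LLVM assembly as a string, without touching the LLVM C++ API. The `Expr` type, the `num`/`bin` helpers, and the `Emitter` class are hypothetical names invented for this sketch, not part of LLVM or of the course's AST classes, and it assumes only 32-bit integer addition and multiplication.

```cpp
#include <cassert>
#include <memory>
#include <sstream>
#include <string>

// Hypothetical toy AST (not the course's real classes):
// op == 'n' is an integer literal; '+' and '*' are binary operators.
struct Expr {
    char op = 'n';
    int value = 0;
    std::unique_ptr<Expr> lhs, rhs;
};

std::unique_ptr<Expr> num(int v) {
    auto e = std::make_unique<Expr>();
    e->value = v;
    return e;
}

std::unique_ptr<Expr> bin(char op, std::unique_ptr<Expr> l,
                          std::unique_ptr<Expr> r) {
    auto e = std::make_unique<Expr>();
    e->op = op;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Accumulates instruction text and hands out unique temporary names.
class Emitter {
    std::ostringstream body_;
    int next_ = 0;

    std::string fresh() { return "%t" + std::to_string(next_++); }

public:
    // Emits instructions for e and returns the operand that holds its
    // value: a literal for constants, a %tN temporary otherwise.
    std::string emit(const Expr& e) {
        if (e.op == 'n') return std::to_string(e.value);
        assert(e.op == '+' || e.op == '*');
        std::string l = emit(*e.lhs);  // left operand first, so names
        std::string r = emit(*e.rhs);  // are assigned deterministically
        std::string t = fresh();
        body_ << "  " << t << " = " << (e.op == '+' ? "add" : "mul")
              << " i32 " << l << ", " << r << "\n";
        return t;
    }

    // Wraps the expression in a zero-argument function returning i32.
    std::string function(const std::string& name, const Expr& e) {
        std::string result = emit(e);
        return "define i32 @" + name + "() {\nentry:\n" + body_.str() +
               "  ret i32 " + result + "\n}\n";
    }
};
```

Writing the string returned by `Emitter::function` to a `.ll` file and running `llvm-as` on it should produce bitcode. Even at this size, the emitter has to manage value naming and instruction syntax by hand, which is the kind of bookkeeping the in-memory API otherwise takes care of.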

John.

Are you using TypeBuilder
(http://llvm.org/viewvc/llvm-project/llvm/branches/release_26/include/llvm/Support/TypeBuilder.h?view=markup,
new in 2.6) and IRBuilder, or trying to use the individual type and
value classes directly? I recommend the builders when they support
what you need. I don't think they'll ever completely replace the
individual classes, but in Unladen Swallow we haven't had to fall back
to the full API very often.

John McCall wrote:

I've found the LLVM API to be pretty easy to work with, at least by life's generally poor standards. I don't know UIUC's program at all, but I would guess there are 2-3 main problems your students are having:
1. The API is huge, and the autogenerated documentation is badly organized, especially for beginners.
2. The API assumes that users are familiar with various C++ idioms, particularly iterators.
3. The API assumes that users are familiar with C++.

Hiho,

I'm a student and a beginner in LLVM; I think the API is quite straightforward and easy to learn (most of the time). However, the biggest hurdle for me is indeed the autogenerated documentation. It is not badly organized; it's just missing a lot. Often I find myself poking around, guessing how things might work. Often my guesses are incorrect :)

BTW: Currently I'm trying to understand SCEV. I think I will start reading all the loop passes in the next few days. If someone has a better idea, please drop me a line.

Ciao
Marc

I will reiterate this. LLVM APIs have made a lot of progress over the last couple years. They shouldn't be writing "new FooInst()" at all in their code, the IRBuilder and new TypeBuilder stuff are dramatically easier to use and generally just "do the right thing".

Generating text will work but will have tons of other logistical problems that are not really any easier to solve, and it seems bad to say "I'm teaching you something that is definitely not best practice but will make it easier to get going".

-Chris

I found these slides very helpful in understanding the theory behind
scalar evolutions:
http://cri.ensmp.fr/~pop/gcc/feb04/slides.pdf

Best regards,
--Edwin