ANTLR+LLVM example for simple C

Howdy,

I just finished a book called Language Implementation Patterns, but I ran out of room at 400 pages before I could squeeze in an LLVM example. I left a link in the book to the ANTLR wiki so I could slap something together later:

http://www.antlr.org/wiki/display/ANTLR3/LLVM

The code is good but the description was slapped together I'm afraid (i.e., don't take it as an example of the book quality. ha!). The example is cool because the same generator can emit LLVM IR or C code depending on which templates we use. It's simple enough that people might find it useful to learn about LLVM's IR and about source-to-source translators (I generate the text-based IR).
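
To make the "same generator, different templates" idea concrete, here is a minimal sketch. It is not the wiki's code: plain Python dictionaries stand in for the template groups, and the names TEMPLATES and generate are made up for illustration. The generator logic is identical for both targets; only the template table changes.

```python
# Hypothetical template tables: one per output language.
TEMPLATES = {
    "llvm": {
        "reg": "%{name}",                       # LLVM locals start with %
        "add": "{dst} = add i32 {lhs}, {rhs}",
        "ret": "ret i32 {val}",
    },
    "c": {
        "reg": "{name}",
        "add": "int {dst} = {lhs} + {rhs};",
        "ret": "return {val};",
    },
}

def generate(target):
    """Emit 'sum = a + b; return sum' using the chosen template table."""
    t = TEMPLATES[target]
    reg = lambda name: t["reg"].format(name=name)
    return [
        t["add"].format(dst=reg("sum"), lhs=reg("a"), rhs=reg("b")),
        t["ret"].format(val=reg("sum")),
    ]

print("\n".join(generate("llvm")))
# %sum = add i32 %a, %b
# ret i32 %sum
print("\n".join(generate("c")))
# int sum = a + b;
# return sum;
```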

I'd welcome any feedback and corrections to the description or code. thanks!

Terence
PS If you're curious about the book, here's the publisher's link:

http://pragprog.com/titles/tpdsl/language-implementation-patterns

It's at the printer as we speak! In stores by New Year's.

Hi Terence,

Strangely enough, that wiki page appears to require a login to edit, and the login page doesn't have any facility for creating new accounts. I gave it a cursory glance and have a few edits:

"global variables start with @, but registers a local variables start with %."
'a local' --> 'and local'. Note that local variables *are* registers in LLVM parlance. Notably, there is no address-of operation.

"all basic blocks must start with a label (functions automatically get one)."
Not quite. Consider this IR:
   define void @test(i32, i32) {
     add i32 %0, %1
     br label %4
     ret void
   }
which uses anonymous values everywhere. %0 is the first argument, %1 is the second, %2 is the first basic block, %3 is the result from the add and %4 is the name of the second basic block. Confused yet? Here's how it looks through llvm-as | llvm-dis:
   define void @test(i32, i32) {
     %3 = add i32 %0, %1 ; <i32> [#uses=0]
     br label %4

   ; <label>:4 ; preds = %2
     ret void
   }
Just remember that all anonymous values are numbered sequentially through the .ll. We commonly leave off (even llvm-dis does it!) the first basic block's label because you aren't allowed to branch to it and there's never a need to mention it in a phi node.
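
The numbering rule can be sketched in a few lines of Python. This is only a model of the behaviour described above, not how LLVM implements it; the function name and the list-of-booleans encoding of a function body are assumptions made for the example. One counter runs through the arguments, then through each basic block and every value-producing instruction inside it.

```python
def number_anonymous_values(num_args, blocks):
    """Assign sequential %N names to the unnamed values of one function.

    blocks is a list of basic blocks; each block is a list of booleans,
    one per instruction: True if the instruction produces a value
    (such as add), False if it does not (such as br or ret void).
    """
    names = []
    counter = 0
    for i in range(num_args):
        names.append((f"arg {i}", f"%{counter}"))
        counter += 1
    for b, block in enumerate(blocks):
        # The block itself takes a number, even when its label is left off.
        names.append((f"block {b}", f"%{counter}"))
        counter += 1
        for i, produces_value in enumerate(block):
            if produces_value:
                names.append((f"block {b} instr {i}", f"%{counter}"))
                counter += 1
    return names

# @test from above: two arguments, an entry block with add and br,
# then a second block with ret void.
for what, name in number_anonymous_values(2, [[True, False], [False]]):
    print(what, name)
# arg 0 %0
# arg 1 %1
# block 0 %2
# block 0 instr 0 %3
# block 1 %4
```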

At the end of the fibo example:
"
     ret i32 %r11
     ret i32 0
"
Surely you want to remove the 'ret i32 0' line.

In the print 99 example:
"
     ; get 99 into a register; t0 = 99+0 (LLVM has no load int instruction)
     %t0 = add i32 99,0
     ; call printf with "%d\n" and t0 as arguments
     call i32 (i8 *, ...)* @printf(i8* %ps, i32 %t0)
"
Is there any reason not to just write:
   ; call printf with "%d\n" and 99 as arguments
   call i32 (i8*, ...)* @printf(i8* %ps, i32 99)
? Doing an 'add i32 99, 0' is awfully awkward. If you really want to use a register here, the preferred trick is '%t0 = bitcast i32 99 to i32' since it's more general across different types -- but this sort of thing is strongly discouraged.
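
On the generator side, the fix amounts to letting an operand be either a literal or a register. A minimal sketch, assuming hypothetical helpers operand and emit_printf_call (neither comes from the wiki code) and reusing the call syntax quoted above, which may differ in other LLVM versions:

```python
def operand(value):
    """Render an i32 operand: int literals go inline, names become registers."""
    if isinstance(value, int):
        return f"i32 {value}"
    return f"i32 %{value}"

def emit_printf_call(value, fmt_reg="ps"):
    """Emit the printf call from the example; no 'add i32 99, 0' needed."""
    return f"call i32 (i8*, ...)* @printf(i8* %{fmt_reg}, {operand(value)})"

print(emit_printf_call(99))
# call i32 (i8*, ...)* @printf(i8* %ps, i32 99)
print(emit_printf_call("t0"))
# call i32 (i8*, ...)* @printf(i8* %ps, i32 %t0)
```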

The link to llvm.org/docs/LangRef.html is far too well hidden given its importance to anyone trying to write in IR. I'm not sure exactly what to do with it yet.

Finally, congrats on finishing your book!

Nick