Question about LLVM


A couple of months ago you told me about LLVM. At that time, I was focusing my research on GCC RTL.

I just started looking into LLVM, and I have some questions:
- When the documentation mentions IR, does it refer to LLVM assembly language or to LLVM bytecode?

Both. They are semantically equivalent. In particular, the in-memory compiler IR is exactly equivalent to the text and binary forms; they are just expressed in different ways.

- In what form does LLVM (gccas) perform optimizations? Is it in LLVM assembly or LLVM bytecode?

gccas reads the text form into the in-memory form, does optimizations on the in-memory IR, then writes out the result in compressed bytecode format.

- Is there any way to dump the IR before and after each optimization?

Yes. gccas is just a series of passes. You can see which passes it runs with:

gccas /dev/null -o /dev/null -debug-pass=Arguments

Then emulate gccas like this:

llvm-as < test.ll | opt <the list of passes> -o output.bc

In the list of passes, you can now insert -print wherever you'd like.
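If the pass list is long, inserting -print by hand gets tedious. Below is a minimal sketch of a hypothetical helper (build_opt_pipeline is not part of LLVM) that takes the pass list reported by -debug-pass=Arguments and builds the emulation command with -print before the first pass and after every pass, so you get the IR before and after each optimization. It assumes the reported line looks like "Pass Arguments:  -pass1 -pass2 ..."; check that against what your gccas actually prints. The file names test.ll and output.bc are just the ones used in the command above.

```python
def build_opt_pipeline(pass_args_line, input_ll="test.ll", output_bc="output.bc"):
    """Hypothetical helper: build an llvm-as | opt command line that mimics
    gccas, with -print inserted before the first pass and after every pass."""
    # Assumed format of the -debug-pass=Arguments output line.
    prefix = "Pass Arguments:"
    line = pass_args_line.strip()
    if line.startswith(prefix):
        line = line[len(prefix):]
    passes = line.split()

    # Dump the IR once before any pass runs, then again after each pass.
    interleaved = ["-print"]
    for p in passes:
        interleaved.append(p)
        interleaved.append("-print")

    return "llvm-as < {} | opt {} -o {}".format(
        input_ll, " ".join(interleaved), output_bc)


if __name__ == "__main__":
    # The pass names here are only illustrative; use the list your gccas reports.
    print(build_opt_pipeline("Pass Arguments:  -simplifycfg -mem2reg"))
```

Putting a -print between every pair of passes makes it easy to diff consecutive dumps and see exactly what each pass changed.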

I haven't installed LLVM itself on my computer for administrative reasons. My research is about validation of optimized code. I've been using GCC RTL in my research; however, RTL lacks source type annotations, which hinders the technique I'm currently developing for code validation. I tried the LLVM demo site, and the result showed that source types are preserved.

*Source*-level types are not preserved, but a lot of type information that GCC does not preserve is kept. In particular, LLVM has its own type system.

I hope this helps,


Chris Lattner wrote:

I don't know what your project or goals are, but you might want to check
out LLVM. The representation we use looks almost exactly like this: