Directly emit LLVM IR?

Hello! I'm interested in using LLVM as a target for a compiler I've
written in Common Lisp (SBCL). While I looked at perhaps wrapping the
LLVM C++ interface, wrapping C++ in, well, anything not C++ is a pain.
Someone on IRC mentioned that they didn't think I'd miss out on any
functionality by directly emitting IR, but suggested I query the list.

Do I miss out on any optimizations or other neat trickery by using the
IR directly? Does anyone know of any other platforms which directly
target LLVM via emitting IR?

Thanks,
      ...Eric Jonas

Hi Eric,

There are 3 major ways to tackle generating LLVM IR from a front-end:

1. Embed the LLVM C++ code.
   for: best tracks changes to the LLVM IR, .ll syntax, and .bc format
   for: enables running LLVM optimization passes without an emit/parse cycle
   for: adapts well to a JIT context
   against: lots of ugly glue code to write

2. Emit LLVM assembly from your compiler’s native language.
   for: very straightforward to get started
   against: the .ll parser is slower than the bitcode reader when interfacing to the middle end
   against: you’ll have to re-engineer the LLVM IR object model and asm writer in your language
   against: it may be harder to track changes to the IR

3. Emit LLVM bitcode from your compiler’s native language.
   for: can use the more-efficient bitcode reader when interfacing to the middle end
   against: you’ll have to re-engineer the LLVM IR object model and bitcode writer in your language
   against: it may be harder to track changes to the IR

If you go with the first option, the C bindings in include/llvm-c should help a lot, since most languages have C FFIs. The C interface was designed to require very little manual memory management, and so is fairly straightforward to talk to with most FFIs.

— Gordon
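
To make the "emit LLVM assembly from your own language" option concrete, here is a minimal sketch of what re-engineering a sliver of the IR object model and asm writer might look like. It is written in Python purely for illustration; the `Function` class and its `fresh`, `binop`, `ret`, and `render` methods are hypothetical names invented for this example and are not part of any LLVM API. It only builds text and assumes a single basic block.

```python
from dataclasses import dataclass, field


@dataclass
class Function:
    """Hypothetical, tiny stand-in for an IR object model (one basic block only)."""
    name: str
    ret_type: str
    params: list                      # list of (type, name) pairs
    body: list = field(default_factory=list)
    _next_tmp: int = 0

    def fresh(self) -> str:
        # Hand out unique SSA names: %t1, %t2, ...
        self._next_tmp += 1
        return f"%t{self._next_tmp}"

    def binop(self, op: str, ty: str, lhs: str, rhs: str) -> str:
        # Append e.g. "%t1 = add i32 %a, %b" and return the result name.
        dest = self.fresh()
        self.body.append(f"  {dest} = {op} {ty} {lhs}, {rhs}")
        return dest

    def ret(self, ty: str, value: str) -> None:
        self.body.append(f"  ret {ty} {value}")

    def render(self) -> str:
        # The "asm writer": turn the object into .ll text.
        args = ", ".join(f"{ty} %{name}" for ty, name in self.params)
        header = f"define {self.ret_type} @{self.name}({args}) {{"
        return "\n".join([header, "entry:", *self.body, "}"]) + "\n"


# Build (a + b) * b and render it as LLVM assembly text.
fn = Function("add_mul", "i32", [("i32", "a"), ("i32", "b")])
total = fn.binop("add", "i32", "%a", "%b")
product = fn.binop("mul", "i32", total, "%b")
fn.ret("i32", product)
ir_text = fn.render()
print(ir_text)
```

The resulting text can be written to a .ll file and handed to the LLVM tools. A real front-end would also need types, multiple basic blocks, globals, and so on, which is the re-engineering cost listed under "against".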

Hi,

> Emit LLVM assembly from your compiler’s native language.
> for: very straightforward to get started
> against: the .ll parser is slower than the bitcode reader when interfacing to the middle end
> against: you’ll have to re-engineer the LLVM IR object model and asm writer in your language
> against: it may be harder to track changes to the IR

One more problem with this: to emit float constants you have to convert them to hexadecimal notation, which I found nontrivial. My pet language still lacks proper float support because of this. I don’t know whether more pitfalls like this are lurking. (Time to finally move to the OCaml bindings… :))

HTH,
Jan

Jan Rehders wrote:

> Hi,
>
> > Emit LLVM assembly from your compiler's native language.
> > for: very straightforward to get started
> > against: the .ll parser is slower than the bitcode reader when
> > interfacing to the middle end
> > against: you'll have to re-engineer the LLVM IR object model and asm
> > writer in your language
> > against: it may be harder to track changes to the IR
>
> One more problem with this: in order to emit float constants you have
> to convert them to hexadecimal notation which I found nontrivial. My
> pet language is still lacking proper float support because of this.
> I'm not aware whether there are more pitfalls like this lurking
> around. (Time to finally move to the ocaml bindings.. :))

APFloat can emit the hex representation of any float with the desired
precision and rounding mode.

Neil.
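
For a front-end that does not link against LLVM (and so cannot call APFloat), the hexadecimal form can be produced directly from the IEEE-754 bit pattern. The sketch below is in Python for illustration only; `llvm_double_hex` and `llvm_float_hex` are hypothetical helper names. It assumes the textual IR convention that both `float` and `double` hex constants are written as the 16-hex-digit bit pattern of a double, so a `float` value is first rounded to single precision and then widened again.

```python
import struct


def llvm_double_hex(value: float) -> str:
    """Format a Python float as the 64-bit IEEE-754 pattern, e.g. 0x3FF0000000000000."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    return f"0x{bits:016X}"


def llvm_float_hex(value: float) -> str:
    """Round to single precision first, then print the widened double pattern.

    Assumption: 'float' constants in .ll text use the same 16-digit double
    form, and the value must be exactly representable as a float.
    """
    (rounded,) = struct.unpack(">f", struct.pack(">f", value))
    return llvm_double_hex(rounded)


print("double", llvm_double_hex(0.1))
print("float ", llvm_float_hex(0.1))
```

The two results for 0.1 differ because the single-precision rounding discards low mantissa bits; emitting the unrounded double pattern for a `float` operand is a likely source of rejected input.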

        • Emit LLVM assembly from your compiler's native language.
        for: very straightforward to get started
        against: the .ll parser is slower than the bitcode reader when
        interfacing to the middle end
        against: you'll have to re-engineer the LLVM IR object model
        and asm writer in your language
        against: it may be harder to track changes to the IR

I think I'm going to try this second one, and produce a nice set of
Lisp-like bindings for emitting the IR. Are there any optimizations or
other LLVM features that I'll miss out on by going this route?
      ...Eric

Nope, the assembly format is a first-class representation.

— Gordon