The most efficient way to compile to LLVM IR?

Hi all,
    I am trying to compile my toy language to an LLVM back end. (I am new to LLVM, so my questions may sound naive.)
    I am looking at some tutorials about LLVM; most are about how to use the LLVM IRBuilder. However, I find the API provided by IRBuilder quite imperative and verbose, and the API changes so fast that most of the tutorials are out of date.
    So I am wondering what the benefit is of emitting LLVM IR using IRBuilder, compared with designing my own abstract syntax in a high-level programming language (e.g. Haskell or OCaml) and unparsing it to LLVM IR. Is there some benefit of using IRBuilder I have overlooked? I also have some follow-up questions:
    1. How stable is the IR format?
    2. Is the binary representation of the IR format (*.bc) stable and the same across different platforms?
    3. Is there any previous work on building a declarative interface instead of using IRBuilder?
    Thank you in advance!
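
A minimal sketch of the "own abstract syntax, then unparse to textual LLVM IR" approach described above, written in C++ only to keep it self-contained (an OCaml or Haskell version would have the same shape). Expr, arg, lit, bin, Unparser and unparseFunction are hypothetical names; the sketch handles only i32 arguments named x or y, i32 constants, add and mul, and it emits the human-readable .ll form, which tools such as llvm-as or llc can consume.

```cpp
#include <cassert>
#include <memory>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical toy-language AST: i32 arguments, i32 constants, add and mul.
struct Expr {
    enum Kind { Arg, Const, Add, Mul } kind;
    std::string name;                // set when kind == Arg
    int value = 0;                   // set when kind == Const
    std::shared_ptr<Expr> lhs, rhs;  // set when kind == Add or Mul
};
using ExprP = std::shared_ptr<Expr>;

ExprP arg(const std::string& name) {
    auto e = std::make_shared<Expr>();
    e->kind = Expr::Arg;
    e->name = name;
    return e;
}

ExprP lit(int v) {
    auto e = std::make_shared<Expr>();
    e->kind = Expr::Const;
    e->value = v;
    return e;
}

ExprP bin(Expr::Kind k, ExprP l, ExprP r) {
    auto e = std::make_shared<Expr>();
    e->kind = k;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Walks the AST, appending one SSA instruction per operator to body_ and
// returning the textual operand (%name, a constant, or %tN) holding the value.
class Unparser {
public:
    std::string emit(const ExprP& e) {
        assert(e && "null expression");
        switch (e->kind) {
        case Expr::Arg:
            return "%" + e->name;
        case Expr::Const:
            return std::to_string(e->value);
        case Expr::Add:
        case Expr::Mul: {
            std::string l = emit(e->lhs);
            std::string r = emit(e->rhs);
            std::string tmp = "%t" + std::to_string(next_++);
            body_ << "  " << tmp << " = " << (e->kind == Expr::Add ? "add" : "mul")
                  << " i32 " << l << ", " << r << "\n";
            return tmp;
        }
        }
        return "";  // not reached for valid Kind values
    }
    std::string body() const { return body_.str(); }

private:
    std::ostringstream body_;
    int next_ = 0;
};

// Wraps an expression over %x and %y in a complete textual IR function.
std::string unparseFunction(const std::string& fname, const ExprP& e) {
    Unparser u;
    std::string result = u.emit(e);
    return "define i32 @" + fname + "(i32 %x, i32 %y) {\nentry:\n" + u.body() +
           "  ret i32 " + result + "\n}\n";
}
```

For (x + 1) * y, built as bin(Expr::Mul, bin(Expr::Add, arg("x"), lit(1)), arg("y")), unparseFunction("f", ...) returns a function whose body is `%t0 = add i32 %x, 1`, then `%t1 = mul i32 %t0, %y`, then `ret i32 %t1`. The catch, as the replies below point out, is that every further IR feature has to be added to this representation and printer by hand.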

> Hi all,
>    I am trying to compile my toy language to an LLVM back end. (I am new to
> LLVM, so my questions may sound naive.)
>    I am looking at some tutorials about LLVM; most are about how to use LLVM
> IRBuilder. However, I find the API provided by IRBuilder quite imperative
> and verbose, and the API changes so fast that most of the tutorials are out
> of date.
>    So I am wondering what the benefit is of emitting LLVM IR using IRBuilder
> compared with designing my own abstract syntax in a high-level programming
> language (e.g. Haskell or OCaml) and unparsing it to LLVM IR.

To be clear you're suggesting having your frontend (say, for
argument's sake, written in C++) parse your toy language and then emit
a (say) Haskell representation of IR? Using some Haskell APIs you'll
write that will emit LLVM bitcode? And then running the resulting
Haskell program to produce your bitcode that you'll load back into
LLVM to optimize/compile?

> Is there some benefit of using IRBuilder I ignored here?

It'll be more efficient to keep the IR in memory rather than write it out
to a source file, run that file to produce bitcode, and then load that
bitcode into LLVM.

Also I'm not sure I see quite how that scheme would be less verbose.

> And I have some follow-up questions: 1. How stable is the IR format?

There's built-in auto-upgrade support, so you can load old IR into newer
versions of LLVM (any 3.* series should be compatible, I believe).

> 2. Is the binary representation of IR format (*.bc) stable and the same
> across different platforms?

It's the same format, but the actual bitcode isn't retargetable as
such: the frontend has already baked target-specific details (type
sizes, ABI rules) into it, so don't expect to be able to produce
bitcode that you can compile for different architectures.
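
To illustrate why, here is a hedged sketch (not how any particular frontend does it): a frontend lowering a pointer-sized integer type has to pick i32 or i64 while generating IR, and the module records the target triple it was generated for. Target, lowerUsize, unparseModule and the "usize" type are hypothetical; the triples are ordinary LLVM target triples.

```cpp
#include <cassert>
#include <string>

// Hypothetical description of the target the frontend is compiling for.
struct Target {
    std::string triple;  // an LLVM target triple, e.g. "x86_64-unknown-linux-gnu"
    int pointerBits;     // pointer width the frontend assumes for this target
};

// Lowers the toy language's pointer-sized integer type (called "usize" here,
// a hypothetical name) to a concrete LLVM integer type. The choice is made
// while generating IR, so it is frozen into the resulting bitcode.
std::string lowerUsize(const Target& t) {
    assert(t.pointerBits == 32 || t.pointerBits == 64);
    return "i" + std::to_string(t.pointerBits);
}

// Unparses a module with one identity function over usize. A real module
// would normally also carry a "target datalayout" line, omitted here.
std::string unparseModule(const Target& t) {
    const std::string ty = lowerUsize(t);
    return "target triple = \"" + t.triple + "\"\n\n"
           "define " + ty + " @id(" + ty + " %n) {\n"
           "entry:\n"
           "  ret " + ty + " %n\n"
           "}\n";
}
```

Generating for "i686-pc-linux-gnu" yields `define i32 @id(i32 %n)`, while "x86_64-unknown-linux-gnu" yields the i64 version; compiling the 32-bit module for a 64-bit target would not give you the 64-bit program.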

> 3. Is there any previous work of building a declarative interface instead
> of using IRBuilder?

Not that I know of. The frontends tend to use IRBuilder.

From: llvmdev-bounces@cs.uiuc.edu [mailto:llvmdev-bounces@cs.uiuc.edu]
On Behalf Of Hongbo Zhang
Subject: [LLVMdev] The most efficient way to compile to LLVM IR?

> I find the API provided by IRBuilder is quite imperative and verbose,
> and the API changes so fast that most of the tutorials are out of date.

We've been using IRBuilder since 2.7, are now on 3.2, and upgrading with each release has been rather easy - not trivial, but not at all difficult. Using IRBuilder seems to be the simplest and most straightforward mechanism, and gives you a great deal of control over what's fed to the optimizers.

- Chuck


Thanks for your reply.


> To be clear you're suggesting having your frontend (say, for
> argument's sake, written in C++) parse your toy language and then emit
> a (say) Haskell representation of IR? Using some Haskell APIs you'll
> write that will emit LLVM bitcode? And then running the resulting
> Haskell program to produce your bitcode that you'll load back into
> LLVM to optimize/compile?

Yes, that's what I am doing, in OCaml though. Functional languages are
excellent for program transformation and manipulation. Where is the
specification for the bitcode format?
Thanks

http://llvm.org/docs/BitCodeFormat.html

-- Sean Silva
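
For a feel of what that document specifies, here is a small hedged sketch that recognizes the two magic numbers it defines: raw bitcode begins with 'B', 'C', 0xC0, 0xDE, and the optional wrapper header begins with the little-endian 32-bit value 0x0B17C0DE. classify and BitcodeKind are hypothetical helpers, not LLVM API; in practice LLVM's own bitcode reader does this, and `llvm-bcanalyzer -dump file.bc` prints a file's block structure.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class BitcodeKind { NotBitcode, Raw, Wrapped };

// Classifies a file by its first four bytes:
//  - raw bitcode starts with 'B', 'C', 0xC0, 0xDE;
//  - the wrapper header starts with 0x0B17C0DE stored little-endian,
//    i.e. the bytes 0xDE, 0xC0, 0x17, 0x0B.
BitcodeKind classify(const std::vector<std::uint8_t>& bytes) {
    if (bytes.size() < 4) return BitcodeKind::NotBitcode;
    if (bytes[0] == 'B' && bytes[1] == 'C' && bytes[2] == 0xC0 && bytes[3] == 0xDE)
        return BitcodeKind::Raw;
    if (bytes[0] == 0xDE && bytes[1] == 0xC0 && bytes[2] == 0x17 && bytes[3] == 0x0B)
        return BitcodeKind::Wrapped;
    return BitcodeKind::NotBitcode;
}
```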

There are actually two separate things you might want to use the native C++ API for. The first, as you noted, is generating LLVM IR via IRBuilder. You might also want to have a custom opt/llc-like tool which will make it somewhat easier to integrate LLVM plugins, such as for garbage collection or language-specific passes implementing sanity checks and/or optimizations. Shared library plugins are supported on *nix but not Windows AFAIK.

One downside of doing your own abstract syntax, beyond the up-front investment of effort, is that as you use more of LLVM’s capabilities (vectors, debug metadata, custom calling conventions, etc.), you’ll have to do more work to represent and unparse each new feature correctly.
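
As a hedged illustration of that extra work, consider supporting just one new feature, LLVM's fastcc calling convention. It has to be added to your representation and emitted at both definitions and call sites, because the LangRef makes a caller/callee calling-convention mismatch undefined behavior. CallConv, FuncDecl, unparseDefine and unparseCall are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Hypothetical front-end representation of a calling convention. The C
// convention (ccc) is LLVM's default, so it can be left out of the text.
enum class CallConv { C, Fast };

std::string ccKeyword(CallConv cc) {
    return cc == CallConv::Fast ? "fastcc " : "";
}

struct FuncDecl {
    std::string name;
    CallConv cc;
};

// Every place that mentions the function must now carry the convention:
// the definition...
std::string unparseDefine(const FuncDecl& f) {
    return "define " + ccKeyword(f.cc) + "i32 @" + f.name + "(i32 %a) {\n"
           "entry:\n"
           "  ret i32 %a\n"
           "}\n";
}

// ...and each call site, which must agree with the callee.
std::string unparseCall(const FuncDecl& callee, const std::string& result,
                        const std::string& arg) {
    return "  " + result + " = call " + ccKeyword(callee.cc) + "i32 @" +
           callee.name + "(i32 " + arg + ")\n";
}
```

With IRBuilder the same feature is a matter of setting the convention on the Function and the CallInst; with your own syntax you also own the printing and the consistency between the two sites.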

On the other hand, having an explicit representation independent of IRBuilder state is useful when debugging any invalid IR your front-end generates. The problem is that invalid IR will trigger asserts in LLVM, and printing out bits and pieces of your partially-built module from gdb isn’t much fun.
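
With an explicit representation you can also check it yourself and report problems in your own terms before LLVM ever sees the IR. A deliberately tiny sketch (C++17; Operand, BinOp and check are hypothetical, and types are plain strings): it enforces one real LLVM rule, that both operands of a binary instruction such as add must have identical types. LLVM's own verifier remains the authority.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical front-end IR: a typed operand and a binary instruction.
struct Operand {
    std::string type;  // e.g. "i32"
    std::string name;  // e.g. "%a"
};

struct BinOp {
    std::string opcode;  // e.g. "add"
    Operand lhs, rhs;
};

// Returns a readable error if the operand types differ, else std::nullopt.
// Catching this on our own data structure avoids a failed assertion inside
// LLVM that would otherwise have to be debugged from gdb.
std::optional<std::string> check(const BinOp& op) {
    if (op.lhs.type != op.rhs.type)
        return op.opcode + ": operand types differ (" + op.lhs.type + " " +
               op.lhs.name + " vs " + op.rhs.type + " " + op.rhs.name + ")";
    return std::nullopt;
}
```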

GHC’s LLVM backend [1] takes the unparse-abstract-syntax approach, but they also use a relatively small subset of LLVM. For example, they don’t make any use of LLVM’s GC functionality. The Disciple language [2] has a fork of GHC’s LLVM code, but I don’t know offhand what the differences are. I don’t know of any such projects for OCaml.

FWIW my approach has been to generate a representation (from Haskell) which is fairly close to LLVM IR but abstracts some things, like GC metadata. This representation is serialized with Protocol Buffers and parsed from C++, which performs the appropriate IRBuilder actions and can integrate statically linked plugins.

[1] https://github.com/ghc/ghc/tree/master/compiler/llvmGen
[2] http://hackage.haskell.org/package/ddc-core-llvm

After reading this, I can't tell whether you plan to do the parsing and
codegen in OCaml and emit an OCaml representation of the IR.
If so, did you have a look at
http://llvm.org/docs/tutorial/OCamlLangImpl3.html ?

Regards,