Writing an LLVM front-end

Hi,

I'm new to the LLVM project. I have to write a compiler front-end (based
on LLVM) in C++ to support my programming language. Do I have to
implement support for each architecture, or does LLVM have libraries that
can do this for me? (By the way, I don't want to support architectures
that LLVM doesn't.)

Does LLVM have helper functions to create objects on the heap? If not, is
it a good idea to use C++11 smart pointers for memory management
(at least in the first versions of my language)?

Sorry if some of these questions are silly, but I'm still a beginner in the compiler world.

Thanks.

Most of the architecture-specific code is in LLVM, away from anything
you'll have to worry about. Some stuff ends up leaking up into your
front-end, but not much if you're starting with a simple language and don't
need to interoperate with existing C libraries, for example. (If your
language needs to be able to call into existing C (or, worse, C++)
libraries, then you'll need to worry about the ABI, and that's a fair amount
of work/code/knowledge that needs to reside in your frontend.)

Does LLVM have helper functions to create objects on the heap?

In your target program? No, there's not much (if any) help there - you just
call malloc/free (or operator new/delete) from your LLVM IR, like you would
in a simple C program.

If not, is it a good idea to use C++11 smart pointers for memory management
(at least in the first versions of my language)?

Inside your target program? You won't, presumably, have access to the C++
standard library in your language so you won't be able to use its
convenient smart pointers. You can make your own in your new programming
language, depending on what language features you have to work with.

The LLVM Kaleidoscope tutorials might give you some idea of how to get
started writing a compiler using LLVM.

- David

Thank you for the help, David.

Can you briefly explain this:
"Most of the architecture-specific code is in LLVM, away from
anything you'll have to worry about."

Most of this is in LLVM's backends - that's where things like the CPU
instruction set, register files, etc. live - so your frontend doesn't
need to know about much of that.

"Some stuff ends up leaking up
into your front-end"

The stuff that you end up having to worry about in the frontend mostly
relates to the ABI things I mentioned earlier - knowing that on certain
platforms you need to put the function parameters in a certain
order/structure so you can interoperate with existing C code, etc.

If you don't need that interoperability, you don't have to worry about that.

Ahmed,

You do not have to write code for each target architecture. Rather, when you invoke LLVM to generate your code, you specify the requested target. The best way to proceed here is to look at the sources for llc and opt. There is no well-defined API for ahead-of-time code generation, so you’ll need to use llc/opt as examples. Be aware that the API will change from release to release. In particular, the change from 3.5.1 to 3.6.0 is significant.

For my project I call out to my own runtime to perform tasks such as memory allocation. You can’t really use C++ smart pointers in your generated code. What you are generating most closely resembles assembly language - you have nothing high-level. Here, I would recommend that you define the interface between your generated code and runtime library in terms of plain C functions using the standard C calling convention. You would then implement your own garbage collector in your runtime. If you don’t want to do that, then consider using the Boehm-Demers-Weiser conservative collector. It requires no special support from LLVM.
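As a rough illustration of the "plain C functions" interface described above, here is a minimal sketch of what such a runtime could look like. The names `rt_alloc`, `rt_free` and `rt_live_objects` are hypothetical and are not part of LLVM; the front-end would emit declarations for them and call them from the generated code. There is no collector here, only the allocation entry points and some bookkeeping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical runtime interface (names are illustrative, not LLVM APIs).
// extern "C" turns off C++ name mangling, so generated code can refer to
// these functions by their plain symbol names and the C calling convention.
extern "C" {
void *rt_alloc(std::size_t size);
void rt_free(void *ptr);
std::size_t rt_live_objects(void);
}

namespace {
// Demo bookkeeping only; a real runtime would need this to be thread-safe.
std::size_t live_objects = 0;
}

// Allocate a zero-filled object for the generated program.
extern "C" void *rt_alloc(std::size_t size) {
    assert(size > 0 && "generated code should not request 0 bytes");
    void *ptr = std::calloc(1, size);
    if (ptr == nullptr) {
        std::abort();  // assumption: out-of-memory is fatal in this sketch
    }
    ++live_objects;
    return ptr;
}

// Release an object previously returned by rt_alloc; null is ignored.
extern "C" void rt_free(void *ptr) {
    if (ptr == nullptr) {
        return;
    }
    assert(live_objects > 0 && "rt_free called more often than rt_alloc");
    --live_objects;
    std::free(ptr);
}

// Number of objects currently allocated and not yet freed.
extern "C" std::size_t rt_live_objects(void) {
    return live_objects;
}
```

Keeping the boundary this narrow means the generated code only ever needs to pass scalars (sizes and pointers), which sidesteps most ABI concerns.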

I’m also a beginner, so asking on this beginner thread – is there anything in addition to the C ABI that one needs to worry about? I imagine a C ABI gives lots of free libraries to integrate into your language, etc.

Also, does the C ABI change between platforms? I thought the C calling convention was the same on all platforms.

I'm also a beginner, so asking on this beginner thread -- is there
anything in addition to the C ABI that one needs to worry about? I imagine
a C ABI gives lots of free libraries to integrate into your language, etc.

Being able to match the local C ABI is generally good enough for most
languages. It makes it easy to implement the language runtime in C, and lets
you build an FFI to external libraries.

However, if you want to leverage libraries in another language like C++ or
Obj-C, it may be worth doing more work to allow deeper interoperability, as
is done in Swift for Obj-C. See the "Skip the FFI" dev meeting talk:
http://llvm.org/devmtg/2014-10/#talk18

Also, does the C ABI change between platforms? I thought the C calling
convention was the same on all platforms.

It is definitely different. However, if you're just passing scalar values
(ints, floats, and pointers), then LLVM will more or less abstract it all
away for you. If you pass aggregates, the frontend needs to know the
platform calling convention and how LLVM implements it.

Thanks, Reid.

When you say the frontend needs to know how LLVM implements it, do you mean the Function* type in LLVM, and the order in which you add arguments to the array when building this type?

Thank you for the video link; just from reading the abstract, I can see this is the way to go. However, as I finished typing that sentence, I wondered whether this means clang and/or LLVM may become a runtime dependency. I imagine it doesn't need to be, and that clang is the tool I'll use to link to the external libraries without an FFI.

Thank you all,

David Jones

You would then implement your own garbage collector in your runtime. If you
don't want to do that, then consider using the Boehm-Demers-Weiser
conservative collector. It requires no special support from LLVM.

I was thinking of implementing smart pointers instead of using a GC,
like this:

obj b = new obj();

b will automatically be instantiated as a smart pointer (and will also
be deleted automatically).
I don't know whether it's a bad idea or not, but that's what I thought.

Another solution is to implement a GC using LLVM's GC support (described
here: http://llvm.org/docs/GarbageCollection.html), but I don't know how
well it performs.

Can you advise me on the best solution, between the first and the
second? Is LLVM's GC better than Python's GC and other equivalents?

Thanks.

The path of least effort, as I previously stated, is to use Boehm-GC.

Reference-counted smart pointers will require that you implement the reference counting in your language, in the generated code. e.g. a statement such as:

p1 = p2

really compiles into:

p1.refcount--; if (!p1.refcount) free(p1)
p1 = p2
p1.refcount++

If you want your language to be thread-safe, then the increment and decrement operations must be atomic.
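To make the expansion above concrete, here is a minimal sketch of the helpers a runtime might provide and the sequence a front-end might emit for `p1 = p2`. All names (`RcObject`, `rc_retain`, `rc_release`, `rc_assign`, `freed_count`) are hypothetical. It departs from the three-line expansion in one way: it increments the new target before decrementing the old one, so that a self-assignment such as `p1 = p1` cannot free the object prematurely. It also tolerates null references.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical object layout: a reference count followed by the payload.
struct RcObject {
    std::atomic<long> refcount{1};  // the creator holds the first reference
    int payload = 0;
};

// Demo bookkeeping only (not thread-safe): how many objects were freed.
int freed_count = 0;

RcObject *rc_new(int payload) {
    RcObject *obj = new RcObject;
    obj->payload = payload;
    return obj;
}

// Take one more reference; null is allowed and ignored.
void rc_retain(RcObject *obj) {
    if (obj != nullptr) {
        obj->refcount.fetch_add(1, std::memory_order_relaxed);
    }
}

// Drop one reference and free the object when the last one goes away.
void rc_release(RcObject *obj) {
    if (obj == nullptr) {
        return;
    }
    // fetch_sub returns the value held *before* the decrement.
    long previous = obj->refcount.fetch_sub(1, std::memory_order_acq_rel);
    assert(previous >= 1 && "released more often than retained");
    if (previous == 1) {
        ++freed_count;
        delete obj;
    }
}

// What the front-end could emit for the statement `dst = src`.
void rc_assign(RcObject *&dst, RcObject *src) {
    rc_retain(src);   // retain first, so `dst = dst` stays safe
    rc_release(dst);  // may free the object dst used to refer to
    dst = src;
}
```

The atomic operations make each individual count update safe across threads; they do not by themselves make concurrent assignments to the same variable safe.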

Although this is doable, your implementation must be perfect. Any bug, and you will either leak memory, or prematurely free a pointer. I speak from experience. Also be aware that pure reference counting will not collect cyclic data structures.
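The point about cyclic data structures can be seen with a small experiment. The sketch below uses C++'s `std::shared_ptr` purely as a stand-in for a language's reference-counted references; `Node` and `destroyed_after_drop` are hypothetical names. Two nodes are created, all outside references are dropped, and the function reports how many nodes were actually destroyed.

```cpp
#include <cassert>
#include <memory>

// A node that reports its own destruction through a caller-owned counter.
struct Node {
    std::shared_ptr<Node> next;
    int *destroyed_counter;

    explicit Node(int *counter) : destroyed_counter(counter) {
        assert(counter != nullptr);
    }
    ~Node() { ++*destroyed_counter; }
};

// Builds a -> b (and optionally b -> a), drops every outside reference,
// and returns how many of the two nodes were destroyed as a result.
int destroyed_after_drop(bool make_cycle) {
    int destroyed = 0;
    {
        auto a = std::make_shared<Node>(&destroyed);
        auto b = std::make_shared<Node>(&destroyed);
        a->next = b;
        if (make_cycle) {
            b->next = a;  // each node now keeps the other alive
        }
    }  // a and b go out of scope here
    // With a cycle, both counts stay at 1 forever: the nodes are leaked
    // on purpose in this demonstration and their destructors never run.
    return destroyed;
}
```

Without the cycle both nodes are destroyed; with it, neither is. A language built on pure reference counting therefore needs something extra for cycles, such as weak references or a backup cycle collector.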

The “LLVM GC” web page does not document a specific, usable, garbage collector. Rather, it documents the features that LLVM provides to interoperate with garbage collectors that require special considerations for loads and stores. For example, parallel and incremental garbage collection often requires that each pointer dereference first checks if the pointee needs to be scanned. The functionality of the “LLVM GC” support allows this to happen. However, you must still provide your own collector.

A lot of good suggestions above.

I think one COULD write a compiler (frontend) that has
pointers/references to objects which are "behind the scenes"
implemented similar to smart pointers. I have never tried to do this,
however. You would need to have the language generate
constructor/destructor/assign operations in some way, but that shouldn't
be particularly hard. (David Jones just wrote a similar reply!)

I have implemented my own Pascal compiler (which doesn't have garbage
collection) - in itself, it uses the ostrich method of memory
management [stick your head in the sand and hope nothing bad happens
while you are not watching] - the runtime component does `malloc` and
`free` to implement the Pascal "new" and "dispose" functions. It
supports standard Pascal, so "objects" don't exist (I have vague plans
to add them at some point!)
Here's my project: https://github.com/Leporacanthicus/lacsap
(It is something I work on in evenings and on weekends, and it's just
"for fun", but I try to make sure it does work reasonably well - there
are several things that aren't great at the moment, mostly error
handling [give it the right kind of "bad" code and it will fail to
compile with some assert in LLVM] - most covered in the README, but
some are only documented in my head...)

In general, I find LLVM pretty easy and straightforward to work with.

I mean "the best solution" from the performance and safety perspectives,
not the easiest one.

I think one COULD write a compiler (frontend) that has pointers/references
to objects which are "behind the scenes" implemented similar to smart
pointers. I have never tried to do this, however. You would need to have
the language generate constructor/destructor/assign operations in some way.

That is what I mean by using smart pointers instead of a GC :)