2.2 garbage collector questions

Hello,

i want to implement a common lisp subset using llvm for fun. This requires the use of a garbage collector. I read the docs, but many things are still unclear to me.
1. how are collectors supposed to find all living objects? there is llvm.gcroot for marking objects on the stack,but how do collectors crawl heap objects? I did not see a way to provide custom mark functions. Are collectors supposed to rely on the type informations provided by llvm?
2. what about gcreading and gcwriting objects of a different type than i8*?
3. No Intrinsic for allocating gc memory?
4. When passing a gc managed pointer to a c function,how should that function access the pointer? directly? there seem to be functions for that in the runtime/ source directory,but it also seems to be meant for the llvm c frontend? (and not for use in an extra c library)
5. some collectors update pointers upon read/write access, can the llvm gc api provide this?

sorry for so many questions (and sorry if some of them may be stupid).
thanks in advance for help.

  thomas

Hi Thomas,

i want to implement a common lisp subset using llvm for fun.

Welcome!

This requires the use of a garbage collector. I read the docs, but many things are still unclear to me.

One of the important things to recognize about LLVM’s GC support is that it only provides facilities which must be provided by the compiler back end. There are many other necessary facilities which must be provided by a complete language implementation. For the parts of the system where it doesn’t need to be involved, LLVM simply gets out of your way.

  1. how are collectors supposed to find all living objects? there is llvm.gcroot for marking objects on the stack,but how do collectors crawl heap objects? I did not see a way to provide custom mark functions. Are collectors supposed to rely on the type informations provided by llvm?

Registering roots (e.g. globals) is a facility your runtime should provide. You could use a module initializer to call into the runtime and accomplish this.

Traversing the object graph is also outside of LLVM’s scope. Your code would need to emit whatever metadata is necessary for it to correctly traverse the graph at runtime.

  1. what about gcreading and gcwriting objects of a different type than i8*?

Simply bitcast to/from i8*. If you don’t need write barriers, you would be advised to ignore gcread and gcwrite entirely, since currently they will only pessimize your code as compared to direct loads and stores.

  1. No Intrinsic for allocating gc memory?

Again, that’s a feature of the runtime. Simply write a function to do so. For efficiency, your compiler may want to inline part of allocation if you’re using a bumpptr allocator.

  1. When passing a gc managed pointer to a c function,how should that function access the pointer? directly?

Generally speaking, yes, directly. If you are using a moving collector and need to pin or copy an object, LLVM neither helps nor hinders your efforts.

there seem to be functions for that in the runtime/ source directory,but it also seems to be meant for the llvm c frontend? (and not for use in an extra c library)

I’m not sure what you mean here.

  1. some collectors update pointers upon read/write access, can the llvm gc api provide this?

gcread easily allows the pointer loaded from the derived pointer to be updated when it is read. Updating the object pointer from gcread or gcwrite may be possible, but is not straightforward within the framework.

Have fun,
Gordon

thomas weidner wrote:

Hello,

i want to implement a common lisp subset using llvm for fun.

Out of curiousity, for which CL implementation is this targeted? sbcl?
Or something you're rolling?

The reason why I ask is that I expressed an outrageous opinion at
Supercomputing back in November, to wit, that CL is probably the best
language suited for today's multicore problems... but I don't have the
time to hack one of the current implementations to proove the point.

Although, you'll notice that LLVM amply prooves "Greenbaum's hypothesis"
(IIRC): inside every sufficiently complex program there is an
implmentation of a Lisp interpreter. :wink:

-scooter

Scott Michel <scottm <at> rushg.aero.org> writes:

Out of curiousity, for which CL implementation is this targeted? sbcl?
Or something you're rolling?

I wanted to roll out my own lisp, and maybe use some library code from existing
lisps (think of loop or format). Adding an LLVM backend to an existing lisp
implementation is a nice idea, but currently not planned.

The reason why I ask is that I expressed an outrageous opinion at
Supercomputing back in November, to wit, that CL is probably the best
language suited for today's multicore problems... but I don't have the
time to hack one of the current implementations to proove the point.

interesting, what makes lisp superior in this area over languages with explicit
support for parallell computing (like erlang? or Ada) or languages which may be
easier auto parallelized (like haskell because of its functional nature).

Although, you'll notice that LLVM amply prooves "Greenbaum's hypothesis"
(IIRC): inside every sufficiently complex program there is an
implmentation of a Lisp interpreter.

Nice statement :slight_smile:

I am also interested in implementing functional programming languages using
LLVM. Although I'm only interested in statically typed ones, there is a lot
of work that can be shared.

LLVM currently lacks working examples demonstrating the use of garbage
collection and exceptions, so that would be a good first port of call. Then
we can worry about an intermediate representation that supports closures.

If you are familiar with functional programming and, in particular, its
benefits in the context of compiler work then you might like to use Gordon's
OCaml bindings to LLVM that are bundled with LLVM. They are very easy to use
and will make subsequent work vastly easier than trying to write everything
in C++.

Jon Harrop wrote:

If you are familiar with functional programming and, in particular, its
benefits in the context of compiler work then you might like to use Gordon's
OCaml bindings to LLVM that are bundled with LLVM. They are very easy to use
and will make subsequent work vastly easier than trying to write everything
in C++.

Or, indeed, the Haskell bindings I've been working on with Lennart
Augustsson. They're both safer and easier to use than the OCaml
bindings :slight_smile:

  <b

LLVM currently lacks working examples demonstrating the use of garbage
collection...

Hello,

if anybody has time, I would recommend putting a big disclaimer at the top of
the garbage collection page that explains that, for the most part, garbage
collection falls outside of LLVM's domain.

Right now LLVM advertises itself as "supporting garbage collecting", which
although true, somewhat misses the point. LLVM just happens to not get in the
way. The only real garbage collection feature that LLVM provides is accurate
stack walking, and even that can be implemented separately through a shadow stack.

That the documentation contains llvm.gc intrinsics is a bit of a red herring.
You end up doing most of the work yourself. To illustrate, for our own LLVM work
we implemented garbage collecting completely without using any LLVM primitives.
Granted, we're a conservative collector right now, and once we go accurate we'll
investigate the stack-walking.

Just a head's up from an beginner/intermediate LLVM programmer. The current
state of the GC documentation makes it seem as if: "you just connect the GC and
it works". I'm sure that isn't the intention, but judging from some of the posts
on this mailing lists over the past months, this seems to be a common expectation.

Two cents...

Jaap Suter

if anybody has time, I would recommend putting a big disclaimer at the top
of the garbage collection page that explains that, for the most part,
garbage collection falls outside of LLVM's domain.

Yes. However, I think it would also be extremely productive to have working
examples showcasing what LLVM does provide, e.g. with trivial GC and
exception mechanisms. If nothing else, this would at least prove that they
really do work. :slight_smile:

Just a head's up from an beginner/intermediate LLVM programmer. The current
state of the GC documentation makes it seem as if: "you just connect the GC
and it works". I'm sure that isn't the intention, but judging from some of
the posts on this mailing lists over the past months, this seems to be a
common expectation.

I think that is the goal though. New users are certainly expecting it to work
because that's the way its phrased. It isn't there yet but it continues to
improve at a tremendous rate (IMHO). I'm just sorry that I haven't been able
to devote more of my own time to this...

There is also the question of when the LLVM maintainers should start kicking
stuff out of the LLVM repository. As soon as we get working examples I'm sure
they will be deluged with mini Lisp/Haskell/CAML etc. implementations, most
of which will snowball beyond minimal working examples and no longer belong
in LLVM itself.

IMO, there are two classes of projects here. The easy class are little samples or demos or examples which are (at most) a couple hundred lines of code. These can live in the main llvm repo, f.e. under llvm/examples.

Projects that are bigger than this can be part of the llvm project umbrella if desired, but should live in a separate module. For example, llvm-gcc, clang, stacker, java, poolalloc etc all exist this way. The bar for creating a separate module is very low, so this isn't a big issue.

One real issue though is that these projects will get out of synch with mainline if noone is active to maintain it (e.g. java). In my mind, this is actually a good thing: putting these into mainline would cause increased maintenance burden in the case when noone is interested in the project. When someone later becomes reinterested in a project, it usually isn't too hard to synch the project up to mainline again.

Anyway, in short, bring on the deluge!

-Chris

Can you elaborate on this?

We had a brief discussion here about making safer OCaml bindings by exploiting
static type systems but nothing became of it (AFAIK).

Jaap Suter wrote:
>> LLVM currently lacks working examples demonstrating the use of garbage
>> collection...
>
> Hello,
>
> if anybody has time, I would recommend putting a big disclaimer at the top of
> the garbage collection page that explains that, for the most part, garbage
> collection falls outside of LLVM's domain.
>
> Right now LLVM advertises itself as "supporting garbage collecting", which
> although true, somewhat misses the point. LLVM just happens to not get in the
> way. The only real garbage collection feature that LLVM provides is accurate
> stack walking, and even that can be implemented separately through a shadow stack.
>
> That the documentation contains llvm.gc intrinsics is a bit of a red herring.
> You end up doing most of the work yourself. To illustrate, for our own LLVM work
> we implemented garbage collecting completely without using any LLVM primitives.
> Granted, we're a conservative collector right now, and once we go accurate we'll
> investigate the stack-walking.

In my opinion this is exactly what LLVM _should_ provide. If you try to
implement an exact GC one of the most annoying bits is getting
information about your stack roots. Using a shadow stack is possible but
there are costs attached to it (in PyPy we observed slowdowns of around
5%) as well as implementation complexity.

> Just a head's up from an beginner/intermediate LLVM programmer. The current
> state of the GC documentation makes it seem as if: "you just connect the GC and
> it works". I'm sure that isn't the intention, but judging from some of the posts
> on this mailing lists over the past months, this seems to be a common expectation.

Maybe a disclaimer would make sense indeed.

Cheers,

Carl Friedrich Bolz

To put this into perspective, the difference in allocation performance (which
is also due to different run-times between OCaml and F#) is up to 500%, so
the 5% difference you've observed is tiny and (IMHO) well within the "no
significant difference" region. Moreover, I'd be amazed if that 5% were not
tremendously architecture and platform specific.

How many CPU designs and compilers have you tried this on?

thomas weidner wrote:

The reason why I ask is that I expressed an outrageous opinion at
Supercomputing back in November, to wit, that CL is probably the best
language suited for today's multicore problems... but I don't have the
time to hack one of the current implementations to proove the point.

interesting, what makes lisp superior in this area over languages with explicit
support for parallell computing (like erlang? or Ada) or languages which may be
easier auto parallelized (like haskell because of its functional nature).

Lisp macros make it much easier to "rewrite" the program to both analyze
the code and take advantage of certain features. Few, if any, other
languages have this feature or plan to implement it.

For example, assume that the loop macro could be extended in such a way
that one could analyze dependencies within the loop -- if there are no
dependencies, you get automagic parallelized code. Similarly, the macro
could be extended in such a way as to eliminate the dependencies, again,
automagic parallelization.

-scooter

thomas weidner wrote:

interesting, what makes lisp superior in this area over languages with explicit
support for parallell computing (like erlang? or Ada) or languages which may be
easier auto parallelized (like haskell because of its functional nature).

I almost forgot: there was a huge body of work from the 80's on high
performance and parallel Lisps. Lisp was the preferred system
programming language for the Connection Machine (CM Lisp.)

-scooter

Scott Michel wrote:

thomas weidner wrote:

The reason why I ask is that I expressed an outrageous opinion at
Supercomputing back in November, to wit, that CL is probably the best
language suited for today's multicore problems... but I don't have the
time to hack one of the current implementations to proove the point.

interesting, what makes lisp superior in this area over languages with explicit
support for parallell computing (like erlang? or Ada) or languages which may be
easier auto parallelized (like haskell because of its functional nature).

Lisp macros make it much easier to "rewrite" the program to both analyze
the code and take advantage of certain features. Few, if any, other
languages have this feature or plan to implement it.

I realize there might be Lisp macro-like packages for other languages,
such as OCaml. Dylan, for all intents and purposes, is A Dead Parrot (as
much as it pains me to say so [*]), but was a good example of how to
take an Algol-like syntax and incorporate Lisp-like macros in the
d-expression paper.

For other languages that feature 'eval', such as Python, the AST is
relatively complex enough that macros may not ever be viable, i.e., that
one could easily perform 'macroexpand-1' before 'eval'.

But this is my own personal opinion. Pile on and disgree!

-scooter

I've seen eval and quote and friends added to C:

   s = `printf ("Hi\n");

   eval s;

:slight_smile: We could extend clang to do this.