Accurate garbage collection

Hello!

I'm looking at "Overview of available features" here:
http://llvm.org/docs/GarbageCollection.html#collector-algos and can't
understand something.
First, does table header mean that there are already some GC's
implemented besides "shadow stack"? E.g. can I just use them?
Second, does row "JIT", "NO" mean that GC is not compatible with jitting at all?

Best regards,
Victor Milovanov.

As I understand it, LLVM simply gives you support for garbage collectors
that you have to implement yourself and link into the final binary,
similar to what C's malloc does (it's a library call). The issue with
GC's is that they need to be provided info about the stack, thats where
LLVM's support comes in.
As far as I know, the garbage collector is linked into the final binary
along with the language's runtime.

Hi Victor,

You can write your own GC or use other’s GC with LLVM. What LLVM provides is a framework to generate a representation of objects locations in a method’s frames. Right now, LLVM can emit the shadow stack (which dynamically updates the locations), or the ocaml format. If you have implemented a GC, you can parse the ocaml format to locate the objects.

I think the web page needs to be updated, because you can make the GC work with the JIT. You can just use LLVM’s internal representations of frames.

Hope that helps,
Nicolas

Are there any standalone accurate garbage collectors that I could use in
my project, rather than having to write me own (or use libgc, which is
what I'm doing now)? Garbage collectors are subtle and very tricky and I
really don't want to have to do one myself, as I know I'll just get it
wrong.

I don’t know of any precise and efficient GC that you can take as-is, you often need to specialize them by, eg, providing an object model, multi-threading interface.

In VMKit, I use MMTk (http://jikesrvm.org/MMTk), which is “unfortunately” written in Java. However I am able to generate a .so file out of it that can link with C code. So if you are prepared to play with vmkit’s Java ahead of time compiler, and to understand MMTk’s interface to object model and multi-threading, using MMTk might be a solution for you.

Cheers,
Nicolas

You can try Hans Bohem Garbage collector.
http://www.hpl.hp.com/personal/Hans_Boehm/gc/

You can try Hans Bohem Garbage collector.
http://www.hpl.hp.com/personal/Hans_Boehm/gc/

This GC is not precise.

Cheers,
Nicolas

He said that he's already using it.

Eugene

Does this mean that everything I should do to make simple GC working
is setting shadow stack as a GC for every function, calling GC's alloc
to allocate (and possible do garbage collection), and during garbage
collection I need to traverse shadow stack to get list of stack object
references?

If so, how can I access shadow stack pointer? Is there any function or
must I declare some external variable and then link llvm?

Best regards,
Victor Milovanov.

David Given wrote:

Are there any standalone accurate garbage collectors that I could use in my
project, rather than having to write me own (or use libgc, which is what I'm
doing now)? Garbage collectors are subtle and very tricky and I really don't want
to have to do one myself, as I know I'll just get it wrong.

On the contrary, writing your own GC is both easy and enlightening. You'll need:

1. Your own shadow stack: an array of pointers to blocks in the heap (those that are directly reachable).
2. The allocated blocks: another array of pointers to heap blocks (all those that have been allocated).
3. A visit stack: yet another array of pointers to heap blocks (those waiting to be marked).
4. The set of marked heap blocks.

Maintain your own shadow stack by:

1. Pushing values upon
  a) allocation,
  b) reading references from the heap or
  c) obtaining references as the result of a function call.
2. Resetting your shadow stack pointer at all return points of all functions to the value it had upon entry to the function.

At regular intervals, check if the heap size has exceeded its quota and, if so, run a GC cycle. To run a GC cycle:

1. Starting with an empty set of marked heap blocks, mark the global roots (global variables and everything on the shadow stack) and then keep marking heap blocks from the visit stack until it is empty.
2. Do a sliding compaction on the allocated list, calling free() on anything unmarked and copying the rest over the top of it in the allocated list.

To mark a heap block:

1. Add it to the set of marked heap blocks.
2. Push any heap blocks that it references onto the visit stack.

That's it. You can do this in ~100 lines of code.

Cheers,
Jon.

I have seen people recommend doing this kind of check whenever hitting a malloc call.
I think it nicely scales with the level of heap activity for most programs.

Yes, performing GC checks at allocation can be good too. There are some
trade-offs.

Allocations tend to be more common than functions so checking per allocation
is more of a burden and, furthermore, it means you may have more GC safe
points.

HLVM's design is interesting in this regard as well. Specifically, its IR
has no concept of loops or jumps other than tail calls between functions. So
performing a GC check at the beginning of every function guarantees that
checks will be performed regularly. This is necessary because HLVM's GC
allows mutators to run in parallel, replying upon cooperative suspension. So
every mutator thread regularly checks the global GC state to see if another
thread has asked it to suspend itself pending a GC cycle. The only exception
is blocking calls (such as acquiring a lock) which work by temporarily
suspending themselves (making it safe for other threads to do a GC cycle
without them) and resuming when the blocking call returns.

As an optimization, HLVM actually just increments a local counter in its GC
check until the counter exceeds another threshold whereupon it does the
global GC check and resets the counter. All thread-local data are made
accessible via a pointer that is passed around as part of HLVM's calling
convention which, again, is a separate layer on top of LLVM's own fastcc.

Cheers,
Jon.

From: llvmdev-bounces@cs.uiuc.edu [mailto:llvmdev-bounces@cs.uiuc.edu] On
Behalf Of Joachim Durchholz
Sent: 17 December 2011 09:58
To: llvmdev@cs.uiuc.edu
Subject: Re: [LLVMdev] Accurate garbage collection

> At regular intervals, check if the heap size has exceeded its quota
> and, if so, run a GC cycle.

I have seen people recommend doing this kind of check whenever hitting a
malloc call.
I think it nicely scales with the level of heap activity for most

programs.