Garbage Collection Project

A while back there was a discussion thread about whether an accurate, concurrent garbage collector could be "generic" in the sense of being able to support multiple different languages efficiently. After having done some work on this, I now believe that this is the case - using C++ policy-based design principles, you can create a set of modules that represent different aspects of collector behavior (such as mark-and-sweep, stop-the-world, and so on) along with different aspects of the runtime environment (object tracing strategies, heap structures, threading primitives, atomics), and encode these various behaviors as template classes which can be bound together to create an efficient collector.
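To make the idea concrete, here is a minimal, hypothetical C++ sketch of that kind of policy composition. The class and policy names below are illustrative only and are not taken from Scarcity's actual source; the point is just that the collection algorithm, thread manager, heap and tracer are independent template parameters that get bound together into one collector.

#include <iostream>

// --- Stand-ins for runtime-environment modules (heap, tracing) ---
struct SimpleHeap {
    void sweep() { std::cout << "sweeping heap\n"; }
};

struct SimpleTracer {
    void markRoots() { std::cout << "marking from roots\n"; }
};

// --- Collector-behavior policies ---
struct MarkSweepPolicy {
    // Collection algorithm: mark reachable objects, then sweep the heap.
    template <typename Heap, typename Tracer>
    static void collect(Heap& heap, Tracer& tracer) {
        tracer.markRoots();   // mark phase
        heap.sweep();         // sweep phase
    }
};

struct StopTheWorldPolicy {
    // Thread manager: suspend mutators for the duration of a collection.
    static void suspendMutators() { std::cout << "stopping the world\n"; }
    static void resumeMutators()  { std::cout << "resuming mutators\n"; }
};

// --- The collector is assembled by binding independent policies together ---
template <typename Algorithm, typename Threads, typename Heap, typename Tracer>
class Collector {
public:
    void collect() {
        Threads::suspendMutators();
        Algorithm::collect(heap_, tracer_);
        Threads::resumeMutators();
    }
private:
    Heap   heap_;
    Tracer tracer_;
};

int main() {
    Collector<MarkSweepPolicy, StopTheWorldPolicy, SimpleHeap, SimpleTracer> gc;
    gc.collect();
}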

The work I have done so far can be found here:

   http://code.google.com/p/scarcity/

At the moment, it's still in the "toy" stage, meaning you can run the unit tests and you can write toy languages that use concurrent garbage collection. It is by no means ready for real-world use. It does not yet have support for incremental or generational collection, nor does it have weak-pointer or pinning support. It is not yet integrated with LLVM's garbage collection intrinsics, although that too is planned. There's also at least one data race bug lurking in the code somewhere, which I have not yet been able to tease out with valgrind.

The project is now at the stage where a small number of additional eyeballs would be useful, which is why I'm posting this here.

-- Talin

Hi Talin,

This is great news! I have some questions...

I had great success using some seemingly-unusual techniques in my experiments.

Firstly, rather than using a single 1 word pointer to represent a reference I
chose to use 3 words including a pointer to the type and a pointer to the
value (as well as metadata). This allows typed nulls and that addresses an
important deficiency found in most other VMs including the CLR. Is Scarcity
able to handle such references or does its implementation of stack frames
require references to be a single word?

Secondly, I used LLVM to JIT compile per-type code for garbage collection in
order to traverse data structures as efficiently as possible. Moreover, I
chose to write the entire GC in an unsafe subset of the language that I was
implementing so that I could rely upon tail calls and so forth. Does Scarcity
require the GC code to be written entirely in C?

Finally, do you have any references describing the techniques you have used to
implement a concurrent GC? I have been reading up on the subject for several
years now, with a view to implementing my own concurrent GC.

> Firstly, rather than using a single 1 word pointer to represent a reference I
> chose to use 3 words including a pointer to the type and a pointer to the
> value (as well as metadata). This allows typed nulls and that addresses an
> important deficiency found in most other VMs including the CLR. Is Scarcity
> able to handle such references or does its implementation of stack frames
> require references to be a single word?

Three words sounds pretty expensive to me, though I can see the use of an extra word for typed nulls. If you look at something like the CLR, you will see that you have very fast access to an object's type: when you have a reference to an object, what you really have is a reference to the object's header. From there you can access the object's syncblock (if it has one, which sits just before the object header) and the object's method table, which includes the relevant pointers to the object's type among other things. It simply means you do a few hops, each of which is a constant-time operation anyway.

Maybe I'm missing something key about the language you are implementing the GC for, but is it really necessary to use an extra word for typed nulls?

Granville

2009/6/18 Jon Harrop <jon@ffconsultancy.com>

I'm also curious what language uses this and why it is useful :-)
Also, things like this would make lock-free algorithms difficult or impossible.

> Firstly, rather than using a single 1 word pointer to represent a
> reference I
> chose to use 3 words including a pointer to the type and a pointer to the
> value (as well as metadata). This allows typed nulls and that addresses
> an important deficiency found in most other VMs including the CLR. Is
> Scarcity able to handle such references or does its implementation of
> stack frames require references to be a single word?

> Three words sounds pretty expensive to me, I can see the use of an extra
> word for typed nulls.

The word is not really "extra" because I just moved the header out of the
value and into the reference. For example, the three words for an array are a
pointer to the type, the array length and a pointer to the (C-compatible)
array data. In addition to providing typed nulls, this has the added
advantage of simplifying interop.
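To illustrate (as a hypothetical C++ sketch only; the field names here are mine, not HLVM's actual layout), such a reference might look like the following. The point is that the type pointer stays valid even when the value pointer is null, which is what gives you a typed null:

#include <cstddef>

struct TypeInfo;   // runtime type descriptor (layout not shown)

// A "fat" three-word reference: type pointer, metadata word, value pointer.
struct Reference {
    const TypeInfo* type;   // always valid, even for a null reference
    std::size_t     meta;   // e.g. an array length, or other metadata
    void*           data;   // nullptr for a typed null
};

// A typed null carries full run-time type information despite having no
// allocated value, so generic printing/serialization can still work on it.
inline Reference typedNull(const TypeInfo* t) {
    return Reference{t, 0, nullptr};
}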

> If you look at something like the CLR you will see
> that you have very fast access to an objects type, after all when you have
> a reference to an object what you really have is a reference to an objects'
> object header. From there you can access an objects syncblock (if it has
> one, which is before the obj header) and the objects' method table which
> includes the relevant pointers to the objects' type among other things. It
> simply means you do a few hops, each of which is a constant operation
> anyway.

That is similar to the approach I used, although HLVM provides a pointer
directly to the type, saving you a single hop.

> Maybe I'm missing something key about the language you are implementing the
> GC for, also is it really necessary to use an extra word for null types?

I would not have considered it had I not seen the damage done to F# by the
CLR's typeless nulls. You want an unallocated constant for many different
purposes in F#, such as representing the empty option type None, the empty
list [] and maybe the unit value (). Unfortunately, using "null" means you
don't get any type information and that means that none of those values work
correctly with anything requiring run-time type information such as generic
printing, serialization and so on. The only alternative is to use an
allocated value but the allocator and (concurrent) GC are very slow so that
gives you the functionality you want but only at a grave cost in terms of
performance.

Consequently, they chose to represent the empty list with an allocated value
(so [] prints correctly) but the option type uses null. Hence printf "%A"
None prints "<null>". They've also used other tricks like having an internal
set representation that uses nulls but is wrapped in another representation
that handles them correctly but only at the cost of an extra level of
indirection.

> I'm also curious what language uses this and why it is useful :-)

HLVM is intended to be a general-purpose VM rather than a particular language.

> Also, things like this would make lock-free algorithms difficult or
> impossible.

True. Perhaps that is a good argument for providing both kinds. However, nulls
are certainly more common than concurrent data structures. :-)

The entire source code for HLVM is available for free under a
commerce-friendly BSD license, BTW:

  http://forge.ocamlcore.org/projects/hlvm/

> That is similar to the approach I used, although HLVM provides a pointer
> directly to the type, saving you a single hop.

I'm not so sure that is a very good reason. Depending on your implementation, data structures that are fundamental to the type system of the virtual machine can use custom allocators, so the extra hop carries little to no expense.

> I would not have considered it had I not seen the damage done to F# by the
> CLR's typeless nulls. You want an unallocated constant for many different
> purposes in F#, such as representing the empty option type None, the empty
> list [] and maybe the unit value (). Unfortunately, using "null" means you
> don't get any type information and that means that none of those values work
> correctly with anything requiring run-time type information such as generic
> printing, serialization and so on. The only alternative is to use an
> allocated value but the allocator and (concurrent) GC are very slow so that
> gives you the functionality you want but only at a grave cost in terms of
> performance.
>
> Consequently, they chose to represent the empty list with an allocated value
> (so [] prints correctly) but the option type uses null. Hence printf "%A"
> None prints "<null>". They've also used other tricks like having an internal
> set representation that uses nulls but is wrapped in another representation
> that handles them correctly but only at the cost of an extra level of
> indirection.

I know little to nothing about that so I find it hard to comment.

> The only alternative is to use an allocated value but the allocator and
> (concurrent) GC are very slow so that gives you the functionality you want
> but only at a grave cost in terms of performance.

I find this hard to believe; the GC used in CLR 2.0 is very fast. Surely its only downfall is that it uses a general mark-sweep generational algorithm, so some languages (yours may be one of them) may suffer because of its generality. It sounds, though, as if you are using the fast allocator in the runtime anyway (as most objects do), which is incredibly fast as it performs little to no checking before allocating space for the object.

It may be that your solution cuts some of the time off the GC, but I really don’t think that the GC in the second incarnation of the CLR is slow.

2009/6/18 Jon Harrop <jon@ffconsultancy.com>

My first point, i.e. the custom allocator, was in relation to the fact that key data structures are allocated in private memory reserved for the runtime, so things tend to be laid out pretty well and the extra hop is not that expensive. Just thought I'd make that clearer.

2009/6/18 Granville Barnett <granvillebarnett@googlemail.com>

Jon Harrop wrote:

> > A while back there was a discussion thread about whether an accurate,
> > concurrent garbage collector could be "generic" in the sense of being
> > able to support multiple different languages efficiently. After having
> > done some work on this, I now believe that this is the case - using C++
> > policy-based design principles, you can create a set of modules that
> > represent different aspects of collector behavior (such as
> > mark-and-sweep, stop-the-world, and so on) along with different aspects
> > of the runtime environment (object tracing strategies, heap structures,
> > threading primitives, atomics), and encode these various behaviors as
> > template classes which can be bound together to create an efficient
> > collector.
> Hi Talin,
>
> This is great news! I have some questions...

And thank you for your interest! :-)

> I had great success using some seemingly-unusual techniques in my experiments.
>
> Firstly, rather than using a single 1 word pointer to represent a reference I chose to use 3 words including a pointer to the type and a pointer to the value (as well as metadata). This allows typed nulls and that addresses an important deficiency found in most other VMs including the CLR. Is Scarcity able to handle such references or does its implementation of stack frames require references to be a single word?

As you no doubt know, there are a couple of different approaches to obtaining the type information for an object reference. These can be divided into static typing, where the compiler knows in advance the type of the reference, and object tagging, in which the type information is stored with the object being referred to (usually in the object itself, although in your case it's with the reference).

Scarcity attempts to unify these two models by declaring that object tagging is a special case of static typing. For every object reference the compiler is required to supply a tracing strategy for that reference, in the form of a function that knows how to trace objects of that static type. That tracing strategy may know the entirety of the object's structure at compile time, in which case object tags are not needed; or, if the reference is polymorphic, the tracing strategy function can inspect the object's tag in order to look up the metadata that describes where the references within the object are located. This is up to the language implementer to decide. In fact, you could conceivably have more than one object hierarchy, each with its own tag format, and so long as the compiler can determine statically which tag format is being used for a particular reference, it should all work.

The strategy function enumerates all of the references in the object and passes them to the trace visitor, which is a functor object supplied by the collection algorithm. It passes the visitor a pointer to each reference, rather than the reference itself - the reason for this is to support copying collectors, which can modify the reference to point to the object's new location.
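As a rough illustration (hypothetical names only, not Scarcity's actual interfaces), that contract might look something like this in C++: the per-type strategy walks an object's fields and hands the visitor the address of each reference slot, so a copying collector could rewrite the slot in place.

#include <cstddef>

struct Object;                       // opaque heap object
using Ref = Object*;                 // a single-word reference in this sketch

// The trace visitor is a functor supplied by the collection algorithm.
// It receives a *pointer to* the reference slot, so a copying collector
// could overwrite the slot with the object's new address.
struct MarkVisitor {
    void operator()(Ref* slot) {
        if (*slot != nullptr) {
            // mark or forward *slot here
        }
    }
};

// Example object layout known statically to the compiler.
struct Pair {
    Ref first;
    Ref second;
};

// The per-type tracing strategy: enumerates every reference slot in the
// object and passes its address to the visitor.
template <typename Visitor>
void tracePair(Pair* p, Visitor& visit) {
    visit(&p->first);
    visit(&p->second);
}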

This means that for your purposes, Scarcity does not dictate the format of object metadata nor the way objects are traced. However, some of the collector algorithms will assume that the pointer-to-reference is pointing to an object pointer. So you wouldn't be able to use those collector algorithms.

In other words, you probably won't be able to use the Scarcity collection algorithms for what you want, but what you can do is build a collector within the Scarcity framework that does what you want. Whether the various modules within Scarcity are valuable enough to be used in this way is up to you.

> Secondly, I used LLVM to JIT compile per-type code for garbage collection in order to traverse data structures as efficiently as possible. Moreover, I chose to write the entire GC in an unsafe subset of the language that I was implementing so that I could rely upon tail calls and so forth. Does Scarcity require the GC code to be written entirely in C?

I imagine that most languages will want to write as much as possible in the hosted language rather than in C. The question is where to draw the dividing line. There are several issues which must be considered:

1) Small snippets of code such as write barriers should be inlined wherever possible. This pretty much requires that they be written in the same language as the code that they will be embedded within. (A minimal illustrative barrier appears after this list.)

2) Many high-level languages really aren't very good at dealing with raw memory, because it requires you to introduce pointer arithmetic and other "unsafe" operations as you point out. For some languages, adding the necessary unsafe operations can distort and corrupt the language design to the point where it feels like you are writing a second compiler. If such is the case, then it makes sense to offload some of these operations to a lower-level language such as C.

3) In a hybrid approach where you are doing high-level stuff in your language, and then grunging around with pointers in C, there's some efficiency loss due to the fact that the compiler for each language cannot inline or optimize code written in the other language. (Although if you compile the C part with clang and use link-time optimization, you can mitigate that somewhat.) One has to think very carefully how to structure the API to avoid frequent jumping back and forth across the language barrier.
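As an illustration of point 1 (a hypothetical sketch only, not Scarcity code, and not thread-safe as written), a snapshot-at-the-beginning style barrier might look like the following; the whole reason it must be inlinable is that something this small runs at every pointer store.

#include <cstddef>
#include <vector>

// Illustrative snapshot-at-the-beginning (Yuasa-style) write barrier.
// Before a pointer field is overwritten, the old value is recorded so a
// concurrent marker does not lose objects that were reachable when
// marking began. A real collector would use per-thread buffers/atomics.
struct Object;                              // opaque heap object
static std::vector<Object*> g_satbBuffer;   // overwritten references
static bool g_markingActive = false;        // set while the marker runs

inline void writeBarrier(Object** slot, Object* newValue) {
    if (g_markingActive && *slot != nullptr) {
        g_satbBuffer.push_back(*slot);      // remember the old reference
    }
    *slot = newValue;                       // the store itself
}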

Because Scarcity is designed as a framework of modules, you can decide where to draw that line by deciding which modules you are going to use. For example, you might choose not to use any of Scarcity's collection algorithms at all, but you still might use its stop-the-world thread manager, or its memory heap.

> Finally, do you have any references describing the techniques you have used to implement a concurrent GC? I have been reading up on the subject for several years now, with a view to implementing my own concurrent GC.

There's a wiki page on the Scarcity site that lists references to a whole bunch of papers on garbage collection techniques.

   http://code.google.com/p/scarcity/wiki/RelevantPapers

-- Talin