Assorted notes on garbage collection with LLVM

Hi all,

I’ve been working recently on a precise garbage collector which runs alongside native code JITted by LLVM. Today marks the first time the GC has passed its entire test suite as well as extensive soak tests in non-trivial programs.

It’s been an interesting and educational process, to say the least, and I’ve run into quite a few things that might be useful to know for others tackling similar projects.

I’ve compiled my notes on the subject here for posterity:

https://code.google.com/p/epoch-language/wiki/GarbageCollectionScheme

I’d be happy to field any questions about the adventure as well, although that probably belongs off-list.

Thanks,

  • Mike

Thanks for sharing. If there is anything which you feel would be relevant to add to our our official documentation, please feel free to send patches (or new documents <http://llvm.org/docs/SphinxQuickstartTemplate.html> if you have a lot to say).

– Sean Silva

Very good to see developments in LLVM+GC!

I wrote a high-level virtual machine with accurate garbage collection (supporting multiple user threads) back in 2008. Fearing LLVM’s GC support at that time, I used a naïve shadow stack, pushing local references as they appeared (from function arguments, load instructions and allocations) onto a thread-local stack and batch popping at function returns by resetting the shadow stack pointer. I put a GC safe point at every function’s entry point. HLVM prohibits loops (uses recursion instead with tail call elimination) so no other GC safe points are needed. Thanks to this very simple design, the single threaded implementation was about 20 lines of code and the multithreading-capable one was about 100 lines of code.

My benchmarks showed that absolute performance was very good (better than Java, MLton, OCaml and Haskell) and, in fact, just 15% slower than C++ in this case:

http://flyingfrogblog.blogspot.co.uk/2010/01/hlvm-on-ray-tracer-language-comparison.html

and scalability was also very good:

http://flyingfrogblog.blogspot.co.uk/2010/01/naive-parallelism-with-hlvm.html

I believe your approach is potentially faster but substantially more complicated so I would regard it as an optimization over the very simple approach that I used. Therefore, it would be very interesting and valuable see the performance advantages of your approach. Have you done any benchmarks?

Cheers,

Jon.

Hi Mike,

I've been working recently on a precise garbage collector which runs
alongside native code JITted by LLVM. Today marks the first time the
GC has passed its entire test suite as well as extensive soak tests in
non-trivial programs.

I'm glad to hear that you solved the issues you described in your last
mail. How did you solve them? Was that because of the stack-coloring bug?

There are some things in the code linked to in the Wiki page I do not
understand yet.

Where do the negative stack offsets come from? The stack root walking code
in my project doesn't need to check for negative stack offsets.

What are bookmarks?

What's the purpose of the TriggerGarbageCollection function?

Have you considered using a hash map in GarbageWorker() that maps safe
points to lists of GCRoots? I think this would simplify the code and reduce
the overhead of stack root walking.

Are you interested in improving the code generator's support for garbage
collection? I posted some thoughts in a mail with the Subject "New ideas
about how to improve garbage collection support". That would also solve the
stack-coloring bug.

-Manuel

Thanks for sharing. If there is anything which you feel would be relevant to
add to our our official documentation, please feel free to send patches (or
new documents <http://llvm.org/docs/SphinxQuickstartTemplate.html> if you
have a lot to say).

Thanks - I'll set aside some time over the weekend to put together some docs.

I believe your approach is potentially faster but substantially more
complicated so I would regard it as an optimization over the very simple
approach that I used. Therefore, it would be very interesting and valuable
see the performance advantages of your approach. Have you done any
benchmarks?

I've done some *very* quick and dirty tests, but I'm still honing in
on getting my frontend to generate good IR, so that the optimizer
passes can really work their magic. There's still also a lot of
low-hanging fruit to optimize in the GC itself.

I have a basic raytracer (funny that you chose a similar test!) which
renders a shaded sphere floating over a plane, with simple shadowing
and lighting. On my current hardware, the tracer runs at 600x600
pixels and roughly 19 frames per second (~52 ms per frame) with
garbage collection fully enabled. A comparable implementation in C++
runs at about 23 FPS, so something like a 16% difference in speed.
Keep in mind that the Epoch language implementation has a lot of work
yet to go - I'm fully confident I can narrow that margin appreciably
by introducing key features to the language like greater control over
memory layout and locality of reference.

I also want to build in some heuristics to the Epoch runtime to avoid
triggering GC stack/heap traversals as often as they are currently
done. That should gain a reasonable amount of performance by itself.
Implementing a moving/compacting collector and some other goodies
should go even further (since I can do cheap allocations instead of
basically relying on malloc under the hood), and once the
optimizations are really cranking, I think I can get to within a
couple percent of my Visual C++ build of the raytracer.

I'm glad to hear that you solved the issues you described in your last
mail. How did you solve them? Was that because of the stack-coloring bug?

As it turns out almost all of my headaches were caused by the
stack-coloring issue. The basic approach described on the Epoch wiki
page seems to work fine as long as stack slots are not merged.

Where do the negative stack offsets come from? The stack root walking code
in my project doesn't need to check for negative stack offsets.

This is interesting to hear... all I know is that LLVM hands me
negative offsets for some emitted functions. I have not yet
ascertained the cause for this.

What are bookmarks?

Bookmarks are a hack to get around FPO optimizations. Suppose we have
a call stack that looks like this:

Epoch function A [FPO disabled]
External function 1 [FPO enabled]
External function 2 [FPO enabled]
Epoch function B [FPO disabled]
Epoch function C [FPO disabled]

If we trigger GC in function A, we have to walk the stack, but FPO can
potentially cause the two stack frames for the external code to
"vanish" - depending on how the optimization is performed, we might
end up getting a frame pointer that points into C instead of B. The
bookmark system just records that we "exited" the Epoch-controlled
stack frame in function B, during the invocation of External Function
2. When A goes to run GC, it verifies that B's stack frame is walked
regardless of where the FPO-mangled stack leads us. Since B has FPO
disabled, we know it will walk correctly into C's stack frame.

What's the purpose of the TriggerGarbageCollection function?

For verification, the Epoch GC doesn't behave like a standard
production GC, that is, it will collect aggressively instead of only
when under memory pressure, or generationally, etc.
TriggerGarbageCollection is a simple shim invoked from JITted Epoch
code which causes a collection cycle to occur.

I also use this shim to capture the current stack pointer and hand it
off to the garbage worker routines. This is the best way I could think
of to capture the stack pointer accurately without relying on weird
hacks like taking the address of a local variable, etc.

Have you considered using a hash map in GarbageWorker() that maps safe
points to lists of GCRoots? I think this would simplify the code and reduce
the overhead of stack root walking.

Yep! This is in fact one of the next optimizations I plan on tackling.
It should be a solid win both in terms of code cleanliness and
performance, as you observed.

Are you interested in improving the code generator's support for garbage
collection? I posted some thoughts in a mail with the Subject "New ideas
about how to improve garbage collection support". That would also solve the
stack-coloring bug.

I'd love to work on getting GC faster and more accessible, but I'm
afraid the internals of the code generator are still a bit out of
scope of my personal project (which is the development of a novel
language). Since I only have so much free time to spare, it's a tough
call whether to invest that time in LLVM or in my own project.

That said, I would be more than happy to contribute back to the LLVM
project as soon as I am confident that I have something of value to
offer. For now the best I can provide is some guidance on how to use
the existing facilities; I'd need to spend a lot more time in the LLVM
core to feel comfortable suggesting changes there.

- Mike