Garbage collector examples wanted

I am writing a runtime for a garbage collected language. Obviously this is quite a complex topic, and I am definitely at the novice end of the spectrum. I think my best approach might be to:

It would be really helpful to find an exising project or two that implement garbage collectors and make use of LLVM GCStrategy and associated LLVM features that can assist with GC.

If anyone has any links they would care to share that might be helpful in this regard, do please share them!

1 Like

Hello!

A key thing to realize when using LLVMs GC facilities is to know/realize what it provides and what it doesn’t. LLVM only provides the bare minimum of metadata and code generation that is required for one to write a GC which mainly comes down to the stack map.

The stack map enables you to find all references (pointers in address space 1) that are alive at a specific program counter in a given frame and tells you how to retrieve them (whether they are in a register or in stack memory).

When implementing a GC you can then:

  • Walk the stack
  • For every frame in the stack, find the stack map entry
  • Read all references in the frame and add it to the root set
  • Perform garbage collection
  • Optionally, perform relocation and write the new addresses of the root objects back into the frame

As far as implementation steps are concerned, you should:

  • Use address space 1 for all pointers in IR that are “references” that need to be recorded in the stackmap.
  • Schedule and run the RewriteStatepointsForGC pass to turn all call instructions into statepoint intrinsics that record all alive references in the stack map
  • At runtime: Find a way to read the stackmap section.

We have previously implemented a very simple relocating GC in a JIT LLVM envrionment in a JVM implementation:
Some useful pointers might be: GitHub - JLLVM/JLLVM: JVM implementation using LLVM as a JIT
JLLVM/src/jllvm/vm/StackMapRegistrationPlugin.cpp at main · JLLVM/JLLVM · GitHub
JLLVM/src/jllvm/vm/Runtime.cpp at main · JLLVM/JLLVM · GitHub
JLLVM/src/jllvm/gc/GarbageCollector.cpp at main · JLLVM/JLLVM · GitHub
https://www.youtube.com/watch?v=g9b_G4ao3PY&pp=ygUSSkxMVk0gbWFya3VzIGJvZWNr

4 Likes

Thank you - in terms of complexity, code size and educational value, I don’t think I could have wished for a better example than your JLLVM code, that will certainly help me a lot.

Hi, I’m working on a programming language project, and last summer I did the first version of the design and implementation of the garbage collector for my runtime. I’m not ready to make my actual garbage collector code public (I haven’t got to the point of testing it yet), but in order to understand how to get started I experimented with the statepoint mechanism in LLVM, and created a small demo to make sure I could get it to work. Perhaps you are already past that level, but in case you find it useful, you can find my demo and a description of it in this blog post:

LLVM garbage collection statepoints demo

1 Like

Hi, no I am not past the point of implementing statepoints and stack roots yet. So this is very helpful to me.

In the language I am implementing, I can actually get away with not worrying about this. Its a functional language which runs an io loop, and io operations generate events which are passed to the next io loop. This means that the stack always unwinds completely at the start/end of each io loop! So for now I can do a minor or major collection at that point, and it will work well enough for programs that don’t allocate too much in each loop!

For that reason, I pushed statepoints down my to do list and started making progress with the codegen instead.

However, it is entirely possible that there will be code that does run much bigger jobs within the io loop that require GC to be triggered by a heap nearly full condition, so this is still something I will need to tackle, and your example will be very helpful, thank you.

Currently I have a good object design for my heap. I have a nursery space with Cheney collector for minor GC, and promotion to an old gen space for survivors over N collections. The old gen space has a simple mark/sweep algorithm and free lists segregated by size. This all runs single threaded with the mutator thread taking on the role of collector thread when it needs to run GC. Extensive testing and all seems to work quite nicely.

Once I tackle stack roots and large object allocation I should have a complete system that can support any long running program.

However… since this is a fuctional language with strict evaluation there can be no cycles formed on the heap. This means that I can use the Perceus reference counting algorithm to detect “owned” objects that are being freed and re-use them immediately when the opportunity arises. This is a garbage free algorithm and I am hoping it will yield very short pause times (with worst case on ref count free chains).

I still plan to complete the full GC version though, if for no other reason than to give me something to benchmark Perceus against.

Code is not published yet, but this will be open sourced when I am ready to.

BTW I also created this AI RAG tool which helps to set up RAG over a set of files:

You need an OpenAI API key to run it.

I mention this because one of the things I did was to RAG a PDF of The Garbage Collection Handbook with it. I bought a hard copy of the book too, to ensure the author is compensated and also because the PDF is not the latest version of the book.

This was immensely helpful for exploring the topic of GC, and comparing my code against the algorithms in the book.