getting started with IR needing GC

Howdy do LLVM folks!

I've exhausted what I can do on my own to make a GC example bind (the usual googling, reading, playing, and looking at source). I can't find the shadow collector lib, or perhaps the -l options needed to link my sample (I haven't even gotten to the point of actually figuring out GC, as I can't link). I'm not sure this IR is correct, but here is what I've been playing with (gc.ll):

declare void @llvm.gcroot(i8**, i8*)
declare void @llvm_gc_collect()

define void @foo() gc "shadow-stack" {
; int *pa = malloc(sizeof(int));
     %a = malloc i32
     %pa = alloca i32*
     store i32* %a, i32** %pa

     %c = bitcast i32** %pa to i8**
     call void @llvm.gcroot(i8** %c, i8* null)
; *pa = 99;
     %t0 = add i32 99, 0
     %t1 = load i32** %pa
     ;%t2 = getelementptr i32** %t1, i32 0
     store i32 %t0, i32* %t1

     store i32* null, i32** %pa; say it's dead
     ret void
}

define void @main() {
     call void @foo()
     call void @llvm_gc_collect()
     ret void
}

Naturally I find ShadowStackCollector.cpp but even crazy links like this:

gcc -lLLVMLinker -lLLVMipo /usr/local/lib/LLVMInterpreter.o -lLLVMInstrumentation /usr/local/lib/LLVMExecutionEngine.o /usr/local/lib/LLVMJIT.o -lLLVMDebugger -lLLVMBitWriter /usr/local/lib/LLVMX86.o -lLLVMSelectionDAG -lLLVMCodeGen -lLLVMScalarOpts -lLLVMTransformUtils -lLLVMipa -lLLVMAsmParser -lLLVMArchive -lLLVMBitReader -lLLVMAnalysis -lLLVMTarget -lLLVMCore -lLLVMSupport -lLLVMSystem /usr/local/llvm-2.2/lib/CodeGen/Release/ShadowStackCollector.o gc.s

don't find llvm_gc_collect. I tried my best to run strings on all the libs to find it... I must be out of C++ practice. Clearly I'm including the compiler libraries rather than the runtime stuff above, but I'm trying everything.

Can anybody help a guy out?

Terence

Hi Terence,

I've exhausted what I can do on my own to make a GC example bind (the usual googling, reading, playing, and looking at source). I can't find the shadow collector lib, or perhaps the -l options needed to link my sample (I haven't even gotten to the point of actually figuring out GC, as I can't link).

The shadow stack walker is in the runtime directory with the semispace heap example. The runtime directory is built to LLVM IR using llvm-gcc. So it's skipped unless you configure llvm with llvm-gcc support.

Since the semispace heap doesn't actually work (it's an example, at best), I suggest you simply copy the stack visitor into your project; it's only a dozen lines of code or so.

    %a = malloc i32
    %pa = alloca i32*
    store i32* %a, i32** %pa

    %c = bitcast i32** %pa to i8**
    call void @llvm.gcroot(i8** %c, i8* null)

Note that the malloc instruction always allocates from the system heap, not your managed heap; putting a malloc pointer into a GC pointer will probably confuse your collector. So you'll likely need to replace 'malloc i32' with some call into your own allocator.

Your allocator should probably bzero the memory before returning it; malloc returns uninitialized memory, which will crash the collector if you reach a collection point before completely initializing the object.
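A minimal sketch of what such an allocator might look like (the bump-pointer scheme, heap size, and names here are mine, not anything LLVM provides):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bump-pointer allocator for a managed heap. */
#define HEAP_SIZE 4096

static unsigned char heap[HEAP_SIZE];
static size_t heap_top = 0;

/* Returns zeroed memory: if a collection happens before the object is
 * fully initialized, the collector sees null fields, not garbage. */
void *gc_alloc(size_t size) {
    if (heap_top + size > HEAP_SIZE)
        return NULL; /* a real allocator would trigger a collection here */
    void *p = &heap[heap_top];
    heap_top += size;
    memset(p, 0, size); /* the crucial bzero */
    return p;
}
```

A real allocator would call into the collector instead of returning NULL when space runs out; this sketch only shows the zeroing.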

Hope that helps,
Gordon

The shadow stack walker is in the runtime directory with the semispace heap example. The runtime directory is built to LLVM IR using llvm-gcc. So it’s skipped unless you configure llvm with llvm-gcc support.

doh! That’s how I missed the binary. thanks!

Since the semispace heap doesn’t actually work (it’s an example, at best), I suggest you simply copy the stack visitor into your project; it’s only a dozen lines of code or so.

Ok, copying; can’t find ShadowStackEntry though. Even make in that dir doesn’t work:

/usr/local/llvm-2.2/runtime/GC/SemiSpace $ sudo make
Password:
llvm[0]: Compiling semispace.c for Release build (bytecode)
semispace.c:107: error: expected specifier-qualifier-list before ‘ShadowStackEntry’
semispace.c:111: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘*’ token
semispace.c: In function ‘llvm_cg_walk_gcroots’:
semispace.c:114: error: ‘StackEntry’ undeclared (first use in this function)
semispace.c:114: error: (Each undeclared identifier is reported only once
semispace.c:114: error: for each function it appears in.)
semispace.c:114: error: ‘R’ undeclared (first use in this function)
make: *** [/usr/local/llvm-2.2/runtime/GC/SemiSpace/Release/semispace.ll] Error 1

It seems like it could be StackEntry instead? Perhaps this is a type I must include / generate for my type system?

%a = malloc i32
%pa = alloca i32*
store i32* %a, i32** %pa

%c = bitcast i32** %pa to i8**
call void @llvm.gcroot(i8** %c, i8* null)

Note that the malloc instruction always allocates from the system heap, not your managed heap; putting a malloc pointer into a GC pointer will probably confuse your collector. So you’ll likely need to replace ‘malloc i32’ with some call into your own allocator.

Yep, was going to get to that once I could bind; was trying one GC thing at a time. :slight_smile:

Your allocator should probably bzero the memory before returning it; malloc returns uninitialized memory, which will crash the collector if you reach a collection point before completely initializing the object.

Will do that too :slight_smile:

Got a simple, complete t.ll file that works with the semispace thing? I could reproduce stuff from the shadow stack paper, I guess. How does the gc “shadow-stack” gcroot intrinsic work exactly? I couldn’t read the assembly very well. It seems my example above wouldn’t work, would it, unless I create/fill in a shadow stack record?

Taking a giant step back, I can build something similar to semispace.c myself so I’m in control of my world, right? I would set up the shadow stack using IR instructions and could avoid gcroot by notifying my collector as I see fit…

Sorry I’m so lost…just trying to figure out what llvm does for me and what I have to do.

Ter

Since the semispace heap doesn’t actually work (it’s an example, at best), I suggest you simply copy the stack visitor into your project; it’s only a dozen lines of code or so.

Ok, copying; can’t find ShadowStackEntry though. Even make in that dir doesn’t work:

Please use the version from subversion; this is broken in 2.2 release, unfortunately.

How does the gc “shadow-stack” gcroot intrinsic work exactly? I couldn’t read the assembly very well. It seems my example above wouldn’t work, would it, unless I create/fill in a shadow stack record?

‘gc “shadow-stack”’ in the LLVM IR instructs the code generator to automatically maintain the linked list of stack frames. You don’t have to do anything to maintain these shadow stack frames except to keep your variables in the llvm.gcroot’d allocas. Essentially, it does this:

struct ShadowStackEntry {
    ShadowStackLink *next;
    const ShadowStackMetadata *metadata;
    void *roots[0];
};

template <size_t count>
struct Roots {
    ShadowStackLink *next;
    const ShadowStackMetadata *metadata;
    void *roots[0];
};

ShadowStackEntry *shadowStackHead;

// Defined by the code generator.
const ShadowStackMetadata f_metadata = …;

void f() {
    Roots<3> roots;
    roots.next = shadowStackHead;
    roots.metadata = &f_metadata;
    roots.roots[0] = NULL;
    roots.roots[1] = NULL;
    roots.roots[2] = NULL;
    shadowStackHead = (ShadowStackEntry *) &roots;

    … user code …

    shadowStackHead = roots.next; // before any exit
    return;
}
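As a runnable C analogue of that pseudocode (a sketch of my own; the names, the stand-in metadata, and the fixed root count of 3 are illustrative, not LLVM's actual emitted layout):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the generated per-function metadata. */
struct FrameMap {
    int num_roots;
};

/* One shadow stack frame: a link, its metadata, and its root slots. */
struct StackEntry {
    struct StackEntry *next;
    const struct FrameMap *map;
    void *roots[3]; /* fixed at 3 for this example */
};

static struct StackEntry *gc_root_chain = NULL;

static const struct FrameMap f_map = { 3 };

/* What a collector's walker does: follow the links, visit every root.
 * Here we just count them. */
int count_roots(void) {
    int n = 0;
    for (struct StackEntry *e = gc_root_chain; e; e = e->next)
        n += e->map->num_roots;
    return n;
}

int f(void) {
    /* Prologue: link a new frame and null its roots. */
    struct StackEntry frame = { gc_root_chain, &f_map, { NULL, NULL, NULL } };
    gc_root_chain = &frame;

    int live = count_roots(); /* "user code": a collection could run here */

    /* Epilogue: unlink before any exit. */
    gc_root_chain = frame.next;
    return live;
}
```

Calling f() pushes one frame of three roots, which the walker can see mid-call; after the epilogue the chain is empty again.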

Taking a giant step back, I can build something similar to semispace.c myself so I’m in control of my world, right? I would set up the shadow stack using IR instructions and could avoid gcroot by notifying my collector as I see fit…

That’s true; the shadow stack design is explicitly for uncooperative environments, after all.

When you want to eliminate the shadow stack overhead, you will need to (a.) use a conservative GC or (b.) emit stack frame metadata using the LLVM GC support.

Sorry I’m so lost…just trying to figure out what llvm does for me and what I have to do.

No problem!

Generally speaking, LLVM is going to help you find roots on the stack, which is the part that the compiler backend must help with; the rest is your playground. The infrastructure is more suited toward interfacing with an existing GC rather than necessarily making writing a new runtime trivial. (See exception handling for precedent…)

— Gordon

Since the semispace heap doesn’t actually work (it’s an example, at best), I suggest you simply copy the stack visitor into your project; it’s only a dozen lines of code or so.

Ok, copying; can’t find ShadowStackEntry though. Even make in that dir doesn’t work:

Please use the version from subversion; this is broken in 2.2 release, unfortunately.

ah! ok, looks better now. :slight_smile:

How does the gc “shadow-stack” gcroot intrinsic work exactly? I couldn’t read the assembly very well. It seems my example above wouldn’t work, would it, unless I create/fill in a shadow stack record?

‘gc “shadow-stack”’ in the LLVM IR instructs the code generator to automatically maintain the linked list of stack frames. You don’t have to do anything to maintain these shadow stack frames except to keep your variables in the llvm.gcroot’d allocas. Essentially, it does this:

struct ShadowStackEntry {
    ShadowStackLink *next;
    const ShadowStackMetadata *metadata;
    void *roots[0];
};

Ok, bear with me here…

What’s the difference between ShadowStackLink and ShadowStackEntry?

template <size_t count>
struct Roots {
    ShadowStackLink *next;
    const ShadowStackMetadata *metadata;
    void *roots[0];
};

ShadowStackEntry *shadowStackHead;

// Defined by the code generator.
const ShadowStackMetadata f_metadata = …;

Do you mean generated by my front end that emits IR or do you mean the backend? It seems that, since I read the source code and build the symbol table, I would need to build this stack frame type information for LLVM.

void f() {
    Roots<3> roots;
    roots.next = shadowStackHead;
    roots.metadata = &f_metadata;
    roots.roots[0] = NULL;
    roots.roots[1] = NULL;
    roots.roots[2] = NULL;

What are the three roots here? Not sure where anything but the next and metadata are coming from. So the gc “shadow-stack” generates that preamble code? That would make sense.

    shadowStackHead = (ShadowStackEntry *) &roots;

    … user code …

here is where my gcroots go then I guess.

    shadowStackHead = roots.next; // before any exit
    return;
}

Can you tell me where to find ShadowStackMetadata? A search does not reveal it:

/usr/local/llvm-2.2 $ find . -name 'ShadowStackMetadata*'

Taking a giant step back, I can build something similar to semispace.c myself so I’m in control of my world, right? I would set up the shadow stack using IR instructions and could avoid gcroot by notifying my collector as I see fit…

That’s true; the shadow stack design is explicitly for uncooperative environments, after all.

The compiler plug-in for a GC is like a sophisticated macro that knows how to emit preambles and postambles for each function that says it uses that particular GC, right? Does it do more than an include, such as figuring out which allocas I have that are pointers? If so, then why do I need to use gcroot instructions to identify roots? It seems like it would be much easier to understand to just have my output templates emit the preamble and so on. Oh, maybe the optimizer removes some stuff in there when what I think is a root is actually not around anymore.

When you want to eliminate the shadow stack overhead, you will need to (a.) use a conservative GC or (b.) emit stack frame metadata using the LLVM GC support.

Unfortunately, I’m thoroughly confused about who generates what. Who is supposed to generate the metadata types? If I am, that is fine, but I really can’t find anything in the documentation that is a simple end-to-end C code → IR example. Once I get one together, I’ll put it in the book I’m writing. I’ve spent many hours reading and playing as much as I can, but it is still not clear; 'course I ain’t always that bright. :wink: Note that the paper by Henderson was extremely clear to me, so it’s not the contents, it is the details of using LLVM to do GC.

Sorry I’m so lost…just trying to figure out what llvm does for me and what I have to do.

No problem!

Generally speaking, LLVM is going to help you find roots on the stack, which is the part that the compiler backend must help with; the rest is your playground.

Is that because only code generation knows what roots exist after processing the IR?

The infrastructure is more suited toward interfacing with an existing GC rather than necessarily making writing a new runtime trivial. (See exception handling for precedent…)

Well, writing a new garbage collector seems really straightforward (like a mark-and-sweep). LLVM will give me the roots and I am free to walk them. The part that I don’t understand is who defines what metadata types and how exactly I make use of gcroot and LLVM’s support. The concepts are clear, the details seem miles away :wink:

Thanks for all the help…

Has anybody else on the list gotten a trivial GC’d language working that I could look at? I’ll go back to the scheme translator again to see what I can learn.

Thanks,
Ter

Hi Terence,

I think you’re getting hung up on the details of the shadow stack collector. The shadow stack is a GC that is possible within this framework, but of course could be implemented without any special support. Its presence is more misleading than anything else. Taking a step back, the concepts are:

llvm.gcroot instructs the code generator → My GC needs to be able to find this variable on the stack at runtime.
gc “x” instructs the code generator → My GC will use the method “x” to find variables on the stack at runtime.

Typically, a metadata table (similar to exception handling tables) will be emitted. Its structure is fundamentally map<void*, list<int>>, where the key is the address of a safe point in code (for instance, the address following a ‘call’ instruction), and the ints indicate the offsets of the live roots within the stack frame. LLVM does not define such a data structure, but the runtime-specific Collector subclass and the GC runtime itself must agree on a common format in order to interoperate.
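For illustration only, such an agreed format might look like the following C sketch; the layout, names, and addresses are invented here, not an LLVM-defined structure:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical emitted stack map: for each safe point (keyed by the
 * address following a call), the frame offsets of the live roots. */
struct SafePoint {
    uintptr_t address;   /* code address of the safe point */
    int num_live;        /* how many roots are live there */
    const int *offsets;  /* their offsets within the stack frame */
};

static const int sp1_offsets[] = { 8, 16 };
static const int sp2_offsets[] = { 8 };

static const struct SafePoint stack_map[] = {
    { 0x1000, 2, sp1_offsets },
    { 0x1040, 1, sp2_offsets },
};

/* What a runtime would do: given a return address found while crawling
 * the stack, recover the live-root offsets for that frame. */
const struct SafePoint *find_safe_point(uintptr_t addr) {
    for (size_t i = 0; i < sizeof(stack_map) / sizeof(stack_map[0]); i++)
        if (stack_map[i].address == addr)
            return &stack_map[i];
    return NULL;
}
```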

ShadowStackCollector uses a completely alternative methodology. A shadow stack could be implemented in user code without any special support. (Indeed, it is implemented entirely as an LLVM IR → LLVM IR transformation!)

How does the gc “shadow-stack” gcroot intrinsic work exactly? I couldn’t read the assembly very well. It seems my example above wouldn’t work, would it, unless I create/fill in a shadow stack record?

‘gc “shadow-stack”’ in the LLVM IR instructs the code generator to automatically maintain the linked list of stack frames. You don’t have to do anything to maintain these shadow stack frames except to keep your variables in the llvm.gcroot’d allocas. Essentially, it does this:

struct ShadowStackEntry {
    ShadowStackLink *next;
    const ShadowStackMetadata *metadata;
    void *roots[0];
};

Ok, bear with me here…

What’s the difference between ShadowStackLink and ShadowStackEntry?

This is an abstract type with a flexible array member.

template <size_t count>
struct Roots {
    ShadowStackLink *next;
    const ShadowStackMetadata *metadata;
    void *roots[0];
};

This should be roots[count], which makes the difference.

ShadowStackEntry *shadowStackHead;

// Defined by the code generator.
const ShadowStackMetadata f_metadata = …;

Do you mean generated by my front end that emits IR or do you mean the backend? It seems that, since I read the source code and build the symbol table, I would need to build this stack frame type information for LLVM.

No, the code generator injects this constant. The metadata records how many roots are in the entry, and also stores the ‘metadata’ parameter to llvm.gcroot if you provide one.

void f() {
    Roots<3> roots;
    roots.next = shadowStackHead;
    roots.metadata = &f_metadata;
    roots.roots[0] = NULL;
    roots.roots[1] = NULL;
    roots.roots[2] = NULL;

What are the three roots here? Not sure where anything but the next and metadata are coming from. So the gc “shadow-stack” generates that preamble code? That would make sense.

These would correspond to three gcroot allocas.

    shadowStackHead = (ShadowStackEntry *) &roots;

    … user code …

here is where my gcroots go then I guess.

This is where your uses of them would occur. llvm.gcroot does not emit any code except the preamble.

    shadowStackHead = roots.next; // before any exit
    return;
}

Can you tell me where to find ShadowStackMetadata? A search does not reveal it:

/usr/local/llvm-2.2 $ find . -name 'ShadowStackMetadata*'

This was pseudocode. The ShadowStackCollector actually does instantiate such LLVM StructTypes, however; you can refer to its implementation.

Taking a giant step back, I can build something similar to semispace.c myself so I’m in control of my world, right? I would set up the shadow stack using IR instructions and could avoid gcroot by notifying my collector as I see fit…

That’s true; the shadow stack design is explicitly for uncooperative environments, after all.

The compiler plug-in for a GC is like a sophisticated macro that knows how to emit preambles and postambles for each function that says it uses that particular GC, right? Does it do more than an include, such as figuring out which allocas I have that are pointers? If so, then why do I need to use gcroot instructions to identify roots? It seems like it would be much easier to understand to just have my output templates emit the preamble and so on. Oh, maybe the optimizer removes some stuff in there when what I think is a root is actually not around anymore.

The shadow stack is a primitive form of GC with high runtime overhead. It’s an easy way to bring up a new collector, and it is highly portable, but it’s slow and not multithread-capable. This table is probably the best summary of the benefits of the GC infrastructure:

http://llvm.org/docs/GarbageCollection.html#collector-algos

When you want to eliminate the shadow stack overhead, you will need to (a.) use a conservative GC or (b.) emit stack frame metadata using the LLVM GC support.

Unfortunately, I’m thoroughly confused about who generates what. Who is supposed to generate the metadata types?

Your code is responsible for defining the object model metadata (vtable pointers, class layouts, etc.).

For enumerating stack roots, the Collector plugin and the runtime must agree to a common format.

If I am, that is fine, but I really can’t find anything in the documentation that is a simple end-to-end C code → IR example. Once I get one together, I’ll put it in the book I’m writing. I’ve spent many hours reading and playing as much as I can, but it is still not clear; 'course I ain’t always that bright. :wink: Note that the paper by Henderson was extremely clear to me, so it’s not the contents, it is the details of using LLVM to do GC.

Sorry I’m so lost…just trying to figure out what llvm does for me and what I have to do.

No problem!

Generally speaking, LLVM is going to help you find roots on the stack, which is the part that the compiler backend must help with; the rest is your playground.

Is that because only code generation knows what roots exist after processing the IR?

Only the code generator knows where the roots are after code generation (stack offsets, registers), or at what points the roots must be findable (at safe points).

It is of course possible to define user data structures to do this (this is what the shadow stack does), but that incurs significant overhead (4 + n stores + 2 loads per function call).

The infrastructure is more suited toward interfacing with an existing GC rather than necessarily making writing a new runtime trivial. (See exception handling for precedent…)

Well, writing a new garbage collector seems really straightforward (like a mark-and-sweep). LLVM will give me the roots and I am free to walk them. The part that I don’t understand is who defines what metadata types and how exactly I make use of gcroot and LLVM’s support. The concepts are clear, the details seem miles away :wink:

Thanks for all the help…

Has anybody else on the list gotten a trivial GC’d language working that I could look at? I’ll go back to the scheme translator again to see what I can learn.

PyPy’s used this, but LLVM isn’t their best code generator. There are private projects as well.

— Gordon

Hi guys,

I am also trying to understand the garbage collection framework. I've
built up a tiny frontend for a very simple language, and I'd like to
try adding a simple garbage collector. I guess I'll walk through my
understanding of the required steps - I'd appreciate it if you guys
could correct where I misunderstand and help fill in the gaps.

Currently, my frontend creates a normal LLVM IR of a program. Before
adding GC, all variables are allocated by doing an alloca or malloc,
then by doing a store of the alloca to the variable.

Now let's assume I want to add a simple copying garbage collector.

I write an implementation called copycollector.c (based on
semispace.c) that implements the functions declared in GCInterface.h.

When my frontend generates LLVM IR code for a program, it should also
generate a call (early in the IR code) to llvm_gc_initialize. This
function uses the system calloc to allocate two big blocks of memory,
then stores pointers to that memory in static variables.

Since this is a simple copying collector, the functions llvm_gc_read
and llvm_gc_write won't really do much:
void *llvm_gc_read(void *ObjPtr, void **FieldPtr) { return *FieldPtr; }
void llvm_gc_write(void *V, void *ObjPtr, void **FieldPtr) { *FieldPtr = V; }

There is also a function called llvm_gc_allocate. Now, instead of
using alloca or malloc, my frontend generates a call to
llvm_gc_allocate.

Up to this point I feel pretty good about what's going on. But now
things get a bit fuzzy.

I somehow need to inform the garbage collection runtime (my
copycollector.c) about my variables - specifically about gc roots. So,
after I get new memory using llvm_gc_initialize, I think I should
generate an @llvm.gcroot intrinsic. But what do I pass it? The
@llvm.gcroot intrinsic takes an i8**, but llvm_gc_initialize returns a
void *. I don't really understand what happens when I generate an
@llvm.gcroot intrinsic. Conceptually, I'm announcing that I have a
variable that could serve as a GC root. But what exactly is happening?
And do I need to care what's happening?

Now, back to the collector runtime... When llvm_gc_collect is called,
I need to walk through the GC space, starting with the GC roots. How
do I get access to the GC roots?

Then there is the collector plugin. I guess there will be a
MyCustomCollector which inherits from Collector. I really don't
understand what this class does and why I need it. Let's say I have a
simple copying GC, and let's say I have a very simple set of types in
my language. When I store a variable in GC space, I'll store a char at
the beginning of the memory for that variable that simply indicates
the type (and I'll be able to infer the size from the type). Given
that, I assume I won't ever need to use the collector metadata (in @llvm.gcroot, llvm_cg_walk_gcroots, or via CollectorMetadata). Is that right?

What does it mean for the plugin to compute a stack map?

When are the methods of the plugin called? Who is responsible for calling them?

When is the constructor of the plugin called?

What is the relationship between the plugin and the GC runtime?

In the GC docs, section "Tracing GC pointers from heap objects", there
is the following statement: "In addition, LLVM allows the front-end to
extract GC information in any form from a specific object pointer
(this supports situations #1 and #3)." Is there an example where this
is done? What API can I look at to see how to do this?

Thanks for all the help. I know this is a lot of questions. I'd really
like to get a basic GC up and running.

Thanks,
Lane

Hi guys,

Hi Lane!

This is a lot of questions. I'm not going to answer each individually, but will instead give general guidance to help you avoid the pain points…

I somehow need to inform the garbage collection runtime (my copycollector.c) about my variables - specifically about gc roots. So, after I get new memory using llvm_gc_initialize, I think I should generate an @llvm.gcroot intrinsic.

This is correct.

Think of the llvm.gcroot intrinsic call as an annotation on a stack variable (an alloca). Like noalias, it doesn't generate any code itself. Rather, it instructs the code generator to keep track of the location of the variable in the stack frame. You must store your allocated pointer into this variable. Since you are using a copying collector, you must also reload the value of this variable before each use. This is to guard against the possibility that the collector has been invoked, updating the contents of the alloca.
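A toy C model of why that reload matters under a copying collector (the collector, the root registration, and the two spaces are all simulated here; the names are invented for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Two tiny "semispaces". */
static int from_space[4];
static int to_space[4];

/* Plays the role of the gcroot'd alloca: a slot the collector knows. */
static int **registered_root;

void gc_register_root(int **slot) { registered_root = slot; }

/* A collection moves the object and updates the root slot in place,
 * just as a real copying collector updates roots found via the walker. */
void gc_collect(void) {
    if (registered_root && *registered_root) {
        to_space[0] = **registered_root; /* copy the object */
        *registered_root = &to_space[0]; /* forward the root */
    }
}

int demo(void) {
    int *obj = &from_space[0]; /* "allocation" */
    *obj = 42;
    int *root = obj;
    gc_register_root(&root);

    gc_collect(); /* any call is a possible collection point */

    /* Reload through the root slot; the cached 'obj' is now stale. */
    return *root;
}
```

Had demo() returned `*obj` instead of `*root`, it would have read the abandoned from-space copy.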

As for the compiler plugin interface, I suggest you ignore it initially and use the provided shadow-stack option for the time being. The shadow stack generates significantly suboptimal code, but will let you avoid writing some platform-specific code. Instead, simply copy the llvm_cg_walk_gcroots routine from the semispace example. Call it from your collection routine in order to visit each gcroot on the runtime stack.

The shadow stack lets you find roots by automatically emitting a function prologue and an epilogue to push and pop gcroot entries on a stack as the program runs. This avoids the need to traverse the native call stack. If your language becomes sufficiently performance sensitive, you can later investigate implementing a custom Collector plugin.

A few additional notes:

When my frontend generates LLVM IR code for a program, it should also generate a call (early in the IR code) to llvm_gc_initialize. This function uses the system calloc to allocate two big blocks of memory, then stores pointers to that memory in static variables.
[...]
There is also a function called llvm_gc_allocate. Now, instead of using alloca or malloc, my frontend generates a call to llvm_gc_allocate.

These function names are entirely optional. Your runtime can use any names and prototypes it likes to provide this functionality. A bump-ptr allocator might easily inline part of its allocation routine.

Since this is a simple copying collector, the functions llvm_gc_read and llvm_gc_write won't really do much:

void *llvm_gc_read(void *ObjPtr, void **FieldPtr) { return *FieldPtr; }
void llvm_gc_write(void *V, void *ObjPtr, void **FieldPtr) { *FieldPtr = V; }

You can just emit loads and stores directly if your read/write barriers do nothing. Also, there's nothing special about the llvm_gc_read or llvm_gc_write functions any more; they will not be called unless you call them yourself.

— Gordon

Gordon:

Since GC strategies change as implementations mature, is there a way to
go ahead and emit these calls, but then in a simple implementation
indicate somewhere that they should be trivially inlined? This would
allow us to debug things without imposing performance penalties on the
simple implementation.

Actually, I can see wanting to inline these even in barrier-oriented GC
implementations...

shap

Since this is a simple copying collector, the functions llvm_gc_read and llvm_gc_write won’t really do much:

void *llvm_gc_read(void *ObjPtr, void **FieldPtr) { return *FieldPtr; }

void llvm_gc_write(void *V, void *ObjPtr, void **FieldPtr) { *FieldPtr = V; }

You can just emit loads and stores directly if your read/write barriers do nothing. Also, there’s nothing special about the llvm_gc_read or llvm_gc_write functions any more; they will not be called unless you call them yourself.

Here, I was making two points. First and foremost, the user symbols llvm_gc_read and llvm_gc_write are not special any more, so Lane shouldn’t bother with them. :slight_smile:

Secondly, it’s not strictly necessary to use the barrier intrinsics, which is the point you’re addressing.

Gordon:

Since GC strategies change as implementations mature, is there a way to go ahead and emit these calls, but then in a simple implementation indicate somewhere that they should be trivially inlined? This would allow us to debug things without imposing performance penalties on the simple implementation.

Actually, I can see wanting to inline these even in barrier-oriented GC implementations…

Indeed so! If you do use the @llvm.gcread and @llvm.gcwrite intrinsics (not to be confused with the above functions), they will be transparently converted to simple loads and stores, so there’s “no cost” to using them. As the implementation matures, an llc plugin can trivially substitute a write barrier using the performCustomLowering hook:

http://llvm.org/docs/GarbageCollection.html#custom

This gives quite detailed control. The plugin could inline just a “hot” codepath on a barrier and keep a cold one out of line, for instance. (Note: Near-future plan is to use IR-to-selection-DAG lowering rather than IR-to-IR for this hook, if you’re working in this area.)
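As a concrete (entirely hypothetical) picture of that hot/cold split, here is what a generational write barrier might lower to; the remembered set and the old_gen flag are simulations of the idea, not LLVM API:

```c
#include <assert.h>

#define REMSET_MAX 16

/* A heap object with a generation flag and one pointer field. */
struct Obj {
    int old_gen;
    void *field;
};

static struct Obj *remembered_set[REMSET_MAX];
static int remset_len = 0;

/* Cold path, kept out of line: record the mutated old-gen object. */
static void barrier_slow(struct Obj *obj) {
    if (remset_len < REMSET_MAX)
        remembered_set[remset_len++] = obj;
}

/* Hot path, cheap enough to inline at every store:
 * the store plus one usually-untaken test. */
static inline void gc_write(struct Obj *obj, void *value) {
    obj->field = value;    /* the store itself */
    if (obj->old_gen)      /* hot-path test */
        barrier_slow(obj); /* rare cold path */
}
```

Stores to young objects pay only the test; only old-generation stores fall into the out-of-line slow path.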

However, GC lowering happens just before code generation, after any IR optimizations; thus, the use of the intrinsics can inhibit IR analysis and optimization. Even ignoring that the intrinsics are opaque to most optimizations, the read barrier can’t even be marked readonly. Therefore, the best output is probably achieved by propagating knowledge about the GC model into the front-end, emitting simple loads and stores when allowed by the GC.

Chris has observed that read and write barrier lowering could equally well be implemented by the front-end with no special compiler support. The only significant value these intrinsics add is if they are transparent to optimizations—e.g., if @llvm.gcreads of the same SSA value are coalesced even though @llvm.gcread can have side effects. This is an area for future work.

— Gordon

Given the current infrastructure, I concur, but the appropriate
selection of GC is target and application dependent, so one does not
want to bind this decision too tightly. Embedding knowledge in the front
end is a very tight binding indeed.

It isn't an immediate problem for us, and one fine solution would be for
us to move the lowering into the front end down the road as Chris has
suggested. I asked mainly because if there was a way to dodge the
premature decision binding problem up front, it seemed worthwhile to do
so.

shap

OK. This is helpful in trying to understand what the Collector plugin is.

So is it correct then that the Collector plugin is the GC's view into
the backend? In other words, most garbage collectors have to have some
knowledge of how the runtime stack is actually laid out. ShadowStack
skirts around this issue by maintaining a "shadow" stack, so when the
GC needs info about the runtime stack, ShadowStack instead provides
info about the "shadow" stack. But most collector plugins would
instead have to know about the actual layout of the runtime stack. Is
that right?

If so, then a Collector plugin would need to have info about how every supported backend lays out the runtime stack?

Thanks,
Lane

As for the compiler plugin interface, I suggest you ignore it initially and use the provided shadow-stack option for the time being. The shadow stack generates significantly suboptimal code, but will let you avoid writing some platform-specific code. Instead, simply copy the llvm_cg_walk_gcroots routine from the semispace example. Call it from your collection routine in order to visit each gcroot on the runtime stack.

The shadow stack lets you find roots by automatically emitting a function prologue and an epilogue to push and pop gcroot entries on a stack as the program runs. This avoids the need to traverse the native call stack. If your language becomes sufficiently performance sensitive, you can later investigate implementing a custom Collector plugin.

OK. This is helpful in trying to understand what the Collector plugin is.

So is it correct then that the Collector plugin is the GC's view into the backend? In other words, most garbage collectors have to have some knowledge of how the runtime stack is actually laid out. ShadowStack skirts around this issue by maintaining a "shadow" stack, so when the GC needs info about the runtime stack, ShadowStack instead provides info about the "shadow" stack. But most collector plugins would instead have to know about the actual layout of the runtime stack. Is that right?

That's correct.

> If so, then a Collector plugin would need to have info about how every supported backend lays out the runtime stack?

Yes. This information is actually available in a target-independent fashion with LLVM, so the Collector interface is target-independent. A backend that doesn't use the target-independent back-end could also implement support for values of gc "...", but this is quite hypothetical.

Such a runtime will further need a way to crawl the native call stack and discover the return address of each call frame. LLVM doesn't provide such a facility.

— Gordon

>> If so, then a Collector plugin would need to have info about how
>> every supported backend lays out the runtime stack?

> Yes. This information is actually available in a target-independent
> fashion with LLVM, so the Collector interface is target-independent. A
> backend that doesn't use the target-independent back-end could also
> implement support for values of gc "...", but this is quite
> hypothetical.

Right. So a GC plugin needs to know a few things in order to find GC
roots. It needs to know the stack pointer, the frame pointer, possibly
static links (if we allow nested functions), and where in the stack
frame local variables live.

How do you get access to this data in a platform-agnostic manner?

> Such a runtime will further need a way to crawl the native call stack
> and discover the return address of each call frame. LLVM doesn't
> provide such a facility.

I guess I'm missing something here. Why does the GC need the return addresses?

Thanks,
Lane

>>> If so, then a Collector plugin would need to have info about how every supported backend lays out the runtime stack?

>> Yes. This information is actually available in a target-independent fashion with LLVM, so the Collector interface is target-independent. A backend that doesn't use the target-independent back-end could also implement support for values of gc "...", but this is quite hypothetical.

> Right. So a GC plugin needs to know a few things in order to find GC roots. It needs to know the stack pointer, the frame pointer, possibly static links (if we allow nested functions), and where in the stack frame local variables live.

> How do you get access to this data in a platform-agnostic manner?

http://llvm.org/docs/GarbageCollection.html#stack-map

>> Such a runtime will further need a way to crawl the native call stack and discover the return address of each call frame. LLVM doesn't provide such a facility.

> I guess I'm missing something here. Why does the GC need the return addresses?

Return addresses are directly discoverable from the stack; function entry points are not.

— Gordon

>
>>> If so, then a Collector plugin would need to have info about every
>>> supported backend lays out the runtime stack?
>>
>> Yes. This information is actually available in a target-independent
>> fashion with LLVM, so the Collector interface is target-
>> independent. A backend that doesn't use the target-independent back-
>> end could also implement support for values of gc "...", but this
>> is quite hypothetical.
>
> Right. So a GC plugin needs to know a few things in order to find GC
> roots. It needs to know the stack pointer, the frame pointer,
> possibly static links (if we allow nested functions), and where in
> the stack frame local variables live.
>
> How do you get access to this data in a platform-agnostic manner?

http://llvm.org/docs/GarbageCollection.html#stack-map

Good - I'd seen that earlier, but in this context it makes more sense.

>> Such a runtime will further need a way to crawl the native call
>> stack and discover the return address of each call frame. LLVM
>> doesn't provide such a facility.
>
> I guess I'm missing something here. Why does the GC need the return
> addresses?

Return addresses are directly discoverable from the stack; function
entry points are not.

If you have the stack, and can therefore determine all live GC roots,
why do you need function entry points? Is it so you know *when* you
can safely run garbage collection?

Sorry if I'm a bit dense on this point. I thought I'd read up on the
topic, but obviously I missed something here.

Thanks for your patience,
Lane

Lane:

Having the stack map doesn't give you liveness information. If you know
where you are in the control flow of a procedure, you may be able to
know that certain pointer values in the corresponding stack frame are
live and others are not. Tracing non-live pointers can lead to fairly
dramatic increases in retained garbage. Partly for this reason there has
been a bunch of work done on optimization passes in GC'd languages to
explicitly nullify pointer values at any point where they cease to be
live. A flow-oblivious GC also causes procedure frame growth
because pointer temporaries and data temporaries cannot be stored in the
same location in the frame (because we need to know, for each location,
whether it is pointer or data). In a non-GC language, those could reuse
the same cell in the frame as long as the two values do not have
overlapping liveness.

More importantly, if you don't know where the procedure is in its
control flow, then you cannot know which callee-saved registers held
pointers at the time of the call, so you can't do any relocating GC if
pointers are permitted to be live in registers at a call site. This has
implications for register allocation. A naive-GC-friendly
registerization scheme, given a choice, should prefer to keep pointers
in caller-saved registers to avoid additional spill code. I don't know
whether the current LLVM register allocator does this (Gordon?). If
hardware register selection considers the load/save cost as part of its
registerization criteria (Gordon?), LLVM may end up solving this
implicitly as a side effect of considering the reload costs.

As I understand matters, LLVM does not presently provide sufficient
information to reconstruct register values as you walk the stack. This
implies (and here I *really* hope that Gordon will straighten me out)
that if your implementation uses a relocating GC strategy you need to
ensure in your IR that you reload any registerized pointer values after
any procedure call that might (transitively) call GC(). Conservatively
this tends to devolve into "reload any pointer after any procedure
call".

Caller-saved registers are not a real problem here, because the caller
can save them to local frame cells that are of the appropriate type
(pointer vs. data), though I do not know if the LLVM register
dump/reload support is pointer/data aware (Gordon?). The problem is
callee-saved registers. The callee has no type information about the
register's current value, and so cannot save it to an appropriately
typed save slot in the callee stack frame. Consider:

  A() has pointer in callee-saved register R17
  A() calls B()
  B() needs R17, so saves it to &R17. Does not know that R17 holds
      pointer.

GC walks stack frame for B(), cannot determine whether the location at
&R17 is pointer or data. Naive GC therefore cannot relocate R17. This
induces the requirement for A() to reload R17 after return from the call
to B().

The requirement to dump and reload these registers is awful, and it is
part of why Fergus Henderson's strategy for using an embedded shadow
stack doesn't look very bad in comparison to many "true" GC systems.

A hypothetical sophisticated stack walker with access to flow-sensitive
(i.e. per-return-PC) information could do better. As the hypothetical
walker proceeds up the stack, it remembers for each callee-saved
register the location of the currently live saved location for that
register. In my example, it would remember the location where R17 was
saved by B. At the time the location is saved the walker does not know
the type of that memory cell. As it proceeds further up the stack, it
discovers when it reaches A() that the currently live R17 is a pointer,
at which point it infers that the previously recorded save location must
be of pointer type (or at least, it is so for the control flow and stack
state that we are currently walking) and then goes back and traces the
saved memory location &R17 according to what is required by the
particular GC algorithm.

I'm not sure pragmatically how one might go about this in the LLVM
framework, or even if it is possible. The knowledge required by GC here
is actually the same as what you need for a debugger to walk the call
stack and keep track of register values, so it's not insane to imagine
that the compiler must be able to emit it, but I don't know how to get
it emitted into a place that the stack walker is able to use.

Gordon: assume, for a moment, that I'm willing to go the extra several
miles to do this type of thing in my GC implementation. How might I go
about getting per-call-site register typings, register save locations,
and return-pc information extracted for use by this hypothetical
sophisticated walker?

shap

> Gordon: assume, for a moment, that I'm willing to go the extra several miles to do this type of thing in my GC implementation. How might I go about getting per-call-site register typings, register save locations, and return-pc information extracted for use by this hypothetical sophisticated walker?

There are two complementary interfaces here, which mesh neatly into the existing collection of GC algorithms exposed by Collector.

The first is liveness analysis, for stack roots as well as register maps. The Collector API is set up for this, although it is not yet implemented.

The second is register maps (which of course are necessarily liveness-accurate). The Collector API is not presently set up for this, but I expect the API additions would be rather simple:

MyCollector() {
  // Please don't spill roots to stack slots at safe points.
  RegisterRoots = true;
}

void MyCollector::finishAssembly(std::ostream &OS, AsmPrinter &AP,
                                 const TargetAsmInfo &TAI) {
  // Iterate functions.
  for (iterator I = begin(), E = end(); I != E; ++I) {
    // Iterate safe points.
    for (CollectorMetadata::iterator PI = MD.begin(),
                                     PE = MD.end(); PI != PE; ++PI) {
      // Live stack roots.
      for (CollectorMetadata::live_iterator LI = MD.live_begin(PI),
                                            LE = MD.live_end(PI);
           LI != LE; ++LI) {
        // Print its offset within the stack frame.
        AP.EmitInt32(LI->StackOffset);
        AP.EOL("stack offset");
      }
      // Live register roots.
      for (CollectorMetadata::reg_iterator RI = MD.reg_begin(PI),
                                           RE = MD.reg_end(PI);
           RI != RE; ++RI) {
        int TargetRegNum = RI->Register;
        Constant *RootMetadata = RI->Metadata;
      }
    }
  }
}

Both of these require a more sophisticated SelectionDAG representation of GC pointers. I’m still mulling this representation over, but I think the derived pointer problem could be solved simultaneously.

That done, I think liveness w/ register roots would be the first feature to be enabled.

Next, liveness w/o register roots. (This is simply liveness with register roots, where each root is forcibly spilled.)

— Gordon