llvm.gcroot suggestion

I think I’m one of the few people actually using LLVM’s support for garbage collection, and so far I’ve found it very difficult to generate code that uses llvm.gcroot() correctly.

In the current scheme, it is the frontend’s responsibility to ensure that any intermediate SSA values containing references to garbage-collectible objects are copied to stack variables so that the GC strategy can calculate offsets for them. It is also the frontend’s responsibility to reload those values after a sync point, since the collector may have moved them around. An additional chore is creating all of the allocas for those roots, which have to be in the first block of the function (because that’s where allocas go, according to my understanding of the rules of IR).

What this means is that the frontend is forced to generate very inefficient IR, with dozens of alloca slots at the beginning of the function, each marked as a root. It also has to initialize each of these roots to a known value, even if the root is only “live” for a very small part of the CFG. This is because currently there’s no way for the frontend to tell LLVM that a given root has a limited lifespan (since calls to llvm.gcroot also have to be in the first block), and so you have to take a conservative approach and treat every root as if it were live for the entire function.
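To make the cost concrete, here is a rough sketch of the kind of IR a frontend has to emit today for a single root. The helper functions (@alloc_object, @do_call), the %MyObj type, and the "mygc" strategy name are made up for illustration; only @llvm.gcroot is real:

%MyObj = type { i8*, i32 }

declare void @llvm.gcroot(i8** %ptrloc, i8* %metadata)
declare %MyObj* @alloc_object()
declare void @do_call(%MyObj*)

define void @example() gc "mygc" {
entry:
  ; every root needs an alloca in the entry block, registered up front
  %root = alloca %MyObj*
  %root.i8pp = bitcast %MyObj** %root to i8**
  call void @llvm.gcroot(i8** %root.i8pp, i8* null)
  ; and must be initialized to a known value, even if it is live only briefly
  store %MyObj* null, %MyObj** %root
  %obj = call %MyObj* @alloc_object()
  ; spill before the sync point (the call below)...
  store %MyObj* %obj, %MyObj** %root
  call void @do_call(%MyObj* %obj)
  ; ...and reload afterward, since the collector may have moved the object
  %obj.reloaded = load %MyObj*, %MyObj** %root
  call void @do_call(%MyObj* %obj.reloaded)
  ret void
}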

It seems to me that “gc-rootness” should not be an aspect of an alloca, as it currently is now, but rather an aspect of a Value, similar to the way llvm.dbg.value() works.

Let’s imagine a new intrinsic, llvm.gcvalue(), which takes as its arguments a Value and a metadata pointer, and returns that same value as its result. The frontend can “wrap” any value, of any type, with llvm.gcvalue(), which signals to LLVM that this value is a garbage collection root. It would then be the responsibility of LLVM’s code generator (or possibly some lowering pass) to ensure that this value lives on the stack during sync points, and is reloaded after a sync point if it is still needed. During a sync point, these values would be treated exactly the way llvm.gcroot works today - that is, they live on the stack, and the GCStrategy gets passed a list of stack offsets and metadata pointers. The main difference is that only the values that are actually ‘live’ during the sync point are lowered from SSA values to allocas.
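A rough sketch of what IR using such an intrinsic might look like, reusing the made-up %MyObj type and helpers from the sketch above. Note that @llvm.gcvalue does not exist; the name, signature, and semantics here are purely the proposal:

%MyObj = type { i8*, i32 }

; hypothetical - this intrinsic does not exist and would not pass the verifier today
declare %MyObj* @llvm.gcvalue(%MyObj* %value, i8* %metadata)
declare %MyObj* @alloc_object()
declare void @do_call(%MyObj*)

define void @example() gc "mygc" {
entry:
  %obj = call %MyObj* @alloc_object()
  ; wrap the SSA value; LLVM would spill/reload it across sync points and
  ; report its location to the GCStrategy, but only where it is actually live
  %obj.root = call %MyObj* @llvm.gcvalue(%MyObj* %obj, i8* null)
  call void @do_call(%MyObj* %obj.root)
  ret void
}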

This approach offers a bunch of advantages that I can think of:

  • LLVM knows much more about the liveness of values than the frontend does, and could compute a much smaller, closer-to-minimal stack map for each sync point. That is, two variables whose lifetimes do not overlap can share the same stack slot for their roots, and the stack maps generated by the GCStrategy would reflect this.
  • If at some future time LLVM supports register maps for garbage collection, you would not have to update your frontend. Since we’re marking values, not stack slots, the frontend doesn’t have to care where the variable is stored, so all frontends can take advantage of improvements in the representation of stack maps.
  • Writing frontends gets a lot easier, since the frontend is no longer responsible for generating and initializing allocas for every root. It also means a lot fewer opportunities for generating incorrect code.

What do you think?

[..] the frontend is forced to generate very inefficient IR, with
dozens of alloca slots at the beginning of the function, each marked
as a root. It also has to initialize each of these roots to a known
value, even if the root is only "live" for a very small part of the
CFG. This is because currently there's no way for the frontend to tell
LLVM that a given root has a limited lifespan (since calls to
llvm.gcroot also have to be in the first block), and so you have to
take a conservative approach and treat every root as if it were live
for the entire function. [..]

I can confirm this observation. The generated code is clearly
suboptimal, and it gets even worse if one tries to do live-precise
garbage collection, where a gcroot'ed alloca slot that is no longer
used must be explicitly assigned a null value. On my list-processing
micro-benchmarks these effects incur a performance hit of approximately
20%-30% compared to manually crafted asm code, and that is including
the overhead of the garbage collector.

It seems to me that "gc-rootness" should not be an aspect of an
alloca, as it currently is now, but rather an aspect of a Value [..]

I'd second that.

I can't speak to the implementation, but as a potential user of this
facility I think this approach is definitely preferable to the current
one. The IR that the Open Dylan compiler is generating LLVM code from is
almost SSA already, and I don't want to allocate stack slots for
everything just for GC maps.

-Peter S. Housel- housel@acm.org

So a few additional thoughts: I think that it would be best to keep the existing llvm.gcroot() call in addition to having llvm.gcvalue(). llvm.gcroot() would be used for marking allocas as roots, and llvm.gcvalue() would mark SSA values as roots.

In fact, if I could go back in time, I would rename them to correspond exactly with the llvm.dbg intrinsics. So in addition to llvm.dbg.declare() and llvm.dbg.value(), we’d also have llvm.gc.declare() and llvm.gc.value(), with analogous uses.

Note that it is important that the LLVM GC intrinsics should not assume that their arguments are pointers, as they could in some cases be structs or arrays that contain pointers. LLVM should make no assumptions about the internal structure of a GC root; that is an issue for the language’s GCStrategy pass to worry about - typically the frontend will use the metadata argument to communicate the structure of a root to the GCStrategy. The only thing that LLVM has to guarantee is that during a sync point, it will be possible to associate each live root with an address in memory somewhere.

As far as implementation goes, I’m assuming that there would be some pass that converts llvm.gcvalue() intrinsics into llvm.gcroot() intrinsics by temporarily storing the root into an alloca and then reloading it as needed. (Actually, you would probably want something richer than the current llvm.gcroot that allows an explicit begin and end, so that information about liveness can be preserved. How about an llvm.gc.beginroot() intrinsic to indicate that a given alloca should be considered a root starting at that point in the CFG, and llvm.gc.endroot() to indicate that the specified root no longer needs to be considered a root? This is much better than the current method of assigning NULL to a root, as it accommodates non-pointer roots and avoids an extra pointer store that is not needed in many cases.)

What is unclear to me is exactly when this should occur relative to other passes, but I think what you would want to do is perform optimization and liveness analysis first - thereby eliminating potentially dead roots before they are converted to memory locations. Unfortunately, I can’t say too much about this, as I am not a backend person - I know very little about optimization and code generation in general or LLVM’s backend passes in particular.

There would also need to be some way to tell the code generator not to reload values after a sync point if the collector doesn’t require it, such as a mark-and-sweep or other non-copying algorithm. The most straightforward way to handle this would be as an argument to the pass that does the lowering of llvm.gcvalue intrinsics.

Thinking about it even more, here’s a short summary of what I would propose:

  • llvm.gc.value(value, metadata) - marks an SSA value as a garbage collection root. This remains in effect for the lifetime of the SSA value.
  • llvm.gc.declare(alloca, metadata) - marks an alloca as a garbage collection root. This intrinsic tells LLVM that it should start treating the alloca as a GC root from that point in the CFG onward.
  • llvm.gc.undeclare(alloca) - tells LLVM that the alloca should no longer be considered a GC root. If llvm.gc.undeclare() is never called, then the alloca is treated as a root until the end of the function.
One other thing I thought of was that it would be convenient to declare function parameters with llvm.gc.value(). However, I can get around not having that as a feature.
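A sketch of what the declarations for these three intrinsics might look like; none of them exist in LLVM, and the types (i8*/i8** plus a metadata pointer) are only one possible shape for the proposal:

; all three are hypothetical
declare i8* @llvm.gc.value(i8* %value, i8* %metadata)        ; mark an SSA value as a root
declare void @llvm.gc.declare(i8** %ptrloc, i8* %metadata)   ; start treating an alloca as a root here
declare void @llvm.gc.undeclare(i8** %ptrloc)                ; stop treating the alloca as a root here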

LLVM has a lifetime.end intrinsic which is similar:
http://llvm.org/docs/LangRef.html#int_lifetime_end
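For comparison, the existing lifetime intrinsics have this shape (see the LangRef link above); they mark when an alloca's memory is meaningful, but carry no GC metadata:

declare void @llvm.lifetime.start(i64 %size, i8* nocapture %ptr)
declare void @llvm.lifetime.end(i64 %size, i8* nocapture %ptr)

define void @lifetime_example() {
entry:
  %buf = alloca [16 x i8]
  %p = bitcast [16 x i8]* %buf to i8*
  call void @llvm.lifetime.start(i64 16, i8* %p)
  ; ... %buf is in use here ...
  call void @llvm.lifetime.end(i64 16, i8* %p)
  ret void
}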

Reid

Hi Talin,

Thinking about it even more, here’s a short summary of what I would propose:

  • llvm.gc.value(value, metadata) - marks an SSA value as a garbage collection root. This remains in effect for the lifetime of the SSA value.
  • llvm.gc.declare(alloca, metadata) - marks an alloca as a garbage collection root. This intrinsic tells LLVM that it should start treating the alloca as a GC root from that point in the CFG onward.
  • llvm.gc.undeclare(alloca) - tells LLVM that the alloca should no longer be considered a GC root. If llvm.gc.undeclare() is never called, then the alloca is treated as a root until the end of the function.

I am unsure why you need to provide a live range for a gc root. It looks to me that LLVM should be able to compute it (considering GC allocas never escape). I think the reason why you want this intrinsic is to work around the current way a GC root is declared (an alloca in the first block), but it does not have to be that way. At some point, Chris suggested that we could put the GC root in a different address space than other pointers.
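A sketch of what the address-space idea might look like; the address space number (1) and the types are arbitrary choices for illustration, not an existing LLVM convention:

%MyObj = type { i8 addrspace(1)*, i32 }

declare %MyObj addrspace(1)* @alloc_object()
declare void @do_call(%MyObj addrspace(1)*)

define void @example() gc "mygc" {
entry:
  ; no entry-block alloca and no llvm.gcroot call: the pointer's address
  ; space alone marks it as a root, and a late pass after the register
  ; allocator would compute the stack/register map for each sync point
  %obj = call %MyObj addrspace(1)* @alloc_object()
  call void @do_call(%MyObj addrspace(1)* %obj)
  ret void
}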

Cheers,
Nicolas

Hi Talin,

Thinking about it even more, here’s a short summary of what I would propose:

  • llvm.gc.value(value, metadata) - marks an SSA value as a garbage collection root. This remains in effect for the lifetime of the SSA value.
  • llvm.gc.declare(alloca, metadata) - marks an alloca as a garbage collection root. This intrinsic tells LLVM that it should start treating the alloca as a GC root from that point in the CFG onward.
  • llvm.gc.undeclare(alloca) - tells LLVM that the alloca should no longer be considered a GC root. If llvm.gc.undeclare() is never called, then the alloca is treated as a root until the end of the function.

I am unsure why you need to provide a live range for a gc root. It looks to me that LLVM should be able to compute it (considering GC allocas never escape). I think the reason why you want this intrinsic is to work around the current way a GC root is declared (an alloca in the first block), but it does not have to be that way. At some point, Chris suggested that we could put the GC root in a different address space than other pointers.

In the current scheme, the way you tell LLVM that a root is no longer needed is by assigning NULL to it. However, that assumes that all roots are pointers, which is not true in my world - a root can be a struct containing pointers inside of it. (In my current frontend, a non-pointer root is indicated by passing a non-NULL metadata argument to llvm.gcroot, which contains information about which fields in the struct are roots. This is especially important in the case of tagged unions, where the garbage collector may have to examine the union tag field in order to determine if the pointer field is indeed a pointer - passing the pointer alone would be insufficient to determine this.)
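For illustration, here is roughly how such a struct root is declared under the current scheme: the whole alloca is bitcast to i8** and registered with a non-null metadata pointer that only my GCStrategy interprets (the %TaggedUnion type and the @union_trace_table descriptor are made up for this example):

%String = type opaque
%TaggedUnion = type { i1, %String* }       ; tag plus pointer member

@union_trace_table = external constant i8  ; frontend-defined trace descriptor

declare void @llvm.gcroot(i8** %ptrloc, i8* %metadata)

define void @example() gc "mygc" {
entry:
  %u = alloca %TaggedUnion
  %u.i8pp = bitcast %TaggedUnion* %u to i8**
  ; the metadata tells the collector which fields are pointers and how
  ; the tag selects between the union's cases
  call void @llvm.gcroot(i8** %u.i8pp, i8* @union_trace_table)
  ret void
}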

Putting GC roots in a different address space works OK for me, as long as I can have SSA values that are structs that have pointers embedded in them that are in this different address space. In other words, if I have an SSA value that is a struct containing pointers which are roots, I need for the garbage collector to see the entire struct, not just the pointers.

What I’m primarily asking for is to have the LLVM code generator automatically spill roots from SSA values to memory during a sync point, and reload them afterward, instead of my frontend having to generate code to do this. As I mentioned, the current scheme results in the frontend having to generate very inefficient IR because of the need to be conservative about root liveness. The frontend can’t know anything about the optimization passes that LLVM will perform on the function.

Hi Talin,

In the current scheme, the way you tell LLVM that a root is no longer needed is by assigning NULL to it. However, that assumes that all roots are pointers, which is not true in my world - a root can be a struct containing pointers inside of it. (In my current frontend, a non-pointer root is indicated by passing a non-NULL metadata argument to llvm.gcroot, which contains information about which fields in the struct are roots. This is especially important in the case of tagged unions, where the garbage collector may have to examine the union tag field in order to determine if the pointer field is indeed a pointer - passing the pointer alone would be insufficient to determine this.)

For a tagged union, I guess you are currently using the second argument of llvm.gcroot to provide the information? I guess keeping an intrinsic for this kind of code is the best way to go.

Putting GC roots in a different address space works OK for me, as long as I can have SSA values that are structs that have pointers embedded in them that are in this different address space. In other words, if I have an SSA value that is a struct containing pointers which are roots, I need for the garbage collector to see the entire struct, not just the pointers.

That’s entirely fine with a different address space. The roots given by the LLVM GC pass should contain the location of these embedded pointers.

What I’m primarily asking for is to have the LLVM code generator automatically spill roots from SSA values to memory during a sync point, and reload them afterward,

I don’t think that’s even needed: long term, LLVM should return the location of all roots for a given sync point (typically method call). By all roots, I mean register roots and stack roots. The frontend should then be responsible for updating those roots.

instead of my frontend having to generate code to do this. As I mentioned, the current scheme results in the frontend having to generate very inefficient IR because of the need to be conservative about root liveness.

Agree.

The frontend can’t know anything about the optimization passes that LLVM will perform on the function.

Sure. And I think the way to go is to remove the llvm.gcroot intrinsic (and the hackish way it currently works: right now, because we take the address of the alloca, the LLVM optimizers won’t try to optimize an alloca that may escape through the llvm.gcroot function call). By having an address space for GC roots, optimizers don’t need to care about anything. After the optimizers and the register allocator, a final LLVM pass should compute the root lists of all sync points.

Nicolas

So I’ve been thinking about your proposal, that of using a special address space to indicate garbage collection roots instead of intrinsics. I want to point out some of the downsides of this approach:

  1. The biggest drawback that I see is that it still requires frontends to signal that a root is no longer being used by assigning NULL to the pointer. This turns out to be hard to do in some cases, for example:

while (true) {
  String s = someFunc();
  if (s->equals("foo")) {
    break;
  } else {
    // call some function with s
  }
}

In the above example, where would you put the code to zero out the root 's'? In this case, the answer is: after the body of the loop. Normally one would zero out roots at the end of the block in which the variable is declared, but in the case of this while loop, it's wasted effort to do that, since the root is just going to get assigned again at the top of the loop. However, sometimes we never make it to the end of the block - in this case because of a break statement - and so we have to null out 's' either just before or just after the break.

This example is relatively simple - if we start getting into scenarios involving switch statements, try/catch, and so on, I can easily construct complex examples that would make your head spin.

Worse, there are cases where there are multiple code paths exiting from a block, and you end up having to generate the code to null out a given root on each of those paths. Remember what I said about frontends having to generate inefficient code to handle llvm.gcroot? This is one of the reasons.

To address this, we need a better way of telling LLVM that a given variable is no longer a root. Of course, in the end it doesn't matter what the lifetime of the variable is; the only thing that matters is the state of the variable at each safe point. If there were a way to tell LLVM 'stop tracking this variable as a root', and then let it worry about whether the value in the variable is live or not, the generated code could be much more efficient.

Of course, frontends would still have to deal with multiple exits from a block, but they can afford, I think, to be somewhat more lax about it. For example, assume in the example above that we insert a call to llvm-gcroot-end (or whatever we want to call it) after the while loop. The compiler knows in this case that all paths originating from the definition of s must pass through that point. Further, LLVM knows that there’s only one safe point in the while loop, which is in the ‘else’ block. Thus the only time the GCStrategy ever ‘sees’ the variable ‘s’ is at that one point, which means that there’s no need to zero it out. In other words, LLVM knows that the range over which the variable is live is smaller than the range over which it is declared a root.

  2. As I mentioned, my language supports tagged unions and other “value” types. Another example is a tuple type, such as (String, String). Such types are never allocated on the heap by themselves, because they don’t have the object header structure that holds the type information needed by the garbage collector. Instead, these values can live in SSA variables, or in allocas, or they can be embedded inside larger types which do live on the heap.

The way I currently handle such objects is by passing the trace table for the type as the metadata argument to llvm.gcroot(). A NULL metadata argument means that the root is a simple pointer, a non-NULL argument means that the root is a struct.

How do we signal that a struct is no longer a root? Currently I do so by zeroing out the entire structure, but again that’s wasteful. It would be better to simply tell LLVM that the struct is no longer a root.

  3. I’ve been following the discussions on llvm-dev about the use of the address-space property of pointers to signal different kinds of memory pools for things like shared address spaces. If we try to use that same variable to indicate garbage collection, now we have to multiplex both meanings onto the same field. We can’t just dedicate one special ID for the garbage collected heap, because there could be multiple such heaps. As you add additional orthogonal meanings to the address-space field, you end up with a combinatorial explosion of possible values for it.

Hi Talin,

So I’ve been thinking about your proposal, that of using a special address space to indicate garbage collection roots instead of intrinsics.

Great!

To address this, we need a better way of telling LLVM that a given variable is no longer a root.

Live variable analysis is already in LLVM and for me that’s enough to know whether a given variable is no longer a root. Note that each safe point has its own set of root locations, and these locations all contain live variables. Dead variables may still be in register or stack, but the GC will not visit them.

  2. As I mentioned, my language supports tagged unions and other “value” types. Another example is a tuple type, such as (String, String). Such types are never allocated on the heap by themselves, because they don’t have the object header structure that holds the type information needed by the garbage collector. Instead, these values can live in SSA variables, or in allocas, or they can be embedded inside larger types which do live on the heap.

If you know, at compile-time, whether you are dealing with a struct or a heap, what prevents you from emitting code that won’t need such tagged unions in the IR? Same for structs: if they contain pointers to heap objects, those will be in that special address space.

  3. I’ve been following the discussions on llvm-dev about the use of the address-space property of pointers to signal different kinds of memory pools for things like shared address spaces. If we try to use that same variable to indicate garbage collection, now we have to multiplex both meanings onto the same field. We can’t just dedicate one special ID for the garbage collected heap, because there could be multiple such heaps. As you add additional orthogonal meanings to the address-space field, you end up with a combinatorial explosion of possible values for it.

I think there already exist some conventions between an ID and some codegen. Having one additional ID seems fine to me, even if you need to play with bits in case you need different IDs for a single pointer.

I’m also fine with the intrinsic way of declaring a GC root. But I think it is cumbersome, and error-prone in the presence of optimizers that may try to move that intrinsic away (I remember similar issues with the current EH intrinsics).

Nicolas

Hi Talin,

So I’ve been thinking about your proposal, that of using a special address space to indicate garbage collection roots instead of intrinsics.

Great!

To address this, we need a better way of telling LLVM that a given variable is no longer a root.

Live variable analysis is already in LLVM and for me that’s enough to know whether a given variable is no longer a root. Note that each safe point has its own set of root locations, and these locations all contain live variables. Dead variables may still be in register or stack, but the GC will not visit them.

  2. As I mentioned, my language supports tagged unions and other “value” types. Another example is a tuple type, such as (String, String). Such types are never allocated on the heap by themselves, because they don’t have the object header structure that holds the type information needed by the garbage collector. Instead, these values can live in SSA variables, or in allocas, or they can be embedded inside larger types which do live on the heap.

If you know, at compile-time, whether you are dealing with a struct or a heap, what prevents you from emitting code that won’t need such tagged unions in the IR. Same for structs: if they contain pointers to heap objects, those will be in that special address space.

I’m not sure what you mean by this.

Take for example a union of a String (which is a pointer) and a float. The union is either { i1; String * } or { i1; float }. The garbage collector needs to see that i1 in order to know whether the second field of the struct is a pointer - if it attempted to dereference the pointer when the field actually contains a float, the program would crash. The metadata argument that I pass to llvm.gcroot informs the garbage collector about the structure of the union.
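In IR terms, the two views of that union look something like this (the types are illustrative, with %String standing in for my string type):

%String = type opaque
%StringCase = type { i1, %String* }   ; tag says: second field is a pointer
%FloatCase  = type { i1, float }      ; tag says: second field is a float
; the collector must read the i1 tag before deciding whether to trace
; the second field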

Hi Talin,

So I’ve been thinking about your proposal, that of using a special address space to indicate garbage collection roots instead of intrinsics.

Great!

To address this, we need a better way of telling LLVM that a given variable is no longer a root.

Live variable analysis is already in LLVM and for me that’s enough to know whether a given variable is no longer a root. Note that each safe point has its own set of root locations, and these locations all contain live variables. Dead variables may still be in register or stack, but the GC will not visit them.

  2. As I mentioned, my language supports tagged unions and other “value” types. Another example is a tuple type, such as (String, String). Such types are never allocated on the heap by themselves, because they don’t have the object header structure that holds the type information needed by the garbage collector. Instead, these values can live in SSA variables, or in allocas, or they can be embedded inside larger types which do live on the heap.

If you know, at compile-time, whether you are dealing with a struct or a heap, what prevents you from emitting code that won’t need such tagged unions in the IR. Same for structs: if they contain pointers to heap objects, those will be in that special address space.

I’m not sure what you mean by this.

Take for example a union of a String (which is a pointer) and a float. The union is either { i1; String * } or { i1; float }. The garbage collector needs to see that i1 in order to know whether the second field of the struct is a pointer - if it attempted to dereference the pointer when the field actually contains a float, the program would crash. The metadata argument that I pass to llvm.gcroot informs the garbage collector about the structure of the union.

Sorry, I left a part out. The way that my garbage collector works currently is that the collector gets a pointer to the entire union struct, not just the pointer field within the union. In other words, the entire union struct is considered a “root”.

In fact, there might not even be a pointer in the struct. You see, because LLVM doesn’t directly support unions, I have to simulate that support by casting pointers. That is, for each different type contained in the union, I have a different struct type, and when I want to extract data from the union I cast the pointer to the appropriate type and then use GEP to get the data out. However, when allocating storage for the union, I have to use the largest data type, which might not be a pointer.

For example, suppose I have a type “String or (float, float, float)” - that is, a union of a string and a 3-tuple of floats. Most of the time what LLVM will see is { i1; { float; float; float; } } because that’s bigger than { i1; String* }. LLVM won’t even know there’s a pointer in there, except during those brief times when I’m accessing the pointer field. So tagging the pointer in a different address space won’t help at all here.
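A sketch of the bitcast-based emulation I'm describing, under the assumption that the storage type uses the largest member and the pointer view exists only while accessing that case (tag check omitted for brevity):

%String       = type opaque
%UnionStorage = type { i1, { float, float, float } }   ; largest member wins
%StringView   = type { i1, %String* }

define %String* @get_string(%UnionStorage* %u) {
entry:
  ; reinterpret the storage as the pointer-carrying case; only here does
  ; LLVM ever "see" that a pointer may live inside the union
  %view = bitcast %UnionStorage* %u to %StringView*
  %pp = getelementptr %StringView, %StringView* %view, i32 0, i32 1
  %s = load %String*, %String** %pp
  ret %String* %s
}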

Hi Talin,

Sorry to interject -

For example, suppose I have a type “String or (float, float, float)” - that is, a union of a string and a 3-tuple of floats. Most of the time what LLVM will see is { i1; { float; float; float; } } because that’s bigger than { i1; String* }. LLVM won’t even know there’s a pointer in there, except during those brief times when I’m accessing the pointer field. So tagging the pointer in a different address space won’t help at all here.

I think this is a fairly uncommon use case that will be tricky to deal with no matter what method is used to track GC roots. That said, why not do something like make the pointer representation (the {i1, String*}) the long-term storage format, and only bitcast just before loading the floats? You could even use another address space to indicate that something is sometimes a pointer, dependent upon some other value (the i1, perhaps indicated with metadata).

My vote (not that it really counts for much) would be the address-space method. It seems much more elegant.

The only thing that I think would be unusually difficult for the address-space method to handle would be alternative pointer representations, such as those used in the latest version of Hotspot (see http://wikis.sun.com/display/HotSpotInternals/CompressedOops). Essentially, a 64-bit pointer is packed into 32-bits by assuming 8-byte alignment and restricting the heap size to 32GB. I’ve seen similar object-reference bitfields used in game engines. In this case, there is no “pointer” to attach the address space to.

(Yes, I know that Hotspot currently uses CompressedOops ONLY in the heap, decompressing them when stored in locals, but it is not inconceivable to avoid decompressing them if the code is just moving them around, as an optimization.)

Just my few thoughts.

-Joshua

Hi Talin,

Sorry to interject -

For example, suppose I have a type “String or (float, float, float)” - that is, a union of a string and a 3-tuple of floats. Most of the time what LLVM will see is { i1; { float; float; float; } } because that’s bigger than { i1; String* }. LLVM won’t even know there’s a pointer in there, except during those brief times when I’m accessing the pointer field. So tagging the pointer in a different address space won’t help at all here.

I think this is a fairly uncommon use case that will be tricky to deal with no matter what method is used to track GC roots. That said, why not do something like make the pointer representation (the {i1, String*}) the long-term storage format, and only bitcast just before loading the floats? You could even use another address space to indicate that something is sometimes a pointer, dependent upon some other value (the i1, perhaps indicated with metadata).

I don’t know if it’s an uncommon use case or not, but it is something that I handle already in my frontend. (I suppose it’s uncommon in the sense that almost no one uses the garbage collection features of LLVM, but part of the goal of this discussion is to change that.)

The problem with making { i1, String* } the long-term storage format is that it isn’t large enough in the example I gave, so you’ll overwrite other fields if you try to store the three floats. The more general issue is that the concepts we’re talking about simply aren’t expressible in IR as it exists today.

My vote (not that it really counts for much) would be the address-space method. It seems much more elegant.

I agree that the current solution isn’t the best. The problem I have is that the solutions that are being suggested are going to break my code badly, and with no way to fix it.

The real solution is to make root-ness a function of type. In other words, you can mark any type as being a root, which exposes the base address of all objects of that type to the garbage collector. This is essentially the same as the pointer-address-space suggestion, except that it’s not limited to pointers. (In practice, it would only ever apply to pointers and structs.)

(Heck, I’d even be willing to go with a solution where only structs and not pointers could be roots - it means I’d have to wrap every pointer in a struct, which would be a royal pain, but it would at least work.)

Hi Talin,

Sorry to interject -

For example, suppose I have a type “String or (float, float, float)” - that is, a union of a string and a 3-tuple of floats. Most of the time what LLVM will see is { i1; { float; float; float; } } because that’s bigger than { i1; String* }. LLVM won’t even know there’s a pointer in there, except during those brief times when I’m accessing the pointer field. So tagging the pointer in a different address space won’t help at all here.

I think this is a fairly uncommon use case that will be tricky to deal with no matter what method is used to track GC roots. That said, why not do something like make the pointer representation (the {i1, String*}) the long-term storage format, and only bitcast just before loading the floats? You could even use another address space to indicate that something is sometimes a pointer, dependent upon some other value (the i1, perhaps indicated with metadata).

I don’t know if it’s an uncommon use case or not, but it is something that I handle already in my frontend. (I suppose it’s uncommon in the sense that almost no one uses the garbage collection features of LLVM, but part of the goal of this discussion is to change that.)

I actually meant uncommon in the sense of having stack-allocated unions that participate in garbage collection. Off the top of my head, I could only name one language (ML) that might use a feature like that. Even then, I suspect most ML implementations would actually push that stuff onto the heap.

The problem with making { i1, String* } the long-term storage format is that it isn’t large enough in the example I gave, so you’ll overwrite other fields if you try to store the three floats. The more general issue is that the concepts we’re talking about simply aren’t expressible in IR as it exists today.

Good catch - what I actually intended to indicate was the String “half” of the union, properly padded - so something more like {i1, String*, float} (for 64-bit pointers).

My vote (not that it really counts for much) would be the address-space method. It seems much more elegant.

I agree that the current solution isn’t the best. The problem I have is that the solutions that are being suggested are going to break my code badly, and with no way to fix it.

The real solution is to make root-ness a function of type. In other words, you can mark any type as being a root, which exposes the base address of all objects of that type to the garbage collector. This is essentially the same as the pointer-address-space suggestion, except that it’s not limited to pointers. (In practice, it would only ever apply to pointers and structs.)

(Heck, I’d even be willing to go with a solution where only structs and not pointers could be roots - it means I’d have to wrap every pointer in a struct, which would be a royal pain, but it would at least work.)

Hmm… do you mean something like a “marked” bit (or maybe a vector of mark_ids) in every type, where you could query a function for values of “marked” types at particular safe points? This sounds like something that might solve the problem described below with compressed pointers (not that I am actually encountering this problem) - but in the near-term, it seems to me that everything that you could conceivably mark as a GC root would somehow contain a pointer value. In this case, union support in LLVM would make the generated IR cleaner, but not necessarily any more correct.

Being able to make a “marked” version of every type seems unnecessary, and in some cases, somewhat non-intuitive. Take for instance, making a “marked” float type - which I can’t think of any good use for. I like the idea of using address spaces because it keeps the concepts in IR largely orthogonal, rather than having features that overlap in purpose in many cases. That, and IMO it just makes sense for pointers into the (or, in general, a) heap to be considered in a different address space from “normal” pointers. This could extend well to tracking pointers onto the stack (as seen in C# out/ref) for the purpose of generating closures (in .NET - which doesn’t currently have this feature).

The only thing that I think would be unusually difficult for the address-space method to handle would be alternative pointer representations, such as those used in the latest version of Hotspot (see http://wikis.sun.com/display/HotSpotInternals/CompressedOops). Essentially, a 64-bit pointer is packed into 32-bits by assuming 8-byte alignment and restricting the heap size to 32GB. I’ve seen similar object-reference bitfields used in game engines. In this case, there is no “pointer” to attach the address space to.

(Yes, I know that Hotspot currently uses CompressedOops ONLY in the heap, decompressing them when stored in locals, but it is not inconceivable to avoid decompressing them if the code is just moving them around, as an optimization.)

Just my few thoughts.

-Joshua


– Talin

-Joshua

I actually meant uncommon in the sense of having stack-allocated
unions that participate in garbage collection. Off the top of my
head, I could only name one language (ML) that might use a feature
like that. Even then, I suspect most ML implementations would
actually push that stuff onto the heap.

Common Lisp has (declare (dynamic-extent ..)).

But IMHO this is not a language-dependent issue. Rather, whenever any
language front-end using LLVM recognizes that some (union) object cannot
outlive the call, it would be a significant optimization if LLVM
supported stack-allocating the object.

        The *real* solution is to make root-ness a function of type.
        In other words, you can mark any type as being a root, which
        exposes the base address of all objects of that type to the
        garbage collector. This is essentially the same as the
        pointer-address-space suggestion, except that it's not limited
        to pointers. (In practice, it would only ever apply to
        pointers and structs.)

Yes, that would be the most intuitive solution.

However, note that there may be several garbage collected heaps using
different garbage collectors. Therefore the indicator for "rootness" is
not merely binary.

Being able to make a "marked" version of every type seems unnecessary,
and in some cases, somewhat non-intuitive. Take for instance, making
a "marked" float type - which I can't think of any good use for.

Such cases may sound exotic, but perhaps not non-existent. For example,
assume one wants to write a heap that supports generating statistics of
all live values, say, for the benefit of testing for memory leaks in
long-running servers. Or assume taking a snapshot of the computation
onto disk and recovering it in another machine with a different
representation of the (non-pointer) data type. Or checking (by checksum
exchange) whether the computational states match in a set of mutually
replicating computers running in lockstep. (I've actually done all of
these, although without the involvement of LLVM.)

Hi Talin,

Sorry to interject -

For example, suppose I have a type “String or (float, float, float)” - that is, a union of a string and a 3-tuple of floats. Most of the time what LLVM will see is { i1; { float; float; float; } } because that’s bigger than { i1; String* }. LLVM won’t even know there’s a pointer in there, except during those brief times when I’m accessing the pointer field. So tagging the pointer in a different address space won’t help at all here.

I think this is a fairly uncommon use case that will be tricky to deal with no matter what method is used to track GC roots. That said, why not do something like make the pointer representation (the {i1, String*}) the long-term storage format, and only bitcast just before loading the floats? You could even use another address space to indicate that something is sometimes a pointer, dependent upon some other value (the i1, perhaps indicated with metadata).

I don’t know if it’s an uncommon use case or not, but it is something that I handle already in my frontend. (I suppose it’s uncommon in the sense that almost no one uses the garbage collection features of LLVM, but part of the goal of this discussion is to change that.)

I actually meant uncommon in the sense of having stack-allocated unions that participate in garbage collection. Off the top of my head, I could only name one language (ML) that might use a feature like that. Even then, I suspect most ML implementations would actually push that stuff onto the heap.

The problem with making { i1, String* } the long-term storage format is that it isn’t large enough in the example I gave, so you’ll overwrite other fields if you try to store the three floats. The more general issue is that the concepts we’re talking about simply aren’t expressible in IR as it exists today.

Good catch - what I actually intended to indicate was the String “half” of the union, properly padded - so something more like {i1, String*, float} (for 64-bit pointers).

My vote (not that it really counts for much) would be the address-space method. It seems much more elegant.

I agree that the current solution isn’t the best. The problem I have is that the solutions that are being suggested are going to break my code badly, and with no way to fix it.

The real solution is to make root-ness a function of type. In other words, you can mark any type as being a root, which exposes the base address of all objects of that type to the garbage collector. This is essentially the same as the pointer-address-space suggestion, except that it’s not limited to pointers. (In practice, it would only ever apply to pointers and structs.)

(Heck, I’d even be willing to go with a solution where only structs and not pointers could be roots - it means I’d have to wrap every pointer in a struct, which would be a royal pain, but it would at least work.)

Hmm… do you mean something like a “marked” bit (or maybe a vector of mark_ids) in every type, where you could query a function for values of “marked” types at particular safe points? This sounds like something that might solve the problem described below with compressed pointers (not that I am actually encountering this problem) - but in the near-term, it seems to me that everything that you could conceivably mark as a GC root would somehow contain a pointer value. In this case, union support in LLVM would make the generated IR cleaner, but not necessarily any more correct.

Being able to make a “marked” version of every type seems unnecessary, and in some cases, somewhat non-intuitive. Take for instance, making a “marked” float type - which I can’t think of any good use for. I like the idea of using address spaces because it keeps the concepts in IR largely orthogonal, rather than having features that overlap in purpose in many cases. That, and IMO it just makes sense for pointers into the (or, in general, a) heap to be considered in a different address space from “normal” pointers. This could extend well to tracking pointers onto the stack (as seen in C# out/ref) for the purpose of generating closures (in .NET - which doesn’t currently have this feature).

Let me ask a question before we go too much further. Currently the argument to llvm.gcroot must be an alloca instruction. You cannot GEP an internal field within the alloca and pass it to the gcroot intrinsic. So the entire alloca is considered a root, even if it has non-pointer fields. My question is, in this new address-space proposal, are we talking about changing this so that the garbage collector only “sees” the internal pointer fields within the alloca, or will it still be able to “see” the entire alloca? This is the crucial point for me - I’ve written my GC strategy to deal with complex allocas, and there are several data types - such as unions - which depend on this.

I can probably work around the union issue using methods like you suggest - that is building some “dummy” type containing a pointer as the long-term storage format - as long as the GC can still see the entire struct. It’s ugly because it means that my frontend has to know about padding and alignment and such, issues which I prefer to leave to LLVM to figure out.

But if we change it so that the GC only sees pointers, then I’m dead in the water.

As far as my suggestion of marking types goes, you are right, it doesn’t make sense for most types. It really only matters for structs and pointers. Imagine if structs had an “isRoot” flag that lived next to “isPacked”, which would make the struct a distinct type. This would be written in IR as “gcroot { i1, float }” or something like that. The presence of this flag would have the same effect as marking a pointer in the GC address space. Combine that with the ability to mark SSA values as roots, and my life would get vastly simpler and my generated code would get about 20% faster. :-)

Hi Talin,

Let me ask a question before we go too much further. Currently the argument to llvm.gcroot must be an alloca instruction. You cannot GEP an internal field within the alloca and pass it to the gcroot intrinsic. So the entire alloca is considered a root, even if it has non-pointer fields. My question is, in this new address-space proposal, are we talking about changing this so that the garbage collector only “sees” the internal pointer fields within the alloca, or will it still be able to “see” the entire alloca? This is the crucial point for me - I’ve written my GC strategy to deal with complex allocas, and there are several data types - such as unions - which depend on this.

I can probably work around the union issue using methods like you suggest - that is building some “dummy” type containing a pointer as the long-term storage format - as long as the GC can still see the entire struct. It’s ugly because it means that my frontend has to know about padding and alignment and such, issues which I prefer to leave to LLVM to figure out.

Correct me if I am wrong, but to use unions without IR support means you already have to worry about padding.

But if we change it so that the GC only sees pointers, then I’m dead in the water.

In the end, the GC should only be seeing pointers anyway - some of whose “pointer-ness” depends on other values (as in the tagged union). I think your method could still work with the GC only seeing pointers (albeit with a little modification) - the only requirement I see that your method imposes on the design of an address-space-based GC strategy is to maintain information about the structure (union) containing the pointer, next to the pointer. For this, metadata should work fine. While it is not particularly elegant, I don’t see why you would be “dead in the water” - it could be made to work.

As far as my suggestion of marking types goes, you are right, it doesn’t make sense for most types. It really only matters for structs and pointers. Imagine if structs had an “isRoot” flag that lived next to “isPacked”, which would make the struct a distinct type. This would be written in IR as “gcroot { i1, float }” or something like that. The presence of this flag would have the same effect as marking a pointer in the GC address space. Combine that with the ability to mark SSA values as roots, and my life would get vastly simpler and my generated code would get about 20% faster. :-)

I’m not saying this approach wouldn’t work or that it is in any way worse than the address-space method, but I think it would require many more changes to how LLVM handles types. One problem with how you are envisioning it (though not with the idea itself) is that it will probably be beneficial to be able to track multiple, independent types of roots - for example, roots for a long-term heap (where Method, Class, etc. might live) and the normal heap. The address-space method would handle this, but the isRoot() method would have to be extended to handle distinct roots - more like isRoot(int rootId) - which would really complicate the type system.

-Joshua

I actually meant uncommon in the sense of having stack-allocated
unions that participate in garbage collection. Off the top of my
head, I could only name one language (ML) that might use a feature
like that. Even then, I suspect most ML implementations would
actually push that stuff onto the heap.

Common Lisp has (declare (dynamic-extent …)).

But IMHO this is not a language-dependent issue. Rather, whenever any
language front-end using LLVM recognizes some (union) object can not
outlive the call, it would be a significant optimization if LLVM would
support stack-allocating the object.

Point taken - but I don’t think there is anything in the address-space method that would inherently prevent this.

However, note that there may be several garbage collected heaps using
different garbage collectors. Therefore the indicator for “rootness” is
not merely binary.

Exactly.

Being able to make a “marked” version of every type seems unnecessary,
and in some cases, somewhat non-intuitive. Take for instance, making
a “marked” float type - which I can’t think of any good use for.

Such cases may sound exotic, but perhaps not non-existing. For example,
assume one wants to write a heap that supports generating statistics of
all live values, say, for the benefit of testing for memory leaks in
long-running servers. Or assume taking a snapshot of the computation
onto disk and recovering it in another machine with a different
representation of the (non-pointer) data type. Or checking (by checksum
exchange) whether the computational states match in a set of mutually
replicating computers running in lockstep. (I’ve actually done all of
these, although without the involvement of LLVM.)

Sounds reasonable, but does LLVM really need to support all of these cases with one big overhaul?

-Joshua