Proposal for improving llvm.gcroot (summarized)

(This is a summary of the big long thread on llvm.gcroot, for those who didn’t have time to read it.)

I’m proposing the replacement of llvm.gcroot() with three new intrinsics:

  • llvm.gc.declare(alloca, meta). This intrinsic marks an alloca as a garbage collection root. It can occur anywhere within a function, and lasts either until the end of the function, or a until matching call to llvm.gc.undeclare().
  • llvm.gc.undeclare(alloca). This intrinsic unmarks and alloca, so that it is no longer considered a root from that point onward.
  • llvm.gc.value(value, meta). This intrinsic marks an SSA value as a root. The SSA value can be any type, not necessarily a pointer. This marking lasts for the lifetime of the SSA value.

The names of the intrinsics are intended to follow the naming convention for declaring debug variables (llvm.dbg.declare and llvm.dbg.value).

The llvm.gc.declare() and llvm.gc.value() intrinsics do essentially the same thing: At each safe point, they make the first argument available to the GC strategy as a pointer, using whatever means is most efficient from a code generation standpoint. In the case of llvm.gc.declare(), which takes an alloca as it’s first argument, this is the same as llvm.gcroot() does now, and is fairly straightforward: The GC strategy gets a reference to the value argument.

In the case of llvm.gc.value(), providing a pointer to the GC strategy is more involved, since the value may be in a register or split across several registers. In some cases, it may be required to spill the value into memory during safe points, and re-load it afterwards. In many cases, calling a function will require saving the SSA value on the stack regardless, so it may be possible to determine a pointer to that stack location.

The llvm.undeclare() intrinsic is used to indicate the end of the lifetime of an alloca root. This replaces the current convention of assigning NULL to a root to indicate the end of it’s lifetime. This has two advantages: First, it avoids the extra store, and second, it allows the backend code generator to re-use the same stack slots for different roots, as long as their lifetimes don’t overlap. (Under the current scheme, the lifetime of a root is required to be the whole function body.)

In all cases, LLVM should not make any assumptions about the type of the value argument with respect to garbage collection, and should treat it as a black box to the extent possible. The value may or may not contain pointers, and it may or may not contain non-pointer fields. It will be up the the GC strategy to take the appropriate action based on the data type and the meta argument.

One open issue is whether formal function arguments - which are normally treated as SSA values - can be passed as arguments to llvm.gc.value(). From the standpoint of a user, this would be very convenient to have, but if it’s too difficult, then it can be worked around by copying the function parameters to local SSA values.

Now, I realize that there were several strong supporters of a competing proposal involving using the address-space field of pointers in LLVM. I won’t go into the details here, except to say two things (1) I believe that approach limits the generality of LLVM’s support for diverse collectors, and (2) in the original thread, the folks who supported my proposal tended to be people who were actual users of the current system, or who planned on using it in the near future.

Hi Talin,

What changes to code generation would be necessary to support this?

Is there any intention of supporting a collector that has static stack
maps for each function, i.e. a table telling you, for each point in
the code, where all the roots are on the stack (similar to unwind info
for exception handling)? If so, I think it's a bit dodgy to use
intrinsic function calls to mark the start/end of the lifetime of a GC
root, because function calls are inherently dynamic. For example, you
can't represent this code with a static stack map:

if (cond) {
  llvm.gc.declare(foo, bar);
}
...
// foo may or may not be a root here
...
if (cond) { // same condition as above
  llvm.gc.undeclare(foo);
}

Even if you're careful not to generate code like this in your front
end, how can you guarantee that LLVM optimisation passes won't
introduce it?

The old llvm.dbg.region.start and llvm.dbg.region.end had the same
kind of problem.

Thanks,
Jay.

llvm.gc.declare(alloca, meta). This intrinsic marks an alloca as a garbage
collection root. It can occur anywhere within a function, and lasts either
until the end of the function, or a until matching call to
llvm.gc.undeclare().
llvm.gc.undeclare(alloca). This intrinsic unmarks and alloca, so that it is
no longer considered a root from that point onward.

Hi Talin,

What changes to code generation would be necessary to support this?

I can only describe this in abstract terms, since I know very little about LLVM’s code generation. (I am primarily a user of LLVM, not a developer of it, although I have made a few minor contributions.)

Is there any intention of supporting a collector that has static stack
maps for each function, i.e. a table telling you, for each point in
the code, where all the roots are on the stack (similar to unwind info
for exception handling)? If so, I think it’s a bit dodgy to use

That is already supported in the current LLVM. The changes I am proposing are merely an extension of what we have now, allowing frontends to emit more efficient code.

intrinsic function calls to mark the start/end of the lifetime of a GC
root, because function calls are inherently dynamic. For example, you
can’t represent this code with a static stack map:

if (cond) {
llvm.gc.declare(foo, bar);
}

// foo may or may not be a root here

if (cond) { // same condition as above
llvm.gc.undeclare(foo);
}

You would need to do the same as what is done today: Move the declare outside of the condition, and initialize the variable to a null state, such that the garbage collector will know to ignore the variable. In the if-block, you then overwrite the variable with actual data.

The difference is that in the today’s LLVM, the variable declaration has to be in the first block, and lasts for the entire function - so you have to initialize all of your stack variables to a null state in the first block. By extending the notation to allow stack roots to have a limited lifetime, we can avoid having to initialize the stack root unless we actually enter the block where it is defined.

I should mention that the declare/undeclare intrinsics are the least important part of this proposal. The important part is the ability to declare SSA values as roots - that is what will make a world of difference to folks like myself that are writing frontends that use garbage collection.

if (cond) {
llvm.gc.declare(foo, bar);
}
...
// foo may or may not be a root here
...
if (cond) { // same condition as above
llvm.gc.undeclare(foo);
}

You would need to do the same as what is done today: Move the declare
outside of the condition, and initialize the variable to a null state, such
that the garbage collector will know to ignore the variable. In the
if-block, you then overwrite the variable with actual data.

I think there's a real problem here. The code in LLVM that creates the
static stack map will want the llvm.gc.declare / llvm.gc.undeclare
calls to be "well-formed" in some sense: nicely paired and nested,
with each declare and undeclare being at the same depth inside any
loops or "if"s in the CFG. But consider:

1. Your front end generates well-formed llvm.gc.* calls.
2. The LLVM optimisers kick in and do jump threading, tail merging and whatnot.
3. The code that creates the static stack map finds a complete mess.
(And by this point I think it would be too late to do any
transformations like you suggest above: "Move the declare outside of
the condition ...")

I think it's a good idea to have information in the IR about the
lifetime of GC roots, but I think intrinsic calls are the wrong
representation for that information.

This is very similar to the problem of representing lexical scopes in
debug info. The llvm.dbg.region.* intrinsics were the wrong way of
doing it, because of the problems I mentioned above. Now we use
metadata attached to each instruction to say what scope it is in,
which is much better, because it is robust against optimisation
passes.

Jay.

Of course, using metadata isn't acceptable for gc because it can be
dropped, and adding something like it for gc wouldn't be acceptable to
people writing optimizations.

Reid

That’s a good point.

As far as the use of intrinsics go, nothing says that the internal representation of the gcroot information can’t be converted into some more durable form that is used by the LLVM code generator internally. I’m speaking somewhat from ignorance here, but I always assumed that the existing llvm.gcroot() calls got converted into some other form, rather than the backend passes attempting to preserve the actual intrinsic calls (but I could be wrong.)

I understand about wanting the data for garbage collection to be stored per-instruction, however there’s some problems with this approach. First, as Reid points out, metadata nodes may be dropped. We can’t add additional fields to the instructions themselves, because only a tiny minority of LLVM users make use of the garbage collection features, and the rest don’t want to have to pay the memory cost of adding an additional field to every instruction. And storing the information in an auxiliary data structure that is keyed by instruction is risky, because there are so many backend passes which create and destroy instructions, and it would be very difficult to get all of them to keep the garbage collection information up to date as they do their transformations.

Also, I’m not sure that instructions are the right place to be hanging this information - garbage collection is something that is normally associated with a value or a type, and not with an operation. It’s a little confusing because in LLVM IR, values and instructions are conflated.

A better scheme might be to associate garbage collection traits with a type. However, with the current structural type system, there’s no way to prevent LLVM from merging structurally identical types, which means that your garbage-collectable type might get merged with a non-collectible type. Chris has proposed a way to associate names with types, but it will be a while before that’s implemented.