(This is a summary of the big long thread on llvm.gcroot, for those who didn’t have time to read it.)
I’m proposing the replacement of llvm.gcroot() with three new intrinsics:
- llvm.gc.declare(alloca, meta). This intrinsic marks an alloca as a garbage collection root. It can occur anywhere within a function, and lasts either until the end of the function, or until a matching call to llvm.gc.undeclare().
- llvm.gc.undeclare(alloca). This intrinsic unmarks an alloca, so that it is no longer considered a root from that point onward.
- llvm.gc.value(value, meta). This intrinsic marks an SSA value as a root. The SSA value can be any type, not necessarily a pointer. This marking lasts for the lifetime of the SSA value.
The names of the intrinsics are intended to follow the naming convention for declaring debug variables (llvm.dbg.declare and llvm.dbg.value).
The llvm.gc.declare() and llvm.gc.value() intrinsics do essentially the same thing: at each safe point, they make the first argument available to the GC strategy as a pointer, using whatever means is most efficient from a code generation standpoint. In the case of llvm.gc.declare(), which takes an alloca as its first argument, this is the same as what llvm.gcroot() does now, and is fairly straightforward: the GC strategy gets a reference to the value argument.
In the case of llvm.gc.value(), providing a pointer to the GC strategy is more involved, since the value may be in a register or split across several registers. In some cases, it may be required to spill the value into memory during safe points, and re-load it afterwards. In many cases, calling a function will require saving the SSA value on the stack regardless, so it may be possible to determine a pointer to that stack location.
The llvm.gc.undeclare() intrinsic is used to indicate the end of the lifetime of an alloca root. This replaces the current convention of assigning NULL to a root to indicate the end of its lifetime. This has two advantages: first, it avoids the extra store, and second, it allows the backend code generator to re-use the same stack slots for different roots, as long as their lifetimes don’t overlap. (Under the current scheme, the lifetime of a root is required to be the whole function body.)
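To make the slot re-use point concrete, here is a rough sketch of how non-overlapping root lifetimes could share stack slots. It is only a model, not a proposed implementation: the RootLifetime struct and assignSlots function are names I made up for illustration, and it assumes straight-line code where each lifetime is a simple [declare, undeclare) range of instruction indices. A real backend would have to reason about lifetimes across the control flow graph.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: a root is live from its llvm.gc.declare point up to
// (but not including) its llvm.gc.undeclare point. Points are instruction
// indices in straight-line code.
struct RootLifetime {
  std::size_t declareAt;
  std::size_t undeclareAt;
};

// Greedy slot assignment. Roots must be sorted by declareAt.
// Returns the stack slot index chosen for each root, in input order.
std::vector<std::size_t> assignSlots(const std::vector<RootLifetime> &roots) {
  std::vector<std::size_t> slotFreeAt; // slotFreeAt[s]: point where slot s becomes free
  std::vector<std::size_t> result;
  for (std::size_t i = 0; i < roots.size(); ++i) {
    const RootLifetime &r = roots[i];
    assert(r.declareAt < r.undeclareAt && "lifetime must be non-empty");
    assert((i == 0 || roots[i - 1].declareAt <= r.declareAt) &&
           "roots must be sorted by declaration point");

    // Re-use the first slot whose previous root has already been undeclared.
    std::size_t slot = slotFreeAt.size();
    for (std::size_t s = 0; s < slotFreeAt.size(); ++s) {
      if (slotFreeAt[s] <= r.declareAt) {
        slot = s;
        break;
      }
    }
    if (slot == slotFreeAt.size())
      slotFreeAt.push_back(r.undeclareAt); // no free slot, allocate a new one
    else
      slotFreeAt[slot] = r.undeclareAt;
    result.push_back(slot);
  }
  return result;
}
```

With four roots whose lifetimes are [0,5), [2,8), [5,10) and [8,12), this model needs two slots. If every root had to live for the whole function body, as under the current scheme, the same four roots would need four slots.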
In all cases, LLVM should not make any assumptions about the type of the value argument with respect to garbage collection, and should treat it as a black box to the extent possible. The value may or may not contain pointers, and it may or may not contain non-pointer fields. It will be up to the GC strategy to take the appropriate action based on the data type and the meta argument.
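As an illustration of what "black box plus meta" could look like on the collector side, here is a minimal sketch. Everything in it is an assumption of mine: RootDescriptor, RootEntry and collectPointerFields are invented names, and using the meta argument as a list of pointer-field offsets is just one possible convention (a frontend could equally well use a type tag or a tracing function). The only thing it takes from the proposal is that the strategy receives a pointer to the root's storage along with the meta value, and decides for itself what is inside.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical meta payload: byte offsets of the pointer fields inside a root.
struct RootDescriptor {
  std::vector<std::size_t> pointerOffsets;
};

// Hypothetical view of one root at a safe point: where the value lives, plus
// the meta argument that was passed to llvm.gc.declare / llvm.gc.value.
struct RootEntry {
  void *address;
  const RootDescriptor *meta;
};

// Returns the address of every pointer field the collector should trace.
// Assumed convention: a null meta means the root itself is a single pointer.
std::vector<void **> collectPointerFields(const std::vector<RootEntry> &roots) {
  std::vector<void **> fields;
  for (const RootEntry &r : roots) {
    assert(r.address != nullptr && "root must have a storage location");
    if (r.meta == nullptr) {
      fields.push_back(static_cast<void **>(r.address));
      continue;
    }
    // Non-pointer fields are simply skipped; only described offsets are traced.
    char *base = static_cast<char *>(r.address);
    for (std::size_t offset : r.meta->pointerOffsets)
      fields.push_back(reinterpret_cast<void **>(base + offset));
  }
  return fields;
}
```

A root holding a struct with an integer tag and one reference would carry a descriptor with a single offset, while a plain pointer root would pass a null meta. In neither case does LLVM itself need to know which fields are pointers.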
One open issue is whether formal function arguments - which are normally treated as SSA values - can be passed as arguments to llvm.gc.value(). From the standpoint of a user, this would be very convenient to have, but if it’s too difficult, then it can be worked around by copying the function parameters to local SSA values.
Now, I realize that there were several strong supporters of a competing proposal that uses the address-space field of pointers in LLVM. I won’t go into the details here, except to say two things: (1) I believe that approach limits the generality of LLVM’s support for diverse collectors, and (2) in the original thread, the folks who supported my proposal tended to be people who were actual users of the current system, or who planned on using it in the near future.