Improving Garbage Collection

Talin,
do you identify safe-points in the current or proposed llvm scheme, and if so, how?
Or are they implicit, as being at all call sites (which raises the question of leaves
in the call tree: how does GC get started at all in that case)?

Peter Lawrence.

The LLVM GCStrategy class has a feature where you can specify what kind of safe points your collector requires - the options are Loop, Return, PreCall and PostCall. You can also override this behavior, examine each instruction yourself, and return a boolean indicating whether it is or isn’t a safe point.
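To make the two options concrete, here is a rough sketch of the idea in plain C++. It is a toy model and not LLVM's actual classes: `Instr`, `CollectorStrategy`, `needed` and `custom` are all names I made up for illustration. The collector either declares a set of safe point kinds, or supplies a predicate that is asked about each instruction.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical bit flags for the four safe point kinds named above.
enum SafePointKind : std::uint8_t {
    Loop     = 1u << 0,
    Return   = 1u << 1,
    PreCall  = 1u << 2,
    PostCall = 1u << 3,
};

// Toy instruction: just an opcode name and a back-edge flag for branches.
struct Instr {
    std::string opcode;  // e.g. "call", "ret", "br", "add"
    bool isBackEdge;     // true for a branch that closes a loop
};

// Hypothetical strategy object: a declarative mask plus an optional override.
struct CollectorStrategy {
    std::uint8_t needed = 0;                    // OR of SafePointKind flags
    std::function<bool(const Instr&)> custom;   // if set, decides everything

    bool isSafePoint(const Instr& i) const {
        assert(!i.opcode.empty());
        if (custom) return custom(i);           // per-instruction override
        if (i.opcode == "call") return (needed & (PreCall | PostCall)) != 0;
        if (i.opcode == "ret")  return (needed & Return) != 0;
        if (i.opcode == "br" && i.isBackEdge) return (needed & Loop) != 0;
        return false;
    }
};
```

With `needed = PostCall` only calls qualify, which matches the "function calls only" setup described in the next paragraph; adding `Loop` would also make loop back-edges qualify.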

Currently I only have function calls as safe points, although I may eventually enable loops as well. As far as leaf functions go, consider that the call to allocate memory is also a safe point - and if a function doesn’t allocate any memory then we don’t care if the GC is involved or not.

One complication with the current scheme is that the frontend has to have a sense of where the safe points are going to be. The current scheme requires the frontend to insert additional loads and stores around safe points (to spill register values to memory so they can be traced), so the frontend has to guess which function calls might be safe points. It can’t know for sure, because optimization and inlining (which happen much later) may remove the actual call instruction. The safe but inefficient approach is to insert the extra loads and stores around every call instruction.

Talin,
how about having the front-end generate an llvm.safe.point() intrinsic call at
the desired safe points, and having the addresses of the GC roots (which, at that point,
can vary from call to call) be the parameters (with noescape attribute) to the intrinsic?

IIUC currently the GC roots are tagged, and all analysis and transform optimizations
have to special case these tagged objects, but if instead their addresses were taken at
the safe points no special casing would have to be done – all analysis and transform
optimizations already know how to deal with objects whose address is taken,

and since llvm does already have a “noescape” (not sure that’s the correct name?)
attribute for parameters, these addresses won’t be misinterpreted by any alias
analysis either, and llvm is free to go ahead and keep these values in registers
between safe points – you can stop asking how to allow GC roots as SSA values,
any traditional load-store optimization pass will do it for you for free.

*** without you having to insert explicit load and store instructions, and having to
somehow mark them as non-delete-able, or always omit optimization passes that
you would otherwise like to have enabled ***

and also without you having to store NULL into a safe point to end its lifetime.
and I would suggest eliminating the gcroot() intrinsic as its information content
would be redundant.
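
To illustrate the shape of this idea outside of LLVM, here is a small C++ sketch. Everything in it is hypothetical: `ToyHeap` and `safePoint` are invented stand-ins for a collector and for the proposed intrinsic, and the "collection" simply copies every object reachable from the roots it is handed. The point is only that the safe point receives the addresses of the roots, so it can rewrite them in place.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

struct Object { int value; };

// Toy moving heap: each safe point copies every rooted object to a new
// address and frees everything else. Objects hold no references here.
class ToyHeap {
public:
    Object* allocate(int value) {
        objects_.push_back(std::unique_ptr<Object>(new Object{value}));
        return objects_.back().get();
    }

    // Hypothetical stand-in for the proposed safe point call: the roots are
    // passed by address, so the collector can update them after moving.
    void safePoint(std::initializer_list<Object**> roots) {
        std::vector<std::unique_ptr<Object>> survivors;
        std::unordered_map<Object*, Object*> forwarded;  // old -> new address
        for (Object** slot : roots) {
            assert(slot != nullptr);
            if (*slot == nullptr) continue;              // empty root
            auto it = forwarded.find(*slot);
            if (it == forwarded.end()) {
                survivors.push_back(std::unique_ptr<Object>(new Object(**slot)));
                it = forwarded.emplace(*slot, survivors.back().get()).first;
            }
            *slot = it->second;                          // rewrite the root
        }
        objects_ = std::move(survivors);                 // frees the old copies
    }

    std::size_t liveCount() const { return objects_.size(); }

private:
    std::vector<std::unique_ptr<Object>> objects_;
};
```

A caller would write something like `heap.safePoint({&a, &b})`. Between such calls `a` and `b` are ordinary locals, and any pointer that is not listed at a safe point is treated as dead by this toy collector.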

thoughts, comments ???

-Peter Lawrence.

That logic only applies to single-threaded apps (but it may still be good enough for you in practice).

-Andy

It is helpful to think of the stack/register map generation in two
phases: before and after identifying the location of heap
pointers. Say PrePtrMap and PostPtrMap. We don’t need to know stack
offsets and register names at that point, but we do need a 1-1 mapping
from pointer values to identifiable physical locations. I deliberately
avoid calling these values roots here, because we may have multiple
live pointers derived from an object and multiple copies of the same
pointer. GC only needs to see one of these values to trace roots, but
each of these still needs its own entry in the stack/register map for
a moving collector.
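
A tiny C++ sketch of why each copy and each derived pointer needs its own map entry when objects move. The names here (`MapEntry`, `relocate`) are made up for illustration and the "object" is just a byte buffer; a real stack map would describe stack slots and registers rather than C++ variables.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// One entry per mapped location. `derived` may point into the middle of the
// object; for a plain base pointer, `derived` and `base` are the same slot.
struct MapEntry {
    char** base;     // location holding the object's base pointer
    char** derived;  // location holding a pointer into the same object
};

// Move one object of `size` bytes from `from` to `to` and fix up every
// mapped location that refers to it. Unmapped copies are left untouched.
void relocate(char* from, char* to, std::size_t size,
              const std::vector<MapEntry>& map) {
    std::memcpy(to, from, size);

    // Pass 1: record offsets while the base slots still hold old addresses.
    std::vector<bool> belongs;
    std::vector<std::ptrdiff_t> offsets;
    for (const MapEntry& e : map) {
        const bool hit = (*e.base == from);
        belongs.push_back(hit);
        offsets.push_back(hit ? *e.derived - *e.base : 0);
    }

    // Pass 2: rewrite base and derived slots to the new address.
    for (std::size_t i = 0; i < map.size(); ++i) {
        if (!belongs[i]) continue;
        assert(offsets[i] >= 0 && static_cast<std::size_t>(offsets[i]) < size);
        *map[i].base = to;
        *map[i].derived = to + offsets[i];
    }
}
```

If a second copy of the base pointer lives in some location that has no entry, it keeps the old address after `relocate` returns. That stale copy is exactly the hazard for a moving collector.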

PrePtrMap, we need a type system to precisely tag all values known to
contain valid heap pointers. No potentially uninitialized/undefined
values allowed here. IntPtr/PtrInt casts become control dependent.

PostPtrMap, we need an IR that represents mapped pointers as live-in
to safepoints, and safepoints need to be defined as clobbering all
locations that may contain a pointer. For example, if we have a
register map, then we can no longer move an add instruction across a
call if it may operate on a pointer. Obviously, it’s easier to avoid
invalidating a stack map, but fundamentally the same problem. No
amount of spilling can bail you out without updating the map.

The current gcroot solution works around the lack of LLVM type system
support for heap pointers by effectively mapping pointers very early
(in the front end), reloading roots after safepoints (so we only need
one map entry per root), and relying on the rules that allow callees
to write their caller’s stack under certain circumstances (someone
needs to explain these rules to me–is it only possible when an alloca
pointer is taken?).

Your proposal is exactly the same in terms of when the heap pointers
are identified. That leaves all of the LLVM optimizer and codegen
running in PostPtrMap mode. The problem is that LLVM is free to make
copies of pointers and optimize across call sites without knowing how to
update the map.

The most efficient way to support GC is to move identification of
pointer locations as late as possible. Optimization across safepoints
needs to be effectively disabled after that point.

-Andy

Achievement unlocked: People who are smarter than me and more knowledgeable about the LLVM backends are arguing over the details of how GC ought to work :)

But *only* for a moving collector.
If you're using a non-moving gc the pointers won't change, so there's
no need to force reloading them after a safe point.
Unless weak pointers are supported, in which case you need a way to
mark *some* pointers as being clobbered (but not others).
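
A minimal C++ sketch of that distinction, under heavy assumptions: `NonMovingHeap` and `collect` are hypothetical names, objects hold no references, and marking only looks at the roots it is given. Strong roots are passed by value because a non-moving collector never rewrites them. Weak pointers are passed by address because the collector may clear them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Obj { int value; bool marked; };

// Toy non-moving mark-sweep heap. Surviving objects keep their addresses.
class NonMovingHeap {
public:
    Obj* allocate(int value) {
        objects_.push_back(std::unique_ptr<Obj>(new Obj{value, false}));
        return objects_.back().get();
    }

    void collect(const std::vector<Obj*>& strongRoots,
                 const std::vector<Obj**>& weakSlots) {
        for (auto& o : objects_) o->marked = false;
        for (Obj* root : strongRoots) {
            if (root) root->marked = true;       // strong roots keep objects alive
        }
        for (Obj** slot : weakSlots) {
            assert(slot != nullptr);
            // Only weak slots are "clobbered": cleared if the target is dead.
            if (*slot && !(*slot)->marked) *slot = nullptr;
        }
        objects_.erase(
            std::remove_if(objects_.begin(), objects_.end(),
                           [](const std::unique_ptr<Obj>& o) { return !o->marked; }),
            objects_.end());
    }

    std::size_t liveCount() const { return objects_.size(); }

private:
    std::vector<std::unique_ptr<Obj>> objects_;
};
```

After `collect`, a strong pointer held in a register is still valid without a reload, while any weak pointer has to be reloaded from its slot because it may now be null.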