Boehm GC + static variables?

Hi,

I’m running LLVM bitcode generated by my compiler under lli. The bitcode is linked against Boehm GC (lli -load=/usr/lib/libgc.so).

It looks like Boehm GC isn’t scanning global variables, and as a result objects referenced only through globals are being prematurely collected. I understand that Boehm GC needs to see the data segment containing my global variables as a root. For native executables it’s smart enough to pick this up from the sections in the ELF executable, but this doesn’t apply when LLVM bitcode is running under lli.

Is there some way I can hook into the code generator’s placement of new data segments as bitcode is compiled to native code so I can add roots for the segments as required by Boehm GC?

Thanks in advance,
– James Williams

I’ve implemented this by adding calls to GC_add_roots(<first global in module>, <last global in module>+1) to llvm.global_ctors, before any other static initialization code for the module.

This should be safe assuming that:

  • global variables are laid out in memory in the order they appear in their module (and ideally contiguously without being interleaved with any other values)
  • llvm.global_ctors for a given module is run before any other code can reference static variables belonging to that module.

Can anyone confirm if I can rely on these assumptions?
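
For illustration, here is a minimal sketch of the single-range registration described above. It is not the code from my compiler. It assumes the module's globals can be grouped as fields of one aggregate, so that order and contiguity come from the struct layout instead of from the JIT. ModuleGlobals, register_module_roots and add_roots_stub are hypothetical names; the stub only records ranges and stands in for Boehm's GC_add_roots(low_address, high_address_plus_1) so the sketch builds without libgc.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Stand-in for GC_add_roots from <gc.h>: records [lo, hi_plus_1) ranges only.
static std::vector<std::pair<char *, char *>> g_roots;
static void add_roots_stub(void *lo, void *hi_plus_1) {
  assert(lo != nullptr && hi_plus_1 != nullptr);
  g_roots.emplace_back(static_cast<char *>(lo), static_cast<char *>(hi_plus_1));
}

// Hypothetical module globals, grouped so that their relative order is fixed
// by the struct layout rather than assumed from the code generator.
struct ModuleGlobals {
  void *first_global;  // may point at a GC-allocated object
  long counter;        // plain data, harmless to scan conservatively
  void *last_global;   // may point at a GC-allocated object
};
static ModuleGlobals g_module;

// In the scheme above this would be the first function listed in
// llvm.global_ctors for the module.
static void register_module_roots() {
  char *lo = reinterpret_cast<char *>(&g_module.first_global);
  // "<last global in module>+1": one past the end of the last global.
  char *hi = reinterpret_cast<char *>(&g_module.last_global + 1);
  add_roots_stub(lo, hi);
}
```

The second argument is exclusive, which is why the call takes the address one past the last global rather than the address of the last global itself.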

– James

You should look at
http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/ExecutionEngine/JITMemoryManager.h?view=markup
and see if inheriting from that and overriding allocateGlobal() will
do what you want.
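
A rough sketch of that idea follows. It does not use the real LLVM headers: MemoryManagerBase is a hypothetical, cut-down stand-in for JITMemoryManager, with the allocateGlobal() signature modelled on the header linked above (treat it as an assumption and check it against your LLVM version). ArenaMemoryManager, RootRegisteringMemoryManager and add_roots_stub are likewise made up for illustration; the stub records ranges in place of Boehm's GC_add_roots.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Stand-in for GC_add_roots from <gc.h>: records [lo, hi_plus_1) ranges only.
static std::vector<std::pair<char *, char *>> g_roots;
static void add_roots_stub(void *lo, void *hi_plus_1) {
  g_roots.emplace_back(static_cast<char *>(lo), static_cast<char *>(hi_plus_1));
}

// Hypothetical stand-in for llvm::JITMemoryManager; only the hook of interest.
struct MemoryManagerBase {
  virtual ~MemoryManagerBase() = default;
  virtual std::uint8_t *allocateGlobal(std::uintptr_t Size,
                                       unsigned Alignment) = 0;
};

// Toy bump allocator playing the role of the default memory manager.
struct ArenaMemoryManager : MemoryManagerBase {
  alignas(16) std::uint8_t arena[4096];
  std::uintptr_t used = 0;

  std::uint8_t *allocateGlobal(std::uintptr_t Size,
                               unsigned Alignment) override {
    // Alignment must be a non-zero power of two.
    assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0);
    std::uintptr_t base = reinterpret_cast<std::uintptr_t>(arena);
    std::uintptr_t mask = static_cast<std::uintptr_t>(Alignment) - 1;
    std::uintptr_t start = (base + used + mask) & ~mask;  // round up
    assert(start + Size <= base + sizeof(arena));
    used = start + Size - base;
    return reinterpret_cast<std::uint8_t *>(start);
  }
};

// Forwards to an underlying manager and registers each global's storage as a
// GC root before the address is handed back to the code generator.
struct RootRegisteringMemoryManager : MemoryManagerBase {
  MemoryManagerBase &inner;
  explicit RootRegisteringMemoryManager(MemoryManagerBase &m) : inner(m) {}

  std::uint8_t *allocateGlobal(std::uintptr_t Size,
                               unsigned Alignment) override {
    std::uint8_t *p = inner.allocateGlobal(Size, Alignment);
    add_roots_stub(p, p + Size);  // real code: GC_add_roots(p, p + Size)
    return p;
  }
};
```

Because each root is registered at allocation time, this approach does not depend on the order in which globals are placed or on when global constructors run. With the real class you would also have to implement or forward its other virtual methods.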

I'm a little surprised that Boehm GC doesn't already see the globals,
since there's a reference to their memory from the JMM (the
JITMemoryManager), but maybe it doesn't scan mmap regions by default.

> I've implemented this by adding calls to GC_add_roots(<first global in
> module>, <last global in module>+1) to the llvm.global_ctors before any other
> static initialization code for the module.
>
> This should be safe assuming that:
> - global variables are laid out in memory in the order they appear in their
> module (and ideally contiguously without being interleaved with any other
> values)

This isn't necessarily the case. The default JMM allocates slabs with
mmap, which doesn't guarantee that they're in order.

> - llvm.global_ctors for a given module is run before any other code can
> reference static variables belonging to that module.

llvm.global_ctors runs when you call
runStaticConstructorsDestructors(false), which lli does before running
any other user code. There could be other global constructors that
access the globals, though, before your constructor runs.

> You should look at
> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/ExecutionEngine/JITMemoryManager.h?view=markup
> and see if inheriting from that and overriding allocateGlobal() will
> do what you want.

Thanks - I’ll take a look at it.

> I’m a little surprised the boehm gc doesn’t already see the globals,
> since there’s a reference to their memory from the JMM, but maybe it
> doesn’t scan mmap regions by default.

I believe that Boehm GC will scan data sections in the main ELF executable and in any loaded shared objects, and will also scan the machine registers and the stack from the current stack pointer back to (roughly) where it stood at program entry, but it won’t scan anything else unless it’s explicitly told to. If the JMM gets the memory that global variables are stored in via malloc or mmap, then I’ll need to add it to the GC root set via GC_add_roots.

Having said that, since I know the exact set of global variables in my compiled program, it may be more efficient for me to give the GC this exact set rather than the entire JMM area(s), provided I can ensure that the GC knows of the existence of each global before any reference to a garbage-collected object is stored in it.
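
As a rough sketch of what registering the exact set could look like (an illustration only, with hypothetical globals g_head and g_cache, and a recording stub in place of Boehm's GC_add_roots so that it builds without libgc): each global gets its own [address, address + size) range, so nothing depends on how the globals are placed relative to each other.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Stand-in for GC_add_roots from <gc.h>: records [lo, hi_plus_1) ranges only.
static std::vector<std::pair<char *, char *>> g_roots;
static void add_roots_stub(void *lo, void *hi_plus_1) {
  assert(lo != nullptr && hi_plus_1 != nullptr);
  g_roots.emplace_back(static_cast<char *>(lo), static_cast<char *>(hi_plus_1));
}

// Hypothetical globals that may hold references to GC-allocated objects.
static void *g_head;
static void *g_cache[4];

// Registers exactly the storage occupied by one global, whatever its type.
template <typename T>
static void register_global_root(T &global) {
  char *lo = reinterpret_cast<char *>(&global);
  add_roots_stub(lo, lo + sizeof(T));  // real code: GC_add_roots(lo, lo + sizeof(T))
}

// Must run before any GC reference is stored in the globals listed here.
// Globals that never hold GC references can simply be left out.
static void register_exact_roots() {
  register_global_root(g_head);
  register_global_root(g_cache);
}
```

The trade-off is one root entry per global instead of one per region, so a module with very many globals adds correspondingly many entries to the GC's root set.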

– James