Patch to allow llvm.gcroot to work with non-pointer allocas.

I’m moving this thread to llvm-dev in the hopes of reaching a wider audience.

This patch relaxes the restriction on llvm.gcroot so that it can work with non-pointer allocas. The only changes are to Verifier.cpp - it appears from my testing that llvm.gcroot always worked fine with non-pointer allocas, except that the verifier wouldn’t allow it. I’ve used this patch to build an efficient stack crawler (an alternative to shadow-stack that uses only static constant data structures.)

Here’s a deal: If you accept this patch, I’ll write up an extensive tutorial on how to write a stack crawler like mine. (Actually, it’s already written, but without this patch the tutorial doesn’t make any sense.)

gcroot.patch (1 KB)

…crickets…

Hi Talin,

I don't think anyone is really using the GC support, other than Nicolas in VMKit. If he's ok with the change, I am too. Please make sure the dox stay up to date though.

-Chris

Thanks for the heads up Chris.

Talin, how is your GC dealing with non-pointers (be it allocas or not)? What is the use-case (either in C or LLVM)?

Nicolas

Many languages support the notion of a “value type”. Value types are always passed by value, unlike reference types which are always passed by pointer. An example is the “struct” type in C#. Another example is a “tuple” type. A value type which is a local variable lives on the stack as an alloca, not on the heap. When a function is called with a value type as argument, the callee gets its own copy of the argument, rather than sharing a pointer with the caller.

Value types are represented in LLVM using structs, and may contain pointer fields which need to be traced.

The way that I handle non-pointer types is to generate an array of field offsets (containing the offset of each pointer field within the struct) as the metadata argument to llvm.gcroot. This meta argument is then processed in my GCStrategy, where I add the stack root offset to the offsets in the field offset array, which yields the stack offsets of the actual pointers in the call frame.

It’s all pretty simple really.
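As a rough illustration of the offset arithmetic described above, here is a minimal C++ sketch. The names `StackRoot` and `pointerFrameOffsets` are hypothetical and are not part of LLVM or of the patch; the sketch only assumes that the strategy knows the frame offset of the rooted alloca and has the field-offset array from the metadata argument.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one alloca registered with llvm.gcroot:
// where its base sits in the call frame, plus the field-offset array
// that was passed as the metadata argument.
struct StackRoot {
    int64_t frameOffset;                // offset of the alloca's base in the frame
    std::vector<int64_t> fieldOffsets;  // offset of each pointer field in the struct
};

// Computes the frame offset of every traced pointer:
// (offset of the root in the frame) + (offset of the field in the struct).
std::vector<int64_t> pointerFrameOffsets(const StackRoot& root) {
    std::vector<int64_t> result;
    result.reserve(root.fieldOffsets.size());
    for (int64_t field : root.fieldOffsets) {
        assert(field >= 0 && "field offsets are measured from the struct base");
        result.push_back(root.frameOffset + field);
    }
    return result;
}
```

For a struct rooted at frame offset -32 with pointer fields at offsets 8 and 24, this yields frame offsets -24 and -8, which is what a stack crawler would then read from static data.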

Hi Talin,

Many languages support the notion of a “value type”. Value types are always passed by value, unlike reference types which are always passed by pointer. An example is the “struct” type in C#. Another example is a “tuple” type. A value type which is a local variable lives on the stack as an alloca, not on the heap. When a function is called with a value type as argument, the callee gets its own copy of the argument, rather than sharing a pointer with the caller.

Yes.

Value types are represented in LLVM using structs, and may contain pointer fields which need to be traced.

Yes.

The way that I handle non-pointer types is to generate an array of field offsets (containing the offset of each pointer field within the struct) as the metadata argument to llvm.gcroot. This meta argument is then processed in my GCStrategy, where I add the stack root offset to the offsets in the field offset array, which yields the stack offsets of the actual pointers in the call frame.

Did you think of the alternative of calling llvm.gcroot on the pointers in this struct? This would require changing the verifier to support non-alloca pointers in llvm.gcroot, but it makes the solution more general and cleaner: pointers given to llvm.gcroot only point to objects in the heap.

I think that, originally, the purpose of the second argument of llvm.gcroot was to emit static type information.

Nicolas

Let me give you a more complicated example to see why this won’t work:

Imagine I have a discriminated union type, whose type declaration looks like this:

var x:int or String.

The variable ‘x’ can be either an integer or a reference to a string object. In LLVM assembly, this data structure is represented by the following struct:

{ i1, String * }

The ‘i1’ field (the ‘discriminator’) is used to determine what kind of value is currently stored in the union. If it’s 0, then it’s an int, and the structure will be cast to { i8, int } before extracting the value. If it’s 1, then it’s a String pointer. The compiler does not allow access to the wrong type - if the value is 0, the language does not allow you to extract the value as a String.

Now, suppose we declare this as a local variable, so the union struct is contained within an alloca. We want to declare the String pointer as a root, but only if the discriminator is not 0. We can’t determine this at compile time; instead, the collector has to be smart enough to examine the union and determine whether it contains a pointer or not.

In my compiler, what I do is to generate a callback function that can trace the object. This callback function is contained within a data structure that is passed as the metadata argument to llvm.gcroot.

So my code looks like this (bit casts omitted for simplicity):

%int_or_string = type { i8, String * }
%x = alloca %int_or_string
call void @llvm.gcroot(i8** %x, i8* @.tracetable.int_or_string)

Where ‘.tracetable.int_or_string’ is the static type information for the “int or string” type, containing both the field offsets and the callback function to test the value of the discriminator.

Note that if I only declared the pointer as a root, then this wouldn’t work - the collector needs access to the entire data structure in order to trace the object correctly.

Also, I think this is the right solution - llvm.gcroot is only responsible for the offset of the base of the alloca, not for any of its internal structure, which is the responsibility of the compiler and the GCStrategy.
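To make the callback idea concrete, here is a small C++ sketch of what such a trace table might look like from the collector's side. Everything in it (`IntOrString`, `TraceTable`, `traceIntOrString`, `collectSlot`) is a hypothetical stand-in rather than the actual data layout of my compiler, and it models only the callback, not the field-offset array.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct String;  // opaque heap object standing in for the language's String

// Hypothetical in-memory layout of the "int or String" union.
struct IntOrString {
    uint8_t discriminator;  // 0 = int, 1 = String pointer
    String* payload;        // only a real pointer when discriminator != 0
};

// Called by the trace function for each pointer slot the collector must visit.
using Visitor = void (*)(void** slot, void* context);

// Hypothetical static type information passed as the gcroot metadata argument.
struct TraceTable {
    // Traces the value whose storage starts at `base` (the rooted alloca).
    void (*trace)(void* base, Visitor visit, void* context);
};

// Type-specific trace function: inspects the discriminator at collection
// time and reports the payload slot only if it currently holds a pointer.
void traceIntOrString(void* base, Visitor visit, void* context) {
    assert(base != nullptr);
    auto* value = static_cast<IntOrString*>(base);
    if (value->discriminator != 0) {
        visit(reinterpret_cast<void**>(&value->payload), context);
    }
}

const TraceTable kIntOrStringTable = {&traceIntOrString};

// Example visitor that simply records every slot it is shown.
void collectSlot(void** slot, void* context) {
    static_cast<std::vector<void**>*>(context)->push_back(slot);
}
```

The point of the sketch is that the trace function needs the base address of the whole struct: given only the address of the payload field, it would have no way to consult the discriminator.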

I didn’t have unions in mind - indeed you need some kind of static information in such a case. Since the GC infrastructure in LLVM gets so little love, I think it is good if you can improve it in any way, including by defining new interfaces.

Cheers,
Nicolas

So the patch is OK then? All it does is change the verifier – llvm.gcroot already has the ability to do this, it’s just that the verifier wouldn’t allow it.

Yes, it’s definitely OK. In the future, I think the verifier will also be changed to support non-allocas in llvm.gcroot.

Nicolas