Running code on module load

Hi,

Most of the remaining parts of Objective-C code generation require code to be run when the module loads (calling runtime library functions to register the existence of classes, categories, protocols, and so on).

Looking at llvm-gcc, it seems that I need to put these in @llvm.global_ctors to have them automatically put in the .ctors section in the compiled module. Does clang already have some infrastructure for collecting these and emitting the list?

David
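As a rough illustration of the "run code when the module loads" requirement described above, here is a minimal C sketch using the GCC/clang constructor attribute. The registry, `register_class`, and `module_load` are hypothetical stand-ins invented for this example; they are not part of any Objective-C runtime, and the class names are placeholders.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CLASSES 8

/* Hypothetical registry standing in for a runtime library's class table. */
const char *registered_classes[MAX_CLASSES];
size_t registered_count;

/* Hypothetical registration hook; a real runtime would expose its own. */
static void register_class(const char *name) {
    assert(name != NULL);
    if (registered_count < MAX_CLASSES) {
        registered_classes[registered_count++] = name;
    }
}

/* Marked as a constructor, so the loader runs it before main() is entered. */
__attribute__((constructor))
void module_load(void) {
    register_class("Foo");
    register_class("Bar");
}
```

By the time `main` starts, both placeholder names should already be in the table, which is the behaviour the runtime registration calls would rely on.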

On further inspection, there appears to be a bug in llvm-gcc when performing this operation.

The @llvm.global_ctors variable is defined as an array of (i32, function) pairs (the i32 always seems to be 65536; does anyone know what this is?). However, the functions can have arbitrary types. The following code will break llvm-gcc:

int __attribute__((constructor)) foo(void) {
   return 0;
}
void __attribute__((constructor)) bar(void) {}

Worse, it doesn't give a nice warning but an internal LLVM error: the first function added defines the element type of the array, and adding the second (with a different type) breaks everything.

Before I implement this in clang, I'd like to know if anyone knows the correct behaviour. Presumably the ctor functions should only be of type void()*, since C doesn't provide any mechanism for handling returns or passing in arguments to functions called in this way. Should we be silently casting ctor functions to this, or throwing an error if they are of another form (could cause runtime errors if they actually try to use their parameters)?

Is there an accompanying linker bug? What happens if you define the @llvm.global_ctors as two different types in two different modules and then try linking them?

David

> On further inspection, there appears to be a bug in llvm-gcc when
> performing this operation.
>
> The @llvm.global_ctors variable is defined as an array of i32,
> function pairs (the i32 seems to always be 65536 - does anyone know
> what this is?).

That number is an initializer priority, which can be set with attr(init_priority) on some targets. No LLVM targets do anything with it, so you can just set it to 65536.

> However, the functions can have arbitrary types. The
> following code will break llvm-gcc:
>
> int __attribute__((constructor)) foo(void) {
>   return 0;
> }
> void __attribute__((constructor)) bar(void) {}
>
> Worse, it doesn't give a nice warning, but an internal LLVM error
> because the first method defines the type of the array and the second
> being added (with a different type) breaks everything.

Whoops, I fixed this:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20080303/059098.html

The LLVM IR generated for this test is now:

> Before I implement this in clang, I'd like to know if anyone knows the
> correct behaviour. Presumably the ctor functions should only be of
> type void()*, since C doesn't provide any mechanism for handling
> returns or passing in arguments to functions called in this way.

Right.

> Should we be silently casting ctor functions to this, or throwing an
> error if they are of another form (could cause runtime errors if they
> actually try to use their parameters)?

Silently casting it is the right thing to do, and emitting a warning would also be great. I wouldn't emit an error though.

> Is there an accompanying linker bug? What happens if you define the
> @llvm.global_ctors as two different types in two different modules and
> then try linking them?

The front-end is required to make an array of type ["n" x { i32, void ()* }], so the bug is in the front-end.

-Chris
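The required layout, an array of `{ i32, void ()* }` elements with non-conforming constructors cast to the common type, can be sketched in C roughly as follows. This is only an analogue of what a front-end would emit, not the llvm-gcc or clang implementation; `ctor_entry`, `ctor_table`, `DEFAULT_PRIORITY`, and `check_ctor_table` are names made up for the example. The sketch only builds and inspects the table. Converting a function pointer to another function pointer type and back is well defined in C, whereas calling through the mismatched type is not guaranteed by the C standard and works only because of how common ABIs treat an ignored return value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The two constructors from the thread, with different signatures. */
int  foo(void) { return 0; }
void bar(void) {}

/* Hypothetical C analogue of one { i32, void ()* } element. */
typedef void (*ctor_fn)(void);
struct ctor_entry {
    int32_t priority;
    ctor_fn fn;
};

/* The placeholder priority discussed in the thread. */
enum { DEFAULT_PRIORITY = 65536 };

/* Every entry has the same type, whatever the original signature was. */
const struct ctor_entry ctor_table[] = {
    { DEFAULT_PRIORITY, (ctor_fn)foo },  /* cast from int (*)(void) */
    { DEFAULT_PRIORITY, bar },           /* already void (*)(void)  */
};

/* Checks that each entry has a function and the placeholder priority. */
void check_ctor_table(void) {
    size_t n = sizeof ctor_table / sizeof ctor_table[0];
    for (size_t i = 0; i < n; i++) {
        assert(ctor_table[i].fn != NULL);
        assert(ctor_table[i].priority == DEFAULT_PRIORITY);
    }
}
```

Because the element type is fixed up front, adding `bar` after `foo` no longer changes the type of the array, which is the situation that triggered the internal error.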

>> On further inspection, there appears to be a bug in llvm-gcc when
>> performing this operation.
>>
>> The @llvm.global_ctors variable is defined as an array of i32,
>> function pairs (the i32 seems to always be 65536 - does anyone know
>> what this is?).
>
> That number is an initializer priority, which can be set with
> attr(init_priority) on some targets. No LLVM targets do anything with
> it, so you can just set it to 65536.

Okay, I've done that; for now it's MagicNumber. We should probably reify that later...

>> Before I implement this in clang, I'd like to know if anyone knows the
>> correct behaviour. Presumably the ctor functions should only be of
>> type void()*, since C doesn't provide any mechanism for handling
>> returns or passing in arguments to functions called in this way.
>
> Right.
>
>> Should we be silently casting ctor functions to this, or throwing an
>> error if they are of another form (could cause runtime errors if they
>> actually try to use their parameters)?
>
> Silently casting it is the right thing to do, and emitting a warning
> would also be great. I wouldn't emit an error though.

I haven't done that yet, because it should probably be handled when building the AST.

>> Is there an accompanying linker bug? What happens if you define the
>> @llvm.global_ctors as two different types in two different modules and
>> then try linking them?
>
> The front-end is required to make an array of type ["n" x { i32, void
> ()* }], so the bug is in the front-end.

Here's a patch that tries to do that during code generation. Let me know if I've done anything completely silly; otherwise I'll start using this to register the things that call the Objective-C runtime functions. (Not tested: it's almost midnight here, and I need to sleep before writing more code.)

David

clang.diff (3.26 KB)

Sorry for the delay David, your patch looks great, I've applied it here:
http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20080310/004690.html

Thanks!

-Chris