More questions about appending linkage

So, I was in the process of writing a custom LLVM pass (my first one, yay!) to handle the problem of measuring the length of an appending linkage array.

The approach I was taking was pretty simple: Iterate through all of the module’s global variables until I find one that is trying to calculate the end of an appending-linkage array, which looks like this:

@_moduleListEnd = constant [1 x %Module*]* getelementptr ([1 x %Module*]* bitcast ([62 x %Module*]* @_moduleList to [1 x %Module*]*), i32 1) ; <[1 x %Module*]**> [#uses=0]

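Once I detected this particular pattern, I’d simply strip the bitcast (a constant expression here, not an instruction) so that the GEP operates on the real type of the array. Index 1 would then step past all 62 elements instead of just the first one.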

Unfortunately, this runs into a couple of problems. The first is that I don’t really understand how appending linkage interacts with dead global elimination. What I want is for the appending linkage array to contain only the variables that survived dead global elimination, and I don’t want the global references in the array itself to cause the referenced globals to be considered “live”.

It’s easy to envision if you think in terms of C++ static constructors: Say you are linking with a static library that has a large number of modules, some of which contain static constructor functions. You have some appending-linkage array which is used to collect pointers to all of the functions that need to be executed before main(). However, you don’t want to include functions for classes that aren’t transitively reachable from your main function; otherwise you end up pulling in the entire library and your program gets bloated.
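
To make the behavior I’m after concrete, here is a rough sketch of it as a toy model in plain C++. This is not the LLVM API: RefGraph, liveGlobals and pruneArray are names made up for illustration, and globals are just strings. The idea is that liveness is computed from the root without ever walking through the appending array, and the array is then filtered down to the entries that turned out to be live anyway.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model of a module: each global maps to the globals it references.
// All names here are hypothetical; this is not the LLVM API.
using RefGraph = std::map<std::string, std::vector<std::string>>;

// Collect everything transitively reachable from `root`, never walking
// through `weakArray`, so the array's entries keep nothing alive.
std::set<std::string> liveGlobals(const RefGraph &refs,
                                  const std::string &root,
                                  const std::string &weakArray) {
  assert(root != weakArray && "the root must not be the weak array");
  std::set<std::string> live;
  std::vector<std::string> worklist{root};
  while (!worklist.empty()) {
    std::string name = worklist.back();
    worklist.pop_back();
    // Skip the array itself and anything already visited.
    if (name == weakArray || !live.insert(name).second)
      continue;
    auto it = refs.find(name);
    if (it == refs.end())
      continue;
    for (const std::string &target : it->second)
      worklist.push_back(target);
  }
  return live;
}

// Keep only the array entries whose target survived, preserving order.
std::vector<std::string> pruneArray(const std::vector<std::string> &entries,
                                    const std::set<std::string> &live) {
  std::vector<std::string> kept;
  for (const std::string &entry : entries)
    if (live.count(entry))
      kept.push_back(entry);
  return kept;
}
```

With a graph where main references the array and A, A references B, and the array lists A, B and C, the live set comes out as main, A and B, and the pruned array holds only A and B. C is dropped because nothing other than the array pointed at it.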

The other problem is that you want to fix up the appending linkage arrays after dead global elimination. Unfortunately, at that point the internalize pass has changed the linkage types of all your appending arrays to “internal”. So there’s no way for my trick to work, since it can’t tell which arrays to fix up.

In the case of static C++ constructors, my understanding is that the linker has special knowledge of that particular array. But I don’t want to hard-code the name of a particular global into my linker if I can help it. What I need is some way to mark or annotate those variables to tell the linker to perform the fixup.
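
One possible shape for such a marking, again only as a toy model and not as anything LLVM provides: record the names of the appending arrays in a side table before internalization runs, and have the later fixup consult that table instead of the linkage type. Global, Linkage, recordAppending, internalizeAll and arraysToFixUp are all hypothetical names, and internalizeAll is a crude stand-in that internalizes everything, which the real pass does not do.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for module globals and linkage.
enum class Linkage { External, Internal, Appending };

struct Global {
  std::string name;
  Linkage linkage;
};

// Step 1: before internalizing, remember which globals were appending.
std::set<std::string> recordAppending(const std::vector<Global> &globals) {
  std::set<std::string> recorded;
  for (const Global &g : globals)
    if (g.linkage == Linkage::Appending)
      recorded.insert(g.name);
  return recorded;
}

// Step 2: crude stand-in for internalization. It rewrites every linkage
// to internal, which erases the information step 1 saved.
void internalizeAll(std::vector<Global> &globals) {
  for (Global &g : globals)
    g.linkage = Linkage::Internal;
}

// Step 3: the fixup looks at the side table, not at the linkage type.
std::vector<std::string> arraysToFixUp(const std::vector<Global> &globals,
                                       const std::set<std::string> &recorded) {
  std::vector<std::string> result;
  for (const Global &g : globals)
    if (recorded.count(g.name))
      result.push_back(g.name);
  return result;
}
```

The side table is the weak point of this sketch, since it has to survive between passes and stay in sync if a global is renamed or deleted. An annotation carried on the variable itself would avoid that, which is really what I’m asking for.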