Appending linkage

This is an issue I raised a while ago, but I wanted to know if the situation had improved at all recently.

The issue is determining the number of elements in an array that has appending linkage.

The obvious course would be to use a constant GEP. Suppose we have N modules, each containing a global variable with appending linkage whose initializer is an array of size M containing elements of type T. Within each module the type of the variable should be [M x T]. After linking, the appended array would have N * M elements, so its type should be [(NM) x T]*. Therefore a GEP of indices (0, 1) ought to dereference the pointer and point to the end of the array. Calculating the array size would be a simple matter of subtracting the pointer to the beginning from the pointer to the end.

However, this does not work. The GEP always behaves as if it were operating on the array before appending, so it gives you back a pointer to the Mth element, not the (N*M)th element. Note that I also tried just GEP (1), which gives a similar result, although I would expect it not to work at all, since pointers are only supposed to be dereferenced with an index of 0.

When I look at the generated bitcode, I see that the appended array does indeed have a type of [(NM) x T], but the “end” pointer initializer bitcasts the result to [1 x T] (if I use GEP 0 1) or [M x T] (if I use GEP 1) before doing the GEP. And I did not put that cast there.

The previous time I brought this up, there were several solutions suggested, all of which involved writing a custom link pass that would either place a sentinel value at the end of the array, or would store the count of array elements in some other variable. Well, I did this, but it only works for arrays that the linker has special knowledge of. I don’t want to just blindly start modifying every appending linkage symbol, so I need some means to tell the linker which arrays require special processing. For now it’s just a list of hard-coded symbol names, but that means that every time I add another appending linkage array to the runtime library, I have to go and modify the linker program.

I’d like to find a more general solution to this problem. Or at least have someone tell me that I am doing something wrong.

I bumped up against the same problem, and ended up using linked lists
instead. I'd have a pointer to the beginning of the list, and each
module that added to the list would update that pointer in its global
init.
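
A rough sketch of that linked-list scheme, written in C++ rather than IR. The names Node, Registrar, g_head and count_nodes are made up for illustration, and the two "modules" are simulated in a single file; in a real build each node and its registrar would live in a separate module.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical list node; each module would define one of these.
struct Node {
    const char* name;
    Node* next;
};

// Head of the list. It is constant-initialized to null, so it is valid
// before any module's dynamic initializer runs.
Node* g_head = nullptr;

// Constructing a Registrar plays the role of the module's global init:
// it pushes the module's node onto the front of the list.
struct Registrar {
    explicit Registrar(Node& n) {
        assert(n.next == nullptr);  // a node must only be registered once
        n.next = g_head;
        g_head = &n;
    }
};

// Stand-ins for two separate modules.
Node node_a{"module_a", nullptr};
Registrar reg_a{node_a};

Node node_b{"module_b", nullptr};
Registrar reg_b{node_b};

// Walk the list to count the registered entries.
std::size_t count_nodes() {
    std::size_t n = 0;
    for (const Node* p = g_head; p != nullptr; p = p->next) {
        ++n;
    }
    return n;
}
```

The order of the entries depends on the order in which the initializers run, which is generally unspecified across modules, so this only suits cases where the order does not matter.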

I suppose I could have used an appending array and had the global init
for each module that appended add to a length field defined alongside
it.
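
That variant might look something like the sketch below, again in C++ and with made-up names (g_table_len, LengthContribution, table_length). The two chunks stand in for the pieces that the linker would append together; here they are just separate arrays in one file, so only the length bookkeeping is shown.

```cpp
#include <cstddef>

// Hypothetical length field defined alongside the appended array.
// It is constant-initialized to zero before any dynamic initializer runs.
std::size_t g_table_len = 0;

// Each module's global init adds the size of its own chunk to the total.
struct LengthContribution {
    explicit LengthContribution(std::size_t n) { g_table_len += n; }
};

// Stand-in for module A, contributing 3 elements.
int chunk_a[3] = {1, 2, 3};
LengthContribution len_a{sizeof chunk_a / sizeof chunk_a[0]};

// Stand-in for module B, contributing 3 elements.
int chunk_b[3] = {4, 5, 6};
LengthContribution len_b{sizeof chunk_b / sizeof chunk_b[0]};

// Total element count, valid once all global initializers have run.
std::size_t table_length() { return g_table_len; }
```

The total is only reliable after every module's initializer has run, so code that itself runs during global initialization cannot depend on it.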

I think appending linkage works the same way with the ordinary
(native) linker. How do C programmers usually work with it?