Question about appending linkage

I'm trying to figure out how to do static initialization (like static constructors in C++ or Java). I figured I would use appending linkage - that is, for each module I'd generate a function that did all of the static initialization for that module, and then I'd put a pointer to that function in an array with appending linkage. Then my compiler-generated startup code would simply iterate through the array and call each function in turn, before calling my "main" method.

Only problem is - how do I know when I've reached the end of the array?

-- Talin

There are several ways to do this:

1) Link in a bitcode file last that defines an array ending with a NULL function pointer. The last time I checked, the linker concatenated appending arrays in link order, so the NULL will be at the end. LLVM does not define this behavior, so it may disappear in future releases.

2) Write a transform that either puts a NULL function pointer at the end or creates a global variable with the array size.

3) Put the array into its own section. Write a GNU ld linker script that defines symbols at the beginning and end of the section. You then write your code so that it loops until it reaches the symbol at the end of the section (the Linux 2.4 kernel does this).

My personal preference would be option 2. The transform is trivial to write and does not rely on undefined behavior.

-- John T.
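
As a rough illustration of what the startup loop could look like under options 1 and 2, here is a minimal C sketch. It is not taken from any poster's code: the names init_fn, demo_ctors, run_until_null and run_counted are hypothetical, and demo_ctors merely stands in for the merged array after a NULL sentinel has been appended.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*init_fn)(void);

/* Demo bookkeeping: each initializer appends one letter to this log. */
char init_log[16];
size_t init_log_len;

static void record(char c) {
    if (init_log_len + 1 < sizeof init_log) {
        init_log[init_log_len++] = c;
        init_log[init_log_len] = '\0';
    }
}

static void init_x(void) { record('x'); }
static void init_y(void) { record('y'); }
static void init_z(void) { record('z'); }

/* Hypothetical stand-in for the merged array with a NULL sentinel appended. */
const init_fn demo_ctors[] = { init_x, init_y, init_z, NULL };

/* Sentinel variant: call entries until the NULL terminator.
 * Returns the number of initializers that ran. */
size_t run_until_null(const init_fn *table) {
    assert(table != NULL);
    size_t n = 0;
    while (table[n] != NULL) {
        table[n]();
        ++n;
    }
    return n;
}

/* Counted variant: the count would come from a global variable
 * emitted by a post-link transform. */
void run_counted(const init_fn *table, size_t count) {
    assert(table != NULL);
    for (size_t i = 0; i < count; ++i) {
        table[i]();
    }
}
```

The sentinel form needs no extra symbol but relies on the NULL really being last; the counted form needs one extra global but does not depend on element order.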

John Criswell wrote:

2) Write a transform that either puts a NULL function pointer at the end or creates a global variable with the array size.
  

This assumes that I'm writing my own linker instead of using the llvm-link tool, right?

At the moment I'm writing out separate bitcode files for each source module, and then linking them together with llvm-link. In my current build process none of my code is involved in the link phase. Since the compiler doesn't know how many modules will get linked, it doesn't know the array size.

Talin wrote:

This assumes that I'm writing my own linker instead of using the
llvm-link tool, right?
  

Not necessarily. You could simply run your pass with the opt tool after using llvm-link to link the bitcode. That's what we do for the Linux kernel in the SVA project.

At the moment I'm writing out separate bitcode files for each source
module, and then linking them together with llvm-link. In my current
build process none of my code is involved in the link phase. Since the
compiler doesn't know how many modules will get linked, it doesn't know
the array size.
  

Right. You definitely need to do this after the linking phase, but you don't need to replace llvm-link; just run your pass via opt after llvm-link is done but before you do code generation.

-- John T.

My apologies for reopening an old thread, but I've been thinking about this quite a bit lately, and it seems to me that "appending linkage" would be a lot more useful in general if there were some way to determine the size of the appended array without having to write a custom pass to do it. In other words, using the standard llvm tools (llvm-link and friends), one ought to be able to determine the starting and ending address of the array after it has been concatenated together.

While it is true that I can solve the problem for my own special case using the steps outlined below, it seems to me that anyone using appending linkage for anything is going to run up against a similar problem - how to know where the end of the array is. Thus it makes sense to me to try to solve the general case instead of just my own particular need.

Unfortunately I can't think of any reasonable way to do it that doesn't involve polluting the IR language with a lot of special cases. Here are some ideas that I thought of and discarded:

* Have a special "PostAppendingLinkage" type which acts just like "AppendingLinkage", except that it is guaranteed to come after all "AppendingLinkage" sections have been appended. Thus, you could use this to nail a sentinel value at the end of the list. Rejected because the linker has no control over the order in which modules are processed; it would have to keep the two linkage types separate until the last possible step.

* Have some way to define a global symbol representing the ending address, rather than the starting address, of a declaration. Sort of like GetElementPtr(1), except evaluated at link time. Still, it feels ugly and hard to implement.

-- Talin
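
For comparison, the "symbol for the ending address" idea already exists at the native-link level, which is roughly what option 3 earlier in the thread relies on. The C sketch below assumes an ELF target, GCC or Clang attributes, and a GNU-compatible linker (GNU ld or lld), which defines __start_NAME and __stop_NAME for any section whose name is a valid C identifier. The section name demo_init, the macro REGISTER_INIT, and the initializer functions are hypothetical. It happens after code generation, so it does not answer the question at the LLVM IR level.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*init_fn)(void);

/* Defined by the linker, not by any source file (GNU ld / lld on ELF). */
extern const init_fn __start_demo_init[];
extern const init_fn __stop_demo_init[];

/* Hypothetical helper: place a pointer to fn in the demo_init section.
 * "used" keeps the otherwise unreferenced entry from being discarded. */
#define REGISTER_INIT(fn) \
    static const init_fn fn##_entry \
        __attribute__((used, section("demo_init"))) = fn

/* Demo bookkeeping so the effect of each initializer is observable. */
int init_calls;

static void init_a(void) { init_calls += 1; }
static void init_b(void) { init_calls += 10; }

REGISTER_INIT(init_a);
REGISTER_INIT(init_b);

/* Walk the section from its start symbol to its stop symbol.
 * Returns the number of initializers that ran. */
size_t run_section_inits(void) {
    size_t n = 0;
    for (const init_fn *p = __start_demo_init; p != __stop_demo_init; ++p) {
        assert(*p != NULL);
        (*p)();
        ++n;
    }
    return n;
}
```

Each object file can register its own entries independently, and the loop needs neither a sentinel nor a count; the order in which entries from different files appear is left to the linker.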


I tried to write a simple program to use appending globals
to get a feel for how they work.
(See attached.)
I intend it to print out:
x!
y!
z!
in any order.
However, when I compile it, I get
...
  %lt = icmp ult i32 %counter, udiv (
      i32 sub (i32 ptrtoint ([1 x %0]* getelementptr ([1 x %0]* bitcast ([3 x %0]* @abc to [1 x %0]*), i32 1) to i32),
               i32 ptrtoint ([3 x %0]* @abc to i32)),
      i32 sub (i32 ptrtoint (%0* getelementptr ([3 x %0]* @abc, i32 0, i32 1) to i32),
               i32 ptrtoint ([3 x %0]* @abc to i32)))  ; <i1> [#uses=1]
...
The part to note is that every reference to @abc now has type [3 x %0]*,
except the first, which is bitcast back to [1 x %0]*. I was trying to
determine the length by doing sizeof(x)/sizeof(x[0]), as recommended in
the documentation. But that extra bitcast breaks it, so the program prints only:
x!
Is this bitcast intentional? If so, is there a way to determine
the length? It seems that doing

  %abcnext = getelementptr [1 x {i32, void()*}]* @abc, i32 1
  %abcnexti = ptrtoint [1 x {i32, void()*}]* %abcnext to i32

should yield the address just past the end of the merged @abc, not the
address of where it ended in that one file.
It seems like a bug.

--
++++++++++++[->+++++++>+++++++++<<]>-.>[->+>+>+>+>+
<<<<<]>++++++++.>-------.<--.>>.---.>++.>-----.>+++
+++++[->++++<]>.<<<<<<<<.>>++.>.>.>.>>++++++++++.

a.ll (658 Bytes)

b.ll (1.35 KB)
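
One way to read the folded expression above is as plain address arithmetic. The C sketch below models it under stated assumptions: entry_t mirrors the {i32, void()*} element type with made-up field names, abc stands in for the merged three-element array, and gep_one is a hypothetical helper that imitates "getelementptr T* base, i32 1" by stepping over one whole T. Seen through the stale per-module type [1 x T] the quotient is 1, which would match the single "x!" that was printed; seen through the merged type it is 3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the element type {i32, void()*}; field names are made up. */
typedef struct {
    int priority;
    void (*fn)(void);
} entry_t;

/* Stand-in for @abc after three one-element arrays were appended. */
entry_t abc[3];

/* Models `getelementptr T* base, i32 1`: step over one whole T. */
static uintptr_t gep_one(const void *base, size_t pointee_size) {
    assert(base != NULL);
    return (uintptr_t)base + pointee_size;
}

/* Denominator of the udiv: address of abc[1] minus address of abc[0]. */
static size_t element_stride(void) {
    return (size_t)((uintptr_t)&abc[1] - (uintptr_t)&abc[0]);
}

/* Length computed through the merged type [3 x entry_t]. */
size_t count_with_merged_type(void) {
    uintptr_t end = gep_one(abc, sizeof(entry_t[3]));
    return (size_t)(end - (uintptr_t)abc) / element_stride();
}

/* Length computed through the per-module type [1 x entry_t],
 * as in the bitcast that survived linking. */
size_t count_with_module_type(void) {
    uintptr_t end = gep_one(abc, sizeof(entry_t[1]));
    return (size_t)(end - (uintptr_t)abc) / element_stride();
}
```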