Statically Initialized Arrays

I am trying to generate LLVM code that references a global array. Unfortunately, while I am generating the code I do not know the final length of the array, as I add elements to it as my compiler goes along. What is the best way to do this in LLVM? Ideally, I would be able to define the GlobalVariable array and change its length later on. I would love for it to have the correct length so I can leverage some of LLVM's static checking.

I've been thinking that one level of indirection would solve the problem for me. What I could do is reference the global array via a function call while generating the code. Then, at the end, I would generate both the array and the function body:

int array[size] = { ... };

int* globalArray( int index )
{
  return &array[index];
}

Ideally, LLVM could then inline the function. Is there a better way to arrange for this to occur? Should I not care about array lengths and just have my global variable be a pointer?

Thanks for the advice,

Evan Jones

I would suggest something like this.

1. Create the global variable array with size 0.
2. Use a std::vector<Constant*> to accumulate all of the initializers for
    the array. When you need a reference to the array, use the global
    created in step 1.
3. When you have the final array, create a *new* global with the correct
    size and the initializer formed from the vector.
4. Replace the old GV with the new GV using code that looks like this:

    OldGV->replaceAllUsesWith(ConstantExpr::getCast(NewGV, OldGV->getType()));
    OldGV->eraseFromParent();

At the end of this, any instructions or other globals that referenced the temporary global will now reference the new one.
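The four steps above can be modeled without LLVM at all. The following is a minimal standalone sketch, not LLVM code: ToyGlobal, addUse, replaceAllUses, and finalLengthSeenByUse are hypothetical names invented for illustration, and the use list is only a rough analogue of what replaceAllUsesWith maintains. It shows that a reference taken while the array was still a size-0 placeholder ends up seeing the final, correctly sized array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a global array; this is NOT llvm::GlobalVariable.
struct ToyGlobal {
    std::vector<int> init;          // initializer elements (empty for the placeholder)
    std::vector<ToyGlobal**> uses;  // every slot that currently points at this global
};

// Record that `slot` refers to `g`, mimicking a use-list entry.
void addUse(ToyGlobal*& slot, ToyGlobal& g) {
    slot = &g;
    g.uses.push_back(&slot);
}

// Rough analogue of replaceAllUsesWith: repoint every recorded use at newG.
void replaceAllUses(ToyGlobal& oldG, ToyGlobal& newG) {
    for (ToyGlobal** slot : oldG.uses) {
        *slot = &newG;
        newG.uses.push_back(slot);
    }
    oldG.uses.clear();
}

// Runs steps 1-4 and returns the array length seen through a use made in step 2.
std::size_t finalLengthSeenByUse() {
    ToyGlobal placeholder;        // step 1: placeholder array of size 0
    std::vector<int> pending;     // step 2: accumulate the initializers

    ToyGlobal* use = nullptr;
    addUse(use, placeholder);     // early code references the placeholder
    pending.push_back(10);
    pending.push_back(20);
    pending.push_back(30);

    ToyGlobal finalArray;         // step 3: new global with the real contents
    finalArray.init = pending;

    replaceAllUses(placeholder, finalArray);  // step 4: swap old for new
    return use->init.size();
}
```

In this model the placeholder can simply go out of scope after the swap; in LLVM the corresponding step is the eraseFromParent() call shown above.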

> int array[size] = { ... };
>
> int* globalArray( int index )
> {
>   return &array[index];
> }
>
> Ideally, LLVM could then inline the function.

You could do something like this, but it's more indirect than the above process.

> Is there a better way to arrange for this to occur? Should I not care about array lengths and just have my global variable be a pointer?

If making the global a pointer works for you, that's certainly an option. The problem is that you can't have a static initializer in that case, requiring you to make some other global to hold the initializer. This also adds an extra level of indirection at runtime that you might not want.
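The trade-off can be seen in plain C++, which mirrors what the generated globals would look like. This is only an illustrative sketch: directArray, backingStore, pointerGlobal, and the two reader functions are hypothetical names, and the element values are made up.

```cpp
#include <cassert>
#include <cstddef>

// Option A: the global is the array itself. It carries its own static
// initializer, and its length is part of its type.
int directArray[3] = {10, 20, 30};

// Option B: the global is a pointer. A pointer cannot hold the elements,
// so a second global (backingStore) is needed to hold the initializer.
int backingStore[3] = {10, 20, 30};
int* pointerGlobal = backingStore;

int readDirect(std::size_t i) {
    return directArray[i];    // the array's address is a link-time constant
}

int readViaPointer(std::size_t i) {
    return pointerGlobal[i];  // in general: load the pointer first, then index
}
```

Option B needs two globals where Option A needs one, and unless the optimizer can prove the pointer never changes, every access through it pays for the extra load.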

-Chris

Ah ha! I was looking for something like this. Why didn't I see that there? I must be blind.

On a vaguely related note, why does Module::getGlobalVariable *not* return variables with internal linkage? There must be some logic behind that choice that I can't figure out. It is easy to copy the code out of Module.cpp if you need to find variables with internal linkage, but it seems unnecessary to me.

Thanks,

Evan

Good question. I think that this method was pulled out of the linker originally, which didn't want to link against internal symbols. Other clients (such as the lowergc pass) want to get the variable and the var must be an external symbol.

If you want to add a 'bool AllowInternal' option to the method, and default it to false, go for it. Please submit the patch to the llvmbugs list if you choose to do so.
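The shape of such an option can be sketched with a toy lookup table. This is an assumption-laden model, not the code in Module.cpp: ToyModule, ToyVariable, ToyLinkage, and makeExampleModule are hypothetical, and only the idea of a flag that defaults to false comes from the suggestion above.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class ToyLinkage { External, Internal };

struct ToyVariable {
    ToyLinkage linkage;
};

// Toy stand-in for a module's table of global variables.
struct ToyModule {
    std::map<std::string, ToyVariable> globals;

    // Returns nullptr when the name is unknown, or when the variable has
    // internal linkage and the caller did not opt in via allowInternal.
    ToyVariable* getGlobalVariable(const std::string& name,
                                   bool allowInternal = false) {
        auto it = globals.find(name);
        if (it == globals.end())
            return nullptr;
        if (it->second.linkage == ToyLinkage::Internal && !allowInternal)
            return nullptr;
        return &it->second;
    }
};

// Builds a small module with one external and one internal variable.
ToyModule makeExampleModule() {
    ToyModule m;
    m.globals["visible"] = ToyVariable{ToyLinkage::External};
    m.globals["hidden"] = ToyVariable{ToyLinkage::Internal};
    return m;
}
```

Because the flag defaults to false, existing callers such as the linker keep their current behavior, while a caller that wants internal symbols passes true explicitly.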

Thanks!

-Chris