Special-cased global-to-local-in-main replacement in GlobalOpt


GlobalOpt has an interesting special-case optimization for globals that are only accessed within “main”. These globals are replaced by allocas within “main” (and the global variable, GV, itself is deleted). The full condition for this transformation is:

    // If this is a first-class global and has only one accessing function
    // and this function is main (which we know is not recursive), we can make
    // this global a local variable: we replace the global with a local alloca
    // in this function.
    //
    // NOTE: It doesn't make sense to promote non-single-value types since we
    // are just replacing static memory with stack memory.
    //
    // If the global is in a different address space, don't bring it to stack.
    if (!GS.HasMultipleAccessingFunctions &&
        GS.AccessingFunction && !GS.HasNonInstructionUser &&
        GV->getType()->getElementType()->isSingleValueType() &&
        GS.AccessingFunction->getName() == "main" &&
        GS.AccessingFunction->hasExternalLinkage() &&
        GV->getType()->getAddressSpace() == 0) {
      // ... replace GV with an alloca in main's entry block and delete GV ...
    }
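
To make the shape of this condition easier to examine outside LLVM, here is a minimal C++ sketch that models it as a plain predicate. GlobalFacts and its fields are hypothetical stand-ins for the GlobalStatus fields and GlobalVariable queries in the quoted code (the accessing function is reduced to its name); this is an illustration, not LLVM's API.

```cpp
#include <string>

// Hypothetical, LLVM-free stand-in for the facts GlobalOpt consults.
// Field names are illustrative only; they are not LLVM's actual API.
struct GlobalFacts {
  bool has_multiple_accessing_functions;
  std::string accessing_function;  // empty if there is no single accessor
  bool has_non_instruction_user;
  bool element_is_single_value;
  bool accessor_has_external_linkage;
  unsigned address_space;
};

// Mirrors the quoted condition: localize only when the sole accessor is an
// externally visible function whose name is literally "main".
bool should_localize(const GlobalFacts &g) {
  return !g.has_multiple_accessing_functions &&
         !g.accessing_function.empty() &&
         !g.has_non_instruction_user &&
         g.element_is_single_value &&
         g.accessing_function == "main" &&
         g.accessor_has_external_linkage &&
         g.address_space == 0;
}
```

In this model, a global touched only by “_start” is never localized, while one touched only by an externally visible “main” always is, no matter how many times the surrounding runtime actually calls it: the decision rests on the name alone.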

From today’s discussion on IRC, there appear to be two problems with this approach:

  1. Hard-coding “main” to mean “the entry point of the code, which dynamically runs only once”.
  2. Assuming that “main” cannot be recursive (in the general sense: it is never re-entered, directly or indirectly, while a call to it is still active).

(1) is a problem for non-traditional compilation flows in which “main” is not the entry point: for example, simply JIT-compiling freestanding code; PNaCl, where the entry point is “_start”, not “main”; and other flows where parts of the runtime environment are included in the IR together with the user code. This is not the only place where the name “main” is hard-coded in the LLVM code base, but it’s a good example.
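
As a small C++ sketch of why “runs only once” matters, assume a hypothetical host (a JIT driver or an embedded runtime) that calls the user's “main” twice in the same process without re-running global initializers. The functions user_main_original, user_main_localized and run_twice are hypothetical names; the localized version models the function after GlobalOpt has turned the global into an alloca.

```cpp
// A global that the user's entry function reads and updates.
static int g_runs = 0;

// Hypothetical user "main" as written: the counter lives in a global,
// so its value persists across invocations.
int user_main_original() {
  return ++g_runs;
}

// The same function after localization: the global has become a stack
// slot (standing in for the alloca), re-initialized on every call.
int user_main_localized() {
  int runs = 0;
  return ++runs;
}

// Hypothetical host that invokes the entry function twice and returns the
// result of the second call.
int run_twice(int (*entry)()) {
  entry();
  return entry();
}
```

Under this assumption the original version returns 2 on the second run and the localized one returns 1, so the transformation changes observable behavior as soon as “main” can run more than once.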

(2) is a problem because the C standard, unlike the C++ standard, says nothing about “main” not being recursive. C++11 says in 3.6.1: “The function main shall not be used within a program.” C does not appear to impose such a restriction (C99 5.1.2.2.3 even refers to “the initial call to the main function”), so the optimization may be invalid for C programs that call “main” recursively.
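
The following C++ sketch shows why recursion breaks the transformation. Because C++ forbids calling main, the recursive C “main” is modeled by hypothetical functions named entry_with_global and entry_with_local; the second one is what the function would look like if the global were localized under the assumption that it is never re-entered.

```cpp
// Models a global that a recursive C "main" updates.
static int g_depth = 0;

// As written: every activation increments the same global counter,
// so the outermost call observes the total across all activations.
int entry_with_global(int n) {
  ++g_depth;
  if (n > 0) entry_with_global(n - 1);
  return g_depth;
}

// After (unsound) localization: each activation gets its own copy,
// initialized on entry just as an alloca would be.
int entry_with_local(int n) {
  int depth = 0;  // stands in for the alloca in the entry block
  ++depth;
  if (n > 0) entry_with_local(n - 1);
  return depth;
}
```

Called with n = 2, the global version returns 3 (three activations shared one counter) while the localized version returns 1, so replacing the global with an alloca is only valid if the function is truly never re-entered.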

A number of possible solutions were raised: some sort of function attribute that marks an entry point, a module-level entry point, documenting that LLVM assumes that the entry point is always renamed to “main”, etc. These mostly address (1) but not (2).

Any thoughts and suggestions are welcome.