Possible problems due to 'GlobalVariable' linkage?

Hi everyone,
Can someone enlighten me as to what the rules are for when a GlobalVariable needs to have external linkage and when it can get away with having internal linkage?

For some context (NOTE: I do not intend this to be an XY-problem, just that I have a way to resolve the problem I’m having; also, this is not homework):

I’m working on trying to target MiniJava (from CSEP-501) to the LLVM IR and I have implemented dynamic polymorphism as follows: I have a set of global variables, each containing a list of function pointers (so basically, these global variables serve as vtables). Whenever a MiniJava class is instantiated, an appropriate vtable is picked and its pointer is assigned to the first field in the instantiated class. I also have a function that is supposed to be called at the start of the program to populate the vtables with the pointers to the class methods.

The problem I’m having is that if I set the vtable variables to have internal linkage, verifyModule() fails (saying that the variables are external but do not have external or weak linkage) and I cannot dump object code (for reference, I use the code from Kaleidoscope for this). If I set the linkage to be external, I can dump object code but I cannot link it to my main stub (due to undefined references). And if I set the linkage to ‘weak external’, everything works - except that the resulting executable segfaults when run. It is worth pointing out that when I run the executable under LLDB, the segfault happens when the vtables are being populated (as opposed to, say, when a method is called using them).

I’m clearly not aware of the nuances of the different linkages but I suspect that they might not be the only problem here. Nevertheless, I would appreciate any light shed on this matter.

Regards,

Saad

I think of these in terms of C: a global symbol (a function or a global variable) with “external” linkage can be referenced from outside the current module. A static function or global variable is limited in scope to the current module and can’t be referenced from outside.
Note that if there are no uses left of an internal symbol after optimization, the compiler will just delete it entirely and not emit it in the object.
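To make the C analogy concrete, here is a minimal sketch in C++. All names are hypothetical and chosen only for illustration. It contrasts a definition with internal linkage, a definition with external linkage, and a bare declaration that reserves no storage in this file.

```cpp
// Internal linkage: visible only inside this translation unit.
// The compiler may drop it entirely if nothing uses it after optimization.
static int hidden_counter = 0;

// External linkage: a definition that other object files can reference.
int shared_counter = 0;

// Declaration only: no storage is reserved here. The linker has to find a
// definition in some other object file if this name is ever used.
// (Hypothetical name; it is deliberately left unused in this sketch.)
extern int defined_elsewhere;

// A static function is likewise private to this translation unit.
static int bump_hidden() { return ++hidden_counter; }

// A non-static function can be called from other translation units.
int bump_shared() { return ++shared_counter; }
```

A declaration like `defined_elsewhere` only makes sense with external linkage, since by definition it refers to something that lives outside the current file.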

That is indeed what I initially thought. And certainly what I want is for everything to be static (the only exception would be the ‘main’ function which, I should mention, takes no arguments and returns nothing, so I would expect it to ‘hide’ the rest of the code in the module). And yet verifyModule() tells me that the globals in the IR that I generate are external and should have the appropriate linkage.

So I guess what I’m wondering is why verifyModule() insists that the globals should have external linkage and, more importantly, if the linkage of the globals actually has any part in the final linked program segfaulting.

Can you quote the exact verifier failure?

The error is: Global is external, but doesn’t have external or weak linkage!.
Below that, it lists the type(s) and name(s) of the globals.

Edit: This is when I use anything other than ExternalLinkage or ExternalWeakLinkage as the linkage type. When I use either of those, verifyModule() gives the module the green signal.

The check is:

 Assert(!GV.isDeclaration() || GV.hasValidDeclarationLinkage(),
         "Global is external, but doesn't have external or weak linkage!", &GV);

It seems like you are hitting this because the global does not have a definition.

Thanks for the tip!

UPDATE: I created a minimal program that replicated my situation and I had the same problem. So I followed the implementations of hasValidDeclarationLinkage() and isDeclaration(), and it turned out that a GlobalVariable instance created without an initializer is considered a declaration. So basically, I just needed to pass an initializer to the GlobalVariable constructor (or, alternatively, to GlobalVariable::setInitializer()), which would allow me to set the global variable to have internal linkage. I tried that with my reduced program and everything seems to work nicely.

Just as a follow-up question: it would be nice to be able to initialize a global variable piecemeal (e.g. via a series of GEPs) instead of in bulk (i.e. by passing an initializer). Is there a way to do this?

In general, I’ve had success expressing my concept in C/C++ and seeing how clang translates it to LLVM IR.

The initializer of a global has to be a constant. LLVM has a constant GEP expression that can operate only on constants; see here for an example: https://github.com/llvm/llvm-project/blob/master/llvm/test/Other/constant-fold-gep.ll#L79

If you need something more flexible, you will likely have to use a global constructor (which runs at program initialization), see for example: Compiler Explorer
(the optimizer would constant fold this entirely, but clang emits a function to do the initialization).
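As a rough illustration of the two cases (this is not the example behind the Compiler Explorer link, and all names are hypothetical), the C++ sketch below puts a constant initializer next to a dynamic one. With clang at -O0, the address-of initializer is typically emitted as a constant GEP expression, while the call to a non-constexpr function typically produces an initialization function registered as a global constructor.

```cpp
// Constant initializers: these can be emitted directly as data.
static int table[4] = {10, 20, 30, 40};

// The address of an element is still a constant expression, so no code
// needs to run at startup to set this pointer.
static int *third = &table[2];

// Not constexpr, so the front end does not have to evaluate it at compile time.
static int square(int x) { return x * x; }

// Dynamic initializer: without optimization, this is usually set by a
// compiler-generated function that runs before main().
static int computed = square(7);
```

Compiling a file like this with `clang++ -S -emit-llvm` is one way to inspect what each global turns into.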

In general, I’ve had success expressing my concept in C/C++ and seeing how clang translates it to LLVM IR.

I’m starting to think I, too, should make a habit out of this.

That aside, thanks for the sample! It turns out that what I was looking for was ConstantAggregateZero (I have yet to try it out but I think it should work, assuming my intuitive understanding of what it does is correct). That said, I will spend some time down the line looking at the alternatives.
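For reference, a C++ analogue of that approach might look like the sketch below. It is not LLVM API code, and the names (animal_vtable, populate_vtables, and so on) are hypothetical. The table is defined with a zero initializer, which is the role a ConstantAggregateZero initializer plays for an aggregate global in LLVM, so it can have internal linkage, and it is then filled in by a function called at startup.

```cpp
#include <cstddef>

struct Object;
using Method = int (*)(Object *);

struct Object {
    Method *vtable;  // first field points at the class's method table
    int value;
};

// Defined (not just declared) with a zero initializer, so internal linkage
// is fine. Every slot starts out as a null function pointer.
static Method animal_vtable[2] = {};

static int get_value(Object *self) { return self->value; }
static int get_double(Object *self) { return self->value * 2; }

// Meant to be called once at program start, before any method call.
static void populate_vtables() {
    animal_vtable[0] = get_value;
    animal_vtable[1] = get_double;
}

// Instantiation stores the vtable pointer in the first field.
static Object make_object(int value) { return Object{animal_vtable, value}; }

// Dynamic dispatch: look up the slot, then call through the pointer.
static int call_method(Object *obj, std::size_t slot) {
    return obj->vtable[slot](obj);
}
```

Calling a method before populate_vtables() has run would go through a null pointer, so the startup call has to come first.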