Undefined symbol when used as global, not from function

I have some simple code like:

target triple = "x86_64-pc-windows-msvc"

@_ZTVN6System8Sysutils9ExceptionE = external dllimport global [16 x i8*]
@"hahaha" = private constant { i8* } { i8* bitcast ([16 x i8*]* @_ZTVN6System8Sysutils9ExceptionE to i8*) }

define { i8* }* @m2() {
  ret { i8* }* @"hahaha"
}


define [16 x i8*]* @mainCRTStartup() {
  ret [16 x i8*]* @"_ZTVN6System8Sysutils9ExceptionE"
}

> llc -filetype=obj -O0 caller.ll
> lld -flavor link  caller.obj import.lib /demangle:no /subsystem:console
lld: error: undefined symbol: _ZTVN6System8Sysutils9ExceptionE
>>> referenced by caller.obj:(.rdata)

The import lib is a simple COFF import file:

llvm-objdump.exe import.lib -a -t

import.lib(rtl280.bpl): file format COFF-import-file

--------- 0/0 64 Thu Jan 1 01:00:00 1970 rtl280.bpl
[ 0](sec 0)(fl 0x00)(ty 0)(scl 0) (nx 0) 0x00000000 __imp__ZTVN6System8Sysutils9ExceptionE

actual file is on onedrive (no way to upload it here):

What am I doing wrong? Shouldn’t LLD support dllimport variables inside a global?

The problem lies in the data symbol (@"hahaha"?), which is supposed to be initialized with the address of the dllimported symbol itself. The reference within mainCRTStartup is fine (that one references __imp__ZTVN6System8Sysutils9ExceptionE), but .rdata references _ZTVN6System8Sysutils9ExceptionE directly.

$ llvm-objdump -r caller.obj 

caller.obj:	file format COFF-x86-64

RELOCATION RECORDS FOR [.text]:
0000000000000003 IMAGE_REL_AMD64_REL32 .rdata
0000000000000013 IMAGE_REL_AMD64_REL32 __imp__ZTVN6System8Sysutils9ExceptionE

RELOCATION RECORDS FOR [.rdata]:
0000000000000000 IMAGE_REL_AMD64_ADDR64 _ZTVN6System8Sysutils9ExceptionE

But unfortunately, with how DLL linkage works, the runtime linker can’t and won’t fill in the absolute address of _ZTVN6System8Sysutils9ExceptionE (which exists in a different DLL, loaded anywhere in the address space at runtime) anywhere in the image; it only does that for the import address table (IAT) entries, which is where __imp__ZTVN6System8Sysutils9ExceptionE points.

Consider the equivalent C code:

extern __declspec(dllimport) int _ZTVN6System8Sysutils9ExceptionE;
int *myPointer = &_ZTVN6System8Sysutils9ExceptionE;

Compiling this errors out like this:

$ clang -target x86_64-windows-gnu -S -o - caller.c
caller.c:2:18: error: initializer element is not a compile-time constant
int *myPointer = &_ZTVN6System8Sysutils9ExceptionE;
                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.

However, if this is compiled as C++ instead, the compiler generates runtime code to initialize the pointer:

$ clang -target x86_64-windows-gnu -S -o - -x c++ caller.c
[...]
__cxx_global_var_init:
	movq	__imp__ZTVN6System8Sysutils9ExceptionE(%rip), %rax
	movq	%rax, myPointer(%rip)
	retq

_GLOBAL__sub_I_caller.c:
	subq	$40, %rsp
	callq	__cxx_global_var_init
	nop
	addq	$40, %rsp
	retq

	.bss
	.globl	myPointer
myPointer:
	.quad	0

	.section	.ctors,"dw"
	.quad	_GLOBAL__sub_I_caller.c

(The same thing happens for -msvc triples, but the C++ symbols are just a bit less readable in that form.)

FWIW, within mingw environments, there’s a concept of runtime pseudo relocs: when the linker notices references to an undefined symbol foo, but the symbol __imp_foo exists, it adds the locations that need to be fixed up to a list, and a mingw runtime function runs over that list and fixes up the references on startup. That’s kinda like what the C++ case generated, but with a list produced by the linker, handled by a fixed runtime function.

That mechanism is mainly for cases where the data symbol wasn’t marked as dllimport in the first place, so the linker is stuck with references as if the symbol wasn’t imported, even though it turns out to be. But for the C example case above, the compiler already knows that this can’t work, and refuses to emit code for it.

Thanks, that explains it.

Is there an easy way to tell LLD to emit these (preferably without changing the triple) too?

I can probably interpret them myself.

Yes, the triple isn’t tied to this in itself. If you pass -lldmingw to lld-link, you get these behaviours (and a couple other things). It shouldn’t in general be detrimental to do that - it mostly opts in to a more relaxed behaviour wrt some details. If you don’t want to opt in to the whole of mingw behaviours in the linker, it should be possible to only add -auto-import and -runtime-pseudo-reloc to the options instead, which should give you only these behaviours but not everything else from -lldmingw. (Note that this combination isn’t very much tested in the wild though.)

If you do that, you’ll need to do essentially the same as what pseudo-reloc.c in the mingw-w64 repository (mingw-w64/mingw-w64 on GitHub, master branch) does (although you can skip support for the legacy v1 format of pseudo relocations).
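
To give a rough idea of what such a fixup routine boils down to, here is a minimal sketch of the v2 handling, written from memory of how pseudo-reloc.c lays out its entries (three 32-bit fields: RVA of the IAT slot, RVA of the location to patch, and flags whose low 8 bits give the field width). The names pseudo_reloc_v2 and apply_pseudo_relocs are made up for this example, it only handles 64-bit wide fields, and it skips the list header and the step of making the target section writable (which the real code does before patching, since the target may sit in .rdata). Treat it as an illustration and check it against the actual source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed v2 entry layout; verify against mingw-w64's pseudo-reloc.c. */
typedef struct {
    uint32_t sym;    /* RVA of the IAT slot (__imp_foo) */
    uint32_t target; /* RVA of the location to patch */
    uint32_t flags;  /* low 8 bits: width of the patched field in bits */
} pseudo_reloc_v2;

/* Hypothetical helper: apply 64-bit wide entries to the image at `base`.
   Returns 0 on success, -1 if an entry has an unsupported width. */
static int apply_pseudo_relocs(unsigned char *base,
                               const pseudo_reloc_v2 *relocs, size_t count)
{
    assert(base != NULL);
    for (size_t i = 0; i < count; i++) {
        if ((relocs[i].flags & 0xff) != 64)
            return -1; /* 8/16/32-bit fields are left out of this sketch */

        uint64_t iat_value, field;
        /* The loader has filled the IAT slot with the real address. */
        memcpy(&iat_value, base + relocs[i].sym, sizeof iat_value);
        /* The linker pointed the field at the IAT slot (plus any addend). */
        memcpy(&field, base + relocs[i].target, sizeof field);

        /* Swap the slot's own address for the address stored in the slot. */
        field -= (uint64_t)(uintptr_t)(base + relocs[i].sym);
        field += iat_value;
        memcpy(base + relocs[i].target, &field, sizeof field);
    }
    return 0;
}
```

In a real image, base would be the module’s load address and the entries would come from the list the linker emits; here both are simply passed in by the caller.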

However, I kinda wonder if it wouldn’t be simpler to just do what Clang does for the C++ cases, where it generates code for a runtime constructor which initializes the pointers. (But I’m not familiar with implementing such things from scratch, so I have no idea how big an effort that is.)
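
For reference, the constructor idea can be sketched in plain C with the GCC/Clang constructor attribute, which has the same effect as the C++ dynamic initializer shown earlier: the pointer starts out zero and gets filled in by code that runs before main. In this sketch imported_value is only a local stand-in so the snippet builds anywhere; in the real case it would be the __declspec(dllimport) symbol from the other DLL. The names init_my_pointer and get_my_pointer are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the imported data. In the real case this would be
   `extern __declspec(dllimport) int imported_value;` from another DLL. */
int imported_value = 42;

/* Left zero-initialized, so no static relocation against the import. */
int *myPointer;

/* GCC/Clang extension: runs before main(), much like the
   __cxx_global_var_init function Clang emits for C++. */
__attribute__((constructor))
static void init_my_pointer(void)
{
    myPointer = &imported_value;
}

/* Hypothetical accessor that checks the constructor has already run. */
int *get_my_pointer(void)
{
    assert(myPointer != NULL);
    return myPointer;
}
```

When generating IR directly, the analogous step would presumably be to emit a small function that loads the address through the dllimport symbol and stores it into the global, and to register that function so it runs at startup.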

Thanks! I’ll try the “fixup” constructor approach first.