Vtable code generation options

Hey,

First of all, sorry if this mail should have gone to some other list, but it seems that no one except the Clang people can help with my issue. I’m using an OS X environment, but the issue is probably common to any ELF-based system as well.

It is about how virtual tables are generated and which link-time relocations are needed to make them work. I have a very simple example that creates a vtable with one slot for a pure virtual method:

class Foo {
public:
    virtual void bar() = 0;
};

class Baz : public Foo {
public:
    virtual void bar() override;
};

void Baz::bar() {}

void xyz() { Baz().bar(); }

The compiler produces the following vtable layout for Foo:

__ZTV3Foo:
.quad 0
.quad __ZTI3Foo
.quad ___cxa_pure_virtual

It is easy to see that the linker will produce an absolute relocation for the symbol ___cxa_pure_virtual. In most cases this relocation will be processed by the dynamic linker. Let's now assume that our loadable module has a lot of similar vtables for miscellaneous classes. Each slot for a pure virtual method will cause a new absolute relocation for the same symbol, and we end up with a binary that contains a huge number of relocation entries that all refer to the same symbol. E.g. here are a few relocation entries from an Android dynamic library built with clang-3.6 from the Android NDK:


00332fe4 00030502 R_ARM_ABS32 00000000 __cxa_pure_virtual
00332fe8 00030502 R_ARM_ABS32 00000000 __cxa_pure_virtual
00332fec 00030502 R_ARM_ABS32 00000000 __cxa_pure_virtual
00332ff0 00030502 R_ARM_ABS32 00000000 __cxa_pure_virtual
00332ff4 00030502 R_ARM_ABS32 00000000 __cxa_pure_virtual
00332ff8 00030502 R_ARM_ABS32 00000000 __cxa_pure_virtual
00332ffc 00030502 R_ARM_ABS32 00000000 __cxa_pure_virtual
003330f4 00030502 R_ARM_ABS32 00000000 __cxa_pure_virtual
003330f8 00030502 R_ARM_ABS32 00000000 __cxa_pure_virtual
003330fc 00030502 R_ARM_ABS32 00000000 __cxa_pure_virtual

I guess this situation is pretty common for large projects. If the dynamic linker is not clever enough to detect it, it will waste time looking up the same symbol over and over.

Is the compiler able to avoid emitting an absolute relocation in this case? Maybe it could introduce a lightweight shim function that in turn calls __cxa_pure_virtual via a jump slot. These shims could be mergeable, so the final binary would contain only one shim instance, and each vtable would need only a relative relocation without an expensive symbol lookup, which would then be needed only once. Also, as far as I can see, such an approach should not incur any significant speed or size regression for the generated code, since the pure virtual stub won’t be called often (probably no more than once, if at all). At the same time, dynamic linking may be performed faster.
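To make the idea a bit more concrete, here is a rough hand-written sketch of what I have in mind. It is only an approximation: the names pure_virtual_shim, foo_slots and qux_slots are made up for illustration, the arrays merely stand in for the pure virtual slots the compiler would emit, and the visibility attribute assumes a GCC or Clang style compiler.

```cpp
#include <cassert>
#include <cstddef>

// Runtime hook from the Itanium C++ ABI, normally provided by the C++
// runtime library. Declared by hand here instead of including <cxxabi.h>.
extern "C" void __cxa_pure_virtual();

// Hypothetical image-local shim (the name is an assumption). With hidden
// visibility its address is known relative to the image, so a data slot
// that points at it should not need a by-name symbol lookup at load time.
extern "C" __attribute__((visibility("hidden"))) void pure_virtual_shim() {
    // The only reference to the external symbol; in position-independent
    // code this call would typically go through a single jump slot.
    __cxa_pure_virtual();
}

using Slot = void (*)();

// Stand-ins for the pure virtual slots of two unrelated abstract classes.
// A real compiler would emit these inside the vtables themselves.
Slot foo_slots[] = { &pure_virtual_shim };
Slot qux_slots[] = { &pure_virtual_shim, &pure_virtual_shim };
```

All three slots refer to the same local function, so the external symbol is referenced from exactly one place no matter how many abstract classes the module contains.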

Does Clang support a similar approach at the moment? Or are there downsides to the described approach that mean it cannot be implemented at all?

ELF will use a single symbol table entry for all of these relocations. By design, that makes it quite easy for the loader to only perform the lookup once. If we assume that the loader does its job efficiently, introducing an image-local thunk would only actually be beneficial in a pre-linked image. That could still be a reasonable feature request — "I'm going to prelink this image and would like the compiler + linker to generate code that optimizes load times with that in mind" — but I believe prelinking is somewhat disfavored, so you might have trouble getting traction on it.
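As a rough illustration of what "only perform the lookup once" can look like, here is a minimal sketch that caches resolved addresses by symbol table index. It is not how any particular loader is implemented: slow_lookup, apply_relocations and LookupStats are hypothetical names, the resolved addresses are fake, and the sketch only models R_ARM_ABS32 entries in the 32-bit Rel format, where r_info packs the symbol index in the upper 24 bits and the relocation type in the low 8 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Same two fields as a 32-bit ELF Rel entry.
struct Rel {
    std::uint32_t offset;
    std::uint32_t info;
};

// Equivalent to the ELF32_R_SYM and ELF32_R_TYPE macros.
constexpr std::uint32_t rel_sym(std::uint32_t info) { return info >> 8; }
constexpr std::uint32_t rel_type(std::uint32_t info) { return info & 0xffu; }

constexpr std::uint32_t kArmAbs32 = 2;  // numeric value of R_ARM_ABS32

struct LookupStats {
    std::size_t relocations;  // entries processed
    std::size_t lookups;      // expensive by-name searches performed
};

// Hypothetical stand-in for the expensive by-name search across loaded
// images; it returns a fake address derived from the symbol index.
std::uint32_t slow_lookup(std::uint32_t sym_index) {
    return 0x1000u + sym_index;
}

LookupStats apply_relocations(const std::vector<Rel>& rels) {
    std::unordered_map<std::uint32_t, std::uint32_t> resolved;  // index -> address
    LookupStats stats{0, 0};
    for (const Rel& r : rels) {
        assert(rel_type(r.info) == kArmAbs32);  // sketch handles only this type
        const std::uint32_t sym = rel_sym(r.info);
        auto it = resolved.find(sym);
        if (it == resolved.end()) {
            // First relocation against this symbol: search once and remember.
            it = resolved.emplace(sym, slow_lookup(sym)).first;
            ++stats.lookups;
        }
        // A real loader would now store it->second at address r.offset.
        ++stats.relocations;
    }
    return stats;
}
```

Every entry in the dump above carries the same info word, 00030502, which decodes to symbol index 0x305 and type 2, so a loop like this would search for __cxa_pure_virtual once regardless of how many vtable slots refer to it.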

Now, if you're actually profiling your loads and seeing redundant lookups — either because the linker is emitting redundant symbol table entries or because the loader isn't taking advantage of the entries being unique — that does sound like a bug, but it's a linker or loader bug, not a compiler bug.

John.