"anchor" method policy, request for clarification

I tried to build LLVM with ’gcc-4.8.1 –flto’, and when linking unittest programs, I got the following error:

/tmp/cc8pMk84.ltrans30.ltrans.o:(.data.rel.ro._ZTIN4llvm2cl15OptionValueCopyISsEE[_ZTIN4llvm2cl15OptionValueCopyISsEE]+0x10): undefined reference to `typeinfo for llvm::cl::GenericOptionValue’

/tmp/cc8pMk84.ltrans30.ltrans.o:(.data.rel.ro._ZTIN4llvm2cl15OptionValueCopyINS0_13boolOrDefaultEEE[_ZTIN4llvm2cl15OptionValueCopyINS0_13boolOrDefaultEEE]+0x10): undefined reference to `typeinfo for llvm::cl::GenericOptionValue’

This seems to be due to the (mis-)use of “anchor” methods in CommandLine. I commited r183830, with a removal of the anchor method for GenericOptionValue, which solved my link problem. Also, the commit didn’t change the number of vtables, as measured by:

find -name *.o | xargs nm -C | grep ‘V vtable’ | grep OptionValue | wc

The current LLVM developer policy text regarding “anchor” methods is:

“If a class is defined in a header file and has a vtable (either it has virtual methods or it derives from classes with virtual methods), it must always have at least one out-of-line virtual method in the class. Without this, the compiler will copy the vtable and RTTI into every .o file that #includes the header, bloating .o file sizes and increasing link times.”

I don’t know much about when vtable copies are generated, and I guess it may vary slightly between implementations. However, according to my gut feeling, this is an insufficient description.

First, using only #include (only defining the class) seems insufficient to generate a copy of the vtable. I thought that a class instantiation (creating a class object) was necessary. Specifically, in my case above, having an anchor method for an abstract class seems wrong, because abstract classes are never instantiated. The policy text needs a more detailed description, telling when an (unwanted) vtable is generated.

(Secondly, how to enforce this rule for class templates. Should the anchor method be defined at each template instantiation? This seems like, for a single anchor declaration in a header file, quite a lot of anchor method definitions may be necessary, over many different translation units.)

/Patrik Hägglund

(+Chris, since I assume he wrote this policy - and, as I said in my
previous email, I wouldn't mind seeing some justification or just
seeing the rule go away & drop the anchors I added previously (or, if
we're going to keep it, we could add more anchors & actually get to
the point where we're -Wweak-vtable clean & enable that warning))

I tried to build LLVM with ’gcc-4.8.1 –flto’, and when linking unittest
programs, I got the following error:

/tmp/cc8pMk84.ltrans30.ltrans.o:(.data.rel.ro._ZTIN4llvm2cl15OptionValueCopyISsEE[_ZTIN4llvm2cl15OptionValueCopyISsEE]+0x10):
undefined reference to `typeinfo for llvm::cl::GenericOptionValue'

/tmp/cc8pMk84.ltrans30.ltrans.o:(.data.rel.ro._ZTIN4llvm2cl15OptionValueCopyINS0_13boolOrDefaultEEE[_ZTIN4llvm2cl15OptionValueCopyINS0_13boolOrDefaultEEE]+0x10):
undefined reference to `typeinfo for llvm::cl::GenericOptionValue'

This seems to be due to the (mis-)use of “anchor” methods in CommandLine.

What's the misuse you have in mind? Can you explain why the anchor
wasn't being linked in? It sounds like the unit test wasn't linking to
all the libs it should've been. (& that removing the anchor was simply
a workaround to avoid linking to an implementation file...)

I
commited r183830, with a removal of the anchor method for
GenericOptionValue, which solved my link problem. Also, the commit didn’t
change the number of vtables, as measured by:

find -name \*.o | xargs nm -C | grep 'V vtable' | grep OptionValue | wc

Where did you run this command? Across the whole LLVM build?

It's possible that, even with a weak vtable, you only get one copy of
it if, as you mention below, you only create instances of that type in
one translation unit.

The current LLVM developer policy text regarding “anchor” methods is:

“If a class is defined in a header file and has a vtable (either it has
virtual methods or it derives from classes with virtual methods), it must
always have at least one out-of-line virtual method in the class. Without
this, the compiler will copy the vtable and RTTI into every .o file that
#includes the header, bloating .o file sizes and increasing link times.”

I don’t know much about when vtable copies are generated, and I guess it may
vary slightly between implementations. However, according to my gut feeling,
this is an insufficient description.

The simplest model would be that they work the same way as inline
functions - the compiler must emit a copy in every TU that calls the
function (& the compiler must emit the vtable in every TU that 'uses'
the vtable - which, as you point out, is any TU that constructs an
instance of the class)

First, using only #include (only defining the class) seems insufficient to
generate a copy of the vtable.

Correct.

I thought that a class instantiation
(creating a class object) was necessary.

Yes, that would be sufficient.

Specifically, in my case above,
having an anchor method for an abstract class seems wrong, because abstract
classes are never instantiated.

Well they are, though indirectly - when you construct a derived object
you must also construct the base subobject - and initialize the vtable
pointer to point to the base's vtable (which is then replaced with the
derived vtable). You should be able to verify/test this.

The policy text needs a more detailed
description, telling when an (unwanted) vtable is generated.

The policy is pretty simple & doesn't necessarily need to describe
when this happens, simply that you should always have an anchor for
any class with a vtable.

(Secondly, how to enforce this rule for class templates. Should the anchor
method be defined at each template instantiation? This seems like, for a
single anchor declaration in a header file, quite a lot of anchor method
definitions may be necessary, over many different translation units.)

I think it's reasonable to assume that for templates the rule doesn't
apply because it's just too inconvenient to go & write explicit
instantiations of an anchor for every template specialization. In the
same way that every member function of a template has to be
deduplicated by the linker, I assume it's acceptable for vtables to be
treated the same.

- David

Can you explain why the anchor wasn't being linked in? It sounds like the unit test wasn't linking to

all the libs it should've been.

No (libLLVMSupport.a is linked in). The error message below is about (using c++filt) 'typeinfo for llvm::cl::OptionValueCopy<std::basic_string<[...]>>', and 'typeinfo for llvm::cl::OptionValueCopy<llvm::cl::boolOrDefault>', using an undefined reference to `typeinfo for llvm::cl::GenericOptionValue'. My guess was because GenericOptionValue is an abstract class.

Where did you run this command? Across the whole LLVM build?

Yes.

The policy is pretty simple & doesn't necessarily need to describe
when this happens, simply that you should always have an anchor for
any class with a vtable.

I think that the confusion that this discussion is caused by, clearly shows the benefit of having a better description. Also, your exception for class templates needs to be expressed. In this example (for GenericOptionValue, and derived classes), I suppose it makes a difference.

/Patrik Hägglund