GCC and Clang produce undefined references to functions with vague linkage

Note1: I am not subscribed to the gcc list, please use reply-all.
Note2: I think the clang list is moderated for the first post, but it
is usually really fast. Sorry about that.

I recently implemented an optimization in LLVM to hide symbols that
the we "know" are available in every DSO that uses them. This is very
similar to a more generic optimization that is possible during LTO
when using recent versions of the gold plugin.

Unfortunately, this found a bug in both gcc and clang (or in the
itanium ABI, it is not very clear). The testcase is

$ cat test.h
struct foo {
  virtual ~foo();
};
struct bar : public foo {
  virtual void zed();
};
$ cat def.cpp
#include "test.h"
void bar::zed() {
}
$ cat undef.cpp
#include "test.h"
void f() {
  foo *x(new bar);
  delete x;
}

Both gcc (>= 4.6) and clang will produce and undefined reference to
_ZN3barD0Ev when compiling undef.cpp with optimizations:

$ ~/gcc/build/gcc/xgcc -B ~/gcc/build/gcc/ -c undef.cpp -o undef.o -O3 -fPIC
$ nm undef.o | grep D0
                 U _ZN3barD0Ev

And, when using LTO, a shared library built from def.cpp has a visible
vtable, but the destructor itself is not visible:

$ ~/gcc/build/gcc/xgcc -B ~/gcc/build/gcc/ -c def.cpp -o def.o -O3 -flto -fPIC
$ ~/gcc/build/gcc/xgcc -B ~/gcc/build/gcc/ def.o -o def.so -shared
-fuse-linker-plugin

$ readelf -sDW def.so | grep bar
   12 7: 0000000000000931 5 OBJECT WEAK DEFAULT 13 _ZTS3bar
   11 10: 0000000000001ca0 40 OBJECT WEAK DEFAULT 24 _ZTV3bar
   13 14: 0000000000001cd0 24 OBJECT WEAK DEFAULT 24 _ZTI3bar
    8 15: 00000000000008b0 10 FUNC GLOBAL DEFAULT 11 _ZN3bar3zedEv

And the final result is that we get an undefined reference when linking:

$ g++ -shared -o foo.so undef.o def.so -Wl,-z,defs
undef.o:undef.cpp:function f(): error: undefined reference to 'bar::~bar()'

I can see two ways of solving this and would like for both clang and
gcc to implement the same:

* Make sure the destructor is emitted everywhere. That is, clang and
gcc have a bug in producing an undefined reference to _ZN3barD0Ev.
* Make it clear that the file exporting the vtable has to export the
symbols used in it. That is, the Itanium c++ abi needs clarification
and so does gcc's lto plugin (and the llvm patch I am working on).

In an earlier discussion on the llvm list, it was pointed out that
* The second solution would probably be the most backward compatible
one, as gcc (even without lto) has been producing the undefined symbol
since 4.6.
* If implementing the second solution, the destructor still has to be
a weak symbol on OS X at least.
* The first solution has the advantage that it is semantically simpler
as it avoids a special case about when a function can be assumed to be
available externally.
* The first solution has the advantage that more symbols can be marked
hidden, as the gcc lto plugin does currently.

Cheers,
Rafael

[ problem with visibility for bar::~bar for testcase ]

$ cat test.h
struct foo {
  virtual ~foo();
};
struct bar : public foo {
  virtual void zed();
};
$ cat def.cpp
#include "test.h"
void bar::zed() {
}
$ cat undef.cpp
#include "test.h"
void f() {
  foo *x(new bar);
  delete x;
}

...

I can see two ways of solving this and would like for both clang and
gcc to implement the same:

[1] * Make sure the destructor is emitted everywhere. That is, clang and
gcc have a bug in producing an undefined reference to _ZN3barD0Ev.
[2] * Make it clear that the file exporting the vtable has to export the
symbols used in it. That is, the Itanium c++ abi needs clarification
and so does gcc's lto plugin (and the llvm patch I am working on).

I think that the second solution wins because it allows for the production
of less object code, and it is consistent with the rationale for the
vtable optimization rule (the vtable is emitted by the file that has the
definition for the first non-inline virtual function; simply do the same
for the auto-generated virtual destructor). The first solution requires
making one copy per compilation unit and eliminating the duplicates at
link time.

But that's pervasively true in C++ — the linker has to eliminate duplicates
all the time. Idiomatic C++ code ends up plunking down hundreds, if
not thousands, of inline functions in every single translation unit. This is
already a massive burden for linking C++ programs, particularly in debug
builds. Adding a few extra symbols when the optimizer determines that
it can devirtualize, but declines to inline, is essentially irrelevant.

In fact, it is particularly unimportant because it's very likely that this duplicate
will be in the same DSO as the vtable. That means that the first solution
imposes some extra work on the static linker alone (but again, only when
devirtualizing a call to a function we don't want to inline!) while preserving
our ability to reduce work for the dynamic linker (since calls do not rely
on address equality of the function across translation units). The second
solution is an irrevocable guarantee that every symbol mentioned in
a strong vtable *must* be exported to the whole world.

Also recall that these symbols can already be emitted in arbitrary
other translation units — we cannot change the ABI to say that these
symbols are *only* emitted in the file defining the v-table.

Finally, both the language standard and the ABI are clearly designed
around an assumption that every translation unit that needs an inline
function will emit it.

John.

Richi asked me to also report a gcc bug:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53808

But that's pervasively true in C++ — the linker has to eliminate duplicates
all the time. Idiomatic C++ code ends up plunking down hundreds, if
not thousands, of inline functions in every single translation unit. This is
already a massive burden for linking C++ programs, particularly in debug
builds. Adding a few extra symbols when the optimizer determines that
it can devirtualize, but declines to inline, is essentially irrelevant.

In fact, it is particularly unimportant because it's very likely that this duplicate
will be in the same DSO as the vtable. That means that the first solution
imposes some extra work on the static linker alone (but again, only when
devirtualizing a call to a function we don't want to inline!) while preserving
our ability to reduce work for the dynamic linker (since calls do not rely
on address equality of the function across translation units). The second
solution is an irrevocable guarantee that every symbol mentioned in
a strong vtable *must* be exported to the whole world.

Also recall that these symbols can already be emitted in arbitrary
other translation units — we cannot change the ABI to say that these
symbols are *only* emitted in the file defining the v-table.

I would just like to point out that the ABI could says that they only
*need* to be emitted in the file with the vtable, but yes, for a long
time it would have to support the symbols showing up in other
translation units produce by older compilers.

Finally, both the language standard and the ABI are clearly designed
around an assumption that every translation unit that needs an inline
function will emit it.

John.

Cheers,
Rafael

There's no "for a long time" here. The ABI does not allow us to emit these
symbols with non-coalescing linkage. We're not going to break ABI
just because people didn't consider a particular code pattern when they
hacked in devirtualization through external v-tables.

John.

There's no "for a long time" here. The ABI does not allow us to emit these
symbols with non-coalescing linkage. We're not going to break ABI
just because people didn't consider a particular code pattern when they
hacked in devirtualization through external v-tables.

If we take "the ABI" to mean compatibility with released versions of
the compilers, then it *is* broken, as released compilers assume
behavior that is not guaranteed by the ABI (the document). It is not
possible to avoid all incompatibilities. To avoid most we would have
to

* Say that the symbol must be exported by the file that exports the
vtable. Not doing so causes incompatibilities with files compiled by
gcc version 4.6 and 4.7, open source clang versions 2.9, 3.0, 3.1
and Apple's releases 318.0.58 and 421-10.48 (those are the ones I have
on my laptop).
* Say that the symbol must remain mergeable so that current files
still work, which on OS X requires it to be weak.

In other words, we have to take the worse of both options. The only
thing that would still be broken (in released versions) are libraries
built with gcc's LTO, but hopefully changing that is the option with
the least impact on the users.

John.

Cheers,
Rafael

By "breaking the ABI", I mean changing the required output of
compilers that conform to it. It is understood that compilers will
occasionally have bugs that cause them to not conform; as a
somewhat trivial example, both gcc and clang have mis-mangled
symbols in the past. Typically, compiler implementers choose to fix
those bugs, rather than trying to codify them by modifying the ABI.

Again, the ABI clearly expects that every object file that references
an inline function will define it, and do so with coalescing linkage.
That is an invaluable invariant. Our proposed visibility-modifying
optimization is hardly the only reasonable consumer of it.

As with most compatibility problems, it would be better if no compiler
had ever strayed from the One True Path, and yet it's happened.
Given that the chance of actual incompatibility in practice is low —
as I believe I've argued several times — I continue to believe that
the proper response is to just fix the bug, rather than imposing a
novel and permanent constraint on ourselves out of unmotivated worry.

Also, as a point of order: when you said you were going to make
a proposal to the GCC list, I was under the impression that we'd
reached some level of consensus.

John.

Quoting John McCall <rjmccall@apple.com>:

not well-formed C++, for it violates the one-definition rule in that it
*lacks* a definition for the virtual member function foo::~foo(). Does
it make any difference if you add a definition?

not well-formed C++, for it violates the one-definition rule in that it
*lacks* a definition for the virtual member function foo::~foo(). Does
it make any difference if you add a definition?

Unfortunately no. Replacing the declaration with an inline definition
produces a copy of it in undef.o, but we still get an undefined
reference to ~bar:

nm undef.o | grep D0
                 U _ZN3barD0Ev
0000000000000000 W _ZN3fooD0Ev

Cheers,
Rafael

Yes, this indeed looks like (most probably my) bug in the constant folding
code that now uses extern vtables. I will fix it. So we can not take
comdat linkage decl from external vtable when we no longer have its body
around, right?

Sounds about the fix John was describing, yes. You can produce a copy
of the comdat linkage decl (the destructor in this case) or avoid
using the extern vtable and just produce an undefined reference to the
vtable itself.

Honza

Thanks!
Rafael