Why do '--start-group' and '--end-group' suppress the duplicate symbol error?

Hi all,

Given the following snippets:

// test.h
struct A { A(); int get(); };

// i18n_api/test.cpp
#include "test.h"
// extern int foo_defined_at_api();
// int A::get() { return foo_defined_at_api(); }
A::A() {}
int A::get() { return 2; }

// api/test.cpp
#include "test.h"
A::A() {}
int A::get() { return 1; }
// int foo_defined_at_api() { return 0; }

// main.cpp
#include "api/test.h"
#include <iostream>
int main() {
  A a;
  std::cout << a.get() << std::endl;
}

The build steps are as follows:

$ clang++ -c api/test.cpp -g -o api/test.o
$ clang++ -c i18n_api/test.cpp -g -o i18n_api/test.o
$ ar -rcs api/libtest.a api/*.o
$ ar -rcs i18n_api/libtest.a i18n_api/*.o
$ clang++ -c main.cpp -g -o main.o
$ clang++ -fuse-ld=lld -Wl,--start-group i18n_api/libtest.a api/libtest.a -Wl,--end-group main.o

There are duplicate symbols in i18n_api/libtest.a and api/libtest.a, but lld gives no error message here. If I deliberately introduce an undefined symbol, foo_defined_at_api, to force api/libtest.a to be extracted (see the commented-out lines above), the duplicate symbol error is triggered.

Is this reasonable?

Henry Wong

I’d be slightly surprised to see a duplicate symbol error even without the --start-group/--end-group options. Unless something’s changed recently, library members are only pulled in from archives if they are needed to fulfil an undefined symbol, and the first such fulfilment is picked, if there are multiple candidates. As I see it, you have two undefined symbols in main.o:

A::A()
int A::get();

Both i18n_api/test.o and api/test.o define both symbols. As such, the test.o member of the first library that provides them (i18n_api/libtest.a) will fulfil these references and be used in the link. The member in the other library is never linked in, because there are no more undefined references to resolve.

By adding the foo_defined_at_api reference to the test.o in i18n_api/libtest.a, first, that test.o will be linked in as above, but then the test.o member of api/libtest.a now has to be linked in too, to fulfil the new undefined reference, causing a duplicate symbol error. If, on the other hand, you swapped around the two libraries in the link order, you wouldn’t get it, because first the api/libtest.a member will be linked in, but then not have any undefined references, so the other one isn’t.
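To make the explanation above concrete, here is a minimal C++ sketch of archive member extraction. It is a simplified model, not lld's implementation: the `Object`, `Archive`, `LinkResult` and `simulateLink` names are hypothetical, symbols are plain strings, all command-line objects are loaded before any archive member is considered, and when several archives define the same symbol, the first archive on the command line supplies it.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified model of a relocatable object (or archive
// member): the symbols it defines and the symbols it references.
struct Object {
  std::string name;
  std::set<std::string> defines;
  std::set<std::string> undefs;
};

struct Archive {
  std::string name;
  std::vector<Object> members;
};

struct LinkResult {
  std::vector<std::string> extracted;   // "archive(member)", in extraction order
  std::vector<std::string> duplicates;  // symbols defined more than once
};

// Command-line objects are always loaded; an archive member is extracted only
// to satisfy a still-undefined reference, and the first archive offering a
// symbol is the one that supplies it.
LinkResult simulateLink(const std::vector<Object>& objects,
                        const std::vector<Archive>& archives) {
  LinkResult result;
  std::set<std::string> defined;
  std::vector<std::string> pending;  // references still to be checked

  auto load = [&](const Object& obj) {
    for (const auto& sym : obj.defines)
      if (!defined.insert(sym).second) result.duplicates.push_back(sym);
    for (const auto& sym : obj.undefs) pending.push_back(sym);
  };

  // Lazy symbol table: emplace() keeps the entry of the first archive.
  std::map<std::string, std::pair<const Archive*, const Object*>> lazy;
  for (const auto& ar : archives)
    for (const auto& member : ar.members)
      for (const auto& sym : member.defines)
        lazy.emplace(sym, std::make_pair(&ar, &member));

  for (const auto& obj : objects) load(obj);

  while (!pending.empty()) {
    std::string sym = pending.back();
    pending.pop_back();
    if (defined.count(sym)) continue;  // already satisfied
    auto it = lazy.find(sym);
    if (it == lazy.end()) continue;    // no archive defines it: stays undefined
    const Archive* ar = it->second.first;
    const Object* member = it->second.second;
    result.extracted.push_back(ar->name + "(" + member->name + ")");
    load(*member);  // the whole member enters the link
  }
  return result;
}
```

In this model, the original two archives extract only i18n_api/libtest.a(test.o). Adding the foo_defined_at_api() reference pulls in api/libtest.a(test.o) as well and reports A::A() and A::get() as duplicates, while swapping the archive order extracts only the api member. Passing both test.o files directly as objects reports the duplicates too, since command-line objects are never skipped.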


Thank you for your detailed explanation! I dug into the lld implementation of LazyFile/LazyObject, and with your explanation the whole link process is now clear to me.

What puzzles me is whether lld should report the duplicate symbol error. The reason I raised this topic is that I ran into a duplicate symbol linker error in a very large project when I added an undefined reference to members of some archives, just like the example I gave at the beginning.

As for whether lld should report it: I know of many codebases that would end up with a broken build if the linker started doing that, so I don’t think so. This behaviour is well-established, and is perhaps the major difference between linking using archives and using the objects directly on the command-line.

@MaskRay might have more to say on this matter though.


This behaviour is well-established, and is perhaps the major difference between linking using archives and using the objects directly on the command-line.

Right. Here are a couple of posts that explain this behavior, which is fundamental to how UNIX linkers work with archive libraries, and why the order of objects and libraries on the link line matters:

https://groups.google.com/g/gnu.gcc.help/c/muvgXVAU6l0/m/fVpqbXYp7cEJ
https://eli.thegreenplace.net/2013/07/09/library-order-in-static-linking/
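The linked posts describe the traditional left-to-right algorithm. Below is a minimal sketch of that idea under simplifying assumptions (hypothetical `Object`, `Archive`, `Input` types and `undefinedAfterLink` helper, symbols as plain strings, no duplicate checking); it models the concept, not any particular linker's code. An archive can only satisfy references that are already undefined when it is reached, and a --start-group/--end-group group is rescanned until a full pass extracts nothing new.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical minimal model of an object file or archive member.
struct Object {
  std::string name;
  std::set<std::string> defines;
  std::set<std::string> undefs;
};

struct Archive {
  std::string name;
  std::vector<Object> members;
};

// One command-line input: a plain object, or a group of archives.
// In this sketch a lone archive is treated as a group of one.
struct Input {
  bool isObject;
  Object object;
  std::vector<Archive> group;
};

// Left-to-right processing: an archive only satisfies references that are
// already undefined when the linker reaches it. Returns the symbols that
// remain undefined after all inputs have been processed.
std::set<std::string> undefinedAfterLink(const std::vector<Input>& inputs) {
  std::set<std::string> defined, undefined;
  std::set<const Object*> loaded;
  auto load = [&](const Object& obj) {
    for (const auto& s : obj.defines) {
      defined.insert(s);
      undefined.erase(s);
    }
    for (const auto& s : obj.undefs)
      if (!defined.count(s)) undefined.insert(s);
  };
  for (const Input& in : inputs) {
    if (in.isObject) {
      load(in.object);
      continue;
    }
    // Rescan every archive in the group until a full pass extracts nothing.
    for (bool changed = true; changed;) {
      changed = false;
      for (const Archive& ar : in.group)
        for (const Object& member : ar.members) {
          if (loaded.count(&member)) continue;
          bool needed = false;
          for (const auto& s : member.defines)
            if (undefined.count(s)) needed = true;
          if (needed) {
            loaded.insert(&member);
            load(member);
            changed = true;
          }
        }
    }
  }
  return undefined;
}
```

With made-up libraries where libA.a(a.o) needs b from libB.a, and libB.a(b.o) needs a2 back from libA.a, listing the two archives separately after main.o leaves a2 undefined, grouping them resolves everything, and placing the group before main.o in this model extracts nothing at all.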


Other posts likely explain the concept better than mine, but the "Symbol processing" post on my blog (MaskRay) has pseudo-code which may be more accurate.

Its section "Relocatable object file suppressing archive member extraction" explains the shadowing behavior. Not reporting a duplicate symbol error has some value: as that section shows, users can supply optimized routines that override some libc functions (e.g. malloc, mem*, str*).
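A minimal sketch of this shadowing behavior, assuming a hypothetical `linkAgainstLibc` helper and a made-up libc.a whose members are modelled as symbol lists: an object on the command line that defines malloc leaves nothing for libc.a's malloc member to resolve, so that member is never extracted and no duplicate is reported. This is an illustration of the idea, not how any real libc is laid out.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical minimal model: symbols an object (or archive member) defines
// and references.
struct Object {
  std::string name;
  std::set<std::string> defines;
  std::set<std::string> undefs;
};

struct ShadowResult {
  std::vector<std::string> extracted;  // names of libc.a members pulled in
  std::set<std::string> duplicates;    // symbols defined more than once
};

// Loads every command-line object, then extracts a member of `libc` only
// while it defines a still-undefined symbol. A member is extracted as a
// whole, so all of its definitions enter the link together.
ShadowResult linkAgainstLibc(const std::vector<Object>& objects,
                             const std::vector<Object>& libc) {
  ShadowResult r;
  std::set<std::string> defined, undefined;
  auto load = [&](const Object& obj) {
    for (const auto& s : obj.defines) {
      if (!defined.insert(s).second) r.duplicates.insert(s);
      undefined.erase(s);
    }
    for (const auto& s : obj.undefs)
      if (!defined.count(s)) undefined.insert(s);
  };
  for (const auto& obj : objects) load(obj);
  std::vector<bool> taken(libc.size(), false);
  for (bool changed = true; changed;) {
    changed = false;
    for (std::size_t i = 0; i < libc.size(); ++i) {
      if (taken[i]) continue;
      for (const auto& s : libc[i].defines)
        if (undefined.count(s)) {
          taken[i] = true;
          break;
        }
      if (taken[i]) {
        r.extracted.push_back(libc[i].name);
        load(libc[i]);
        changed = true;
      }
    }
  }
  return r;
}
```

Because a member is extracted as a whole, the same model also hints at the brittleness: if main.o additionally references free and the same made-up member defines both malloc and free, that member is pulled in for free, and malloc ends up defined twice.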

I agree that not reporting an error may lead to brittle builds. However, it’s quite difficult to improve strictness while still allowing benign or intended usage (users don’t want to do more object file format tricks just to appease an ODR checker).

If you use Clang modules, Clang has built-in ODRHash support, which can detect a number of brittle build problems (see clang/test/Modules/odr_hash.cpp). The limitation is that it only works within a single TU.

printf > ./a.cc %s '
#include "first.h"
#include "second.h"
S1 s1;
'
echo 'struct S1 {};' > ./first.h
printf > ./module.map %s '
module First { header "first.h" }
module Second { header "second.h" }
'
echo 'struct S1 { private: };' > ./second.h
% clang -fsyntax-only -fmodules -I. a.cc
In module 'First' imported from a.cc:2:
./first.h:1:12: error: 'S1' has different definitions in different modules; first difference is definition in module 'First' found end of class
struct S1 {};
           ^
./second.h:1:13: note: but in 'Second' found private access specifier
struct S1 { private: };
            ^~~~~~~~
1 error generated.

Thank you all! OK, I see: there are historical reasons for this behavior, and to some extent it is reasonable. In the end, developers should avoid such bugs themselves.

I will try to catch these ODR-violation bugs with dedicated tools, or with the approach MaskRay suggested.