Relocatable file definitions shared with DSO

tl;dr Do we want a linker option to catch certain multiple definition issues involving DSO that are less likely benign?

Commit 1981b1b6b92f7579a30c9ed32dbdf3bc749c1b40 unexpectedly made --no-allow-shlib-undefined stronger. When all the three conditions are satisfied, the new --no-allow-shlib-undefined code reports an error.

  • There is a DSO undef that can be satisfied by a SharedSymbol.
  • The SharedSymbol is overridden by a hidden visibility Defined.
  • The hidden visibility Defined can be garbage-collected (it is not part of .dynsym and is not marked as live).

You may read https://github.com/llvm/llvm-project/blob/fa19ef7a10869bf0f8325681be111f7d97b2544e/lld/test/ELF/allow-shlib-undefined.s#L46 to get a sense of the error checking.

However, I am considering restoring the previous behavior (suppressing this error) in #70130.

That said, commit 1981b1b6b92f7579a30c9ed32dbdf3bc749c1b40 does catch some brittle builds.

For example,

% ld.lld @response.txt -y _Znam
...
libfdio.so: reference to _Znam
libclang_rt.asan.so: shared definition of _Znam
libc++.a(stdlib_new_delete.cpp.obj): definition of _Znam

The executable contains a definition from libc++.a, while at run-time, libfdio.soā€™s _Znam reference gets resolved to libclang_rt.asan.so. libclang_rt.asan.so exports _Znam, so when it is linked, the definition is either default or protected visibility, definitely different from the hidden visibility libc++.a.

This scenario implies two different _Znam implementations, which is less likely benign ODR violation. A possible fix is to switch to an asan-instrumented libc++.a that does not define _Znam.

I have also seen problems due to mixing multiple definitions from libgcc.a (hidden visibility) and libclang_rt.builtins.a (default visibility) and relying on archive member extraction rules to work.

Some groups compile relocatable object files with -fvisibility=hidden to allow just static linking. Nevertheless, their system consists of certain shared objects and there is a probability of messing up multiple definition symbols.


  • Does it make sense to add an option to catch the brittle case? If yes, lld allow hidden symbols shared with dso by MaskRay Ā· Pull Request #70163 Ā· llvm/llvm-project Ā· GitHub provides the implementation
  • Disable the check by default?
  • Does it make sense to extend the check to protected visibility?
  • Does it make sense to extend the check to default visibility? This would allow us to know preempted symbols for link-time shared objects. However, I suspect that there are a lot of benign violations and in the absence of an ignorelist mechanism, this will not be useful.

If the option covers more than one visibility, it seems challenging to pick a precise option name.
One candidate is --allowed-visibility-for-symbols-shared-with-dso=[none,all,default,default+protected,default+protected+hidden]


If the textual description is difficult to understand, the following test demonstrates the proposed error checking:

# REQUIRES: x86
# RUN: rm -rf %t && split-file %s %t && cd %t
# RUN: llvm-mc -filetype=obj -triple=x86_64 a.s -o a.o
# RUN: llvm-mc -filetype=obj -triple=x86_64 weak.s -o weak.o

# RUN: llvm-mc -filetype=obj -triple=x86_64 def.s -o def.o
# RUN: ld.lld -shared -soname=def.so def.o -o def.so
# RUN: ld.lld a.o def.so -o /dev/null
# RUN: ld.lld a.o def.so -o /dev/null --no-allow-hidden-symbols-shared-with-dso --allow-hidden-symbols-shared-with-dso

# RUN: not ld.lld a.o def.so --no-allow-hidden-symbols-shared-with-dso 2>&1 | FileCheck %s --check-prefix=CHECK1
# RUN: not ld.lld --gc-sections weak.o def.so a.o --no-allow-hidden-symbols-shared-with-dso 2>&1 | FileCheck %s --check-prefix=CHECK1

# CHECK1:      error: hidden symbol also defined by DSO: foo
# CHECK1-NEXT: >>> defined by {{.*}}a.o

# RUN: llvm-mc -filetype=obj -triple=x86_64 ref.s -o ref.o
# RUN: ld.lld -shared -soname=ref.so ref.o -o ref.so
# RUN: not ld.lld a.o ref.so --no-allow-hidden-symbols-shared-with-dso 2>&1 | FileCheck %s --check-prefix=CHECK2

# CHECK2:      error: hidden symbol also referenced by DSO: foo
# CHECK2-NEXT: >>> defined by {{.*}}a.o

## See also allow-shlib-undefined.s.

#--- a.s
.globl _start, foo
_start:

## The check kicks in even if .text.foo is discarded.
.section .text.foo,"ax"
.hidden foo
foo:

#--- weak.s
.weak foo
foo:

#--- def.s
.globl foo
foo:

#--- ref.s
call foo

Just to make sure I understand, I think there are two separate problems that may not cleanly overlap with the options we have now, or proposed. Iā€™m starting with the assumption that a dynamic linker wonā€™t see any hidden visibility definitions.

The first is the behavior of --no-allow-shlib-undefined in the presence of hidden visibility definitions. For this I think we want to give an error message when a dynamic linker would give an error message. I think this means that:

  • If there is no default visibility symbol for a shared library reference from other shared-libraries or the executable then linker should give an error message.
  • If there is a default visibility definition in another shared-library we shouldnā€™t give an error message for --no-allow-shlib-undefined if that symbol is preempted by a hidden visibility symbol (see 2nd problem).
  • This should be independent of garbage collection.

The second problem is when a hidden visibility symbol in the executable preempts a shared definition. This will mean that the shared libraries and executable see different definitions which is an ODR violation that may be benign. In this case garbage collection (or at least the symbol liveness) does matter, because if the preempting definition is not used there is no ODR violation.

My intuition is that this should apply to non-default visibility symbols (protected, hidden and internal). I donā€™t think that fine grain control will be needed. Perhaps --no-allow-shlib-odr-violation is a better name?

I think starting with the check disabled is the easiest way to introduce this. A lot of projects need to link with GNU ld and LLD so unless GNU ld adopts a similarly named option then disabling it could be difficult for some projects.

Extending the check to default is potentially interesting, although if Iā€™ve understood correctly, that will just say which symbols have been preempted. That is probably a separate ā€œremarkā€ level diagnostic.

I think the fundamental issue here is that no-allow-shlib-undefined diagnostic (both before and after the recent change) is working on the basis of the static symbol resolution of the current link. Itā€™s understandable that it does so, since thatā€™s what the linker is typically working on, but itā€™s not actually correct to use that for diagnosing symbol-lookup issues in shared libraries youā€™re linking against.

E.g. you get false-negatives if a shared-lib actually does require a symbol defined by the binary (no other shared lib defines it), but the symbol in the binary was marked hidden.

Thanks for the comments!

Yes. This is the desired behavior of --no-allow-shlib-undefined.

The current [ELF] Suppress --no-allow-shlib-undefined diagnostic when a SharedSymbol is overridden by a hidden visibility Defined which is later discarded by MaskRay Ā· Pull Request #70130 Ā· llvm/llvm-project Ā· GitHub checks the ODR violation before if (!isLive) demoteDefined.
Therefore, the behavior is independent of garbage collection.

 static void demoteSymbolsAndComputeIsPreemptible() {
   llvm::TimeTraceScope timeScope("Demote symbols");
   DenseMap<InputFile *, DenseMap<SectionBase *, size_t>> sectionIndexMap;
+  bool allowHiddenSymbolsSharedWithDso =
+      config->allowHiddenSymbolsSharedWithDso;
   for (Symbol *sym : symtab.getSymbols()) {
     if (auto *d = dyn_cast<Defined>(sym)) {
+      if (!allowHiddenSymbolsSharedWithDso &&
+          d->visibility() == ELF::STV_HIDDEN) {
+        auto flags = d->flags.load(std::memory_order_relaxed);
+        if (flags & HAS_SHARED_DEF) {
+          errorOrWarn("hidden symbol also defined by DSO: " + toString(*sym) +
+                      "\n>>> defined by " + toString(d->file));
+        } else if (flags & HAS_SHARED_REF) {
+          errorOrWarn("hidden symbol also referenced by DSO: " +
+                      toString(*sym) + "\n>>> defined by " + toString(d->file));
+        }
+      }
+
       if (d->section && !d->section->isLive())
         demoteDefined(*d, sectionIndexMap[d->file]);

ā€œgarbage collection (or at least the symbol liveness) does matterā€ is an interesting point.
With this interpretation, the diff needs to be adjusted to run if (!isLive) demoteDefined before the ODR violation check.

If a hidden visibility Defined preempts a default visibility SharedSymbol, and the Defined survives under --gc-sections, since the Defined will not be exported, we are sure that there is an ODR violation.

If a protected visibility Defined preempts a default visibility SharedSymbol, we are uncertain whether the DSO symbol will be resolved to the executable at run-time.
(Note: the Defined will satisfy includeInDynsym() as its exportDynamic is true)
It seems that we cannot report an error if we want to be conservative.

If a Defined preempts a protected visibility SharedSymbol, since the Defined will not be exported, we are sure that there is an ODR violation.
(A default visibility SharedSymbol linked with -Bsymbolic/ā€“dynamic-list-family options are similar, but we cannot tell whether a DSO uses --dynamic-list and certain -Bsymbolic variants.)

Perhaps just checking hidden/internal visibility is good enough?

In practice, we can treat STV_INTERNAL the same as STV_HIDDEN.
SGI introduced STV_INTERNAL for their link-time interprocedural optimization.
They are more restricted than STV_HIDDEN in that they ā€œare not accessed outside of the moduleā€ (I suspect that here ā€œmoduleā€ refers to ā€œcompile unitā€).
It is not used by GNU, HP, or Solaris.

Agree. I plan to raise a feature request when we have reached an initial consensus on the desired semantics, if we decide to introduce a new option.

Yes, extending the check to default visibility will catch link-time symbol interposition, which is (if users correctly respect ABI for link-time and run-time DSOs) a subset of run-time symbol interposition.
There will probably be a lot of deploy problems due to vague linkage functions (especially if -fvisibility-inlines-hidden is not used) and certain runtime libraries.
Checking the default visibility is probably a stretch goal that does not have a large return-on-investment.

Perhaps just checking hidden/internal visibility is good enough?

Yes, agreed, good point about protected symbols being visible.

ā€œgarbage collection (or at least the symbol liveness) does matterā€ is an interesting point.
With this interpretation, the diff needs to be adjusted to run if (!isLive) demoteDefined before the ODR violation check.

I think if we donā€™t consider liveness then we could get a false positive (warning or error). This might be rare enough that no-one is affected, or those that are can remove the dead code. Although there will always be those with binary libraries, or source code they canā€™t easily modify that wonā€™t be able to do this. Not got a strong opinion here.

However, I am considering restoring the previous behavior (suppressing this error) in #70130.

I have created a patch ([ELF] Enhance --no-allow-shlib-undefined to report non-exported definition by MaskRay Ā· Pull Request #70769 Ā· llvm/llvm-project Ā· GitHub) on top of #70130 (loose the check) that improves the check to cover the cases when the relocatable object file definition is non-exported (hidden/internal visibility or localized by a version script).

GNU ld has detected these cases since April 2003 (discussion)

# RUN: not ld.lld %t.o %t.so %tdef-hidden.o -o /dev/null 2>&1 | FileCheck %s --check-prefix=HIDDEN

# HIDDEN:      error: hidden symbol '_unresolved' is referenced by DSO '{{.*}}.tmp.so'
% ld.bfd a.o a.so adef-hidden.o --allow-shlib-undefined -y _unresolved  # this error-checking is not governed by --allow-shlib-undefined in GNU ld
ld.bfd: a.so: reference to _unresolved
ld.bfd: adef-hidden.o: definition of _unresolved
ld.bfd: a.out: hidden symbol `_unresolved' in adef-hidden.o is referenced by DSO
ld.bfd: final link failed: bad value
% fld.lld a.o a.so adef-hidden.o
ld.lld: error: hidden symbol '_unresolved' is referenced by DSO 'a.so'

The new logic will be like

      for (SharedFile *file : ctx.sharedFiles) {
        bool allNeededIsKnown =
            llvm::all_of(file->dtNeeded, [&](StringRef needed) {
              return symtab.soNames.count(CachedHashStringRef(needed));
            });
        if (!allNeededIsKnown)
          continue;
        for (Symbol *sym : file->requiredSymbols) {
          if (sym->hasFlag(HAS_SHARED_DEF))
            continue;
          if (sym->isUndefined() && !sym->isWeak()) {
            diagnose(
                "undefined reference due to --no-allow-shlib-undefined: " +
                toString(*sym) + "\n>>> referenced by " + toString(file));
          } else if (sym->isDefined() && sym->computeBinding() == STB_LOCAL ) {
            diagnose("non-exported symbol '" + toString(*sym) +
                     "' is referenced by DSO '" + toString(file) + "'");
          }
        }
      }

Internally, I have identified 2 bugs (which would fail with LD_BIND_NOW=1) covered by commit 1981b1b6b92f7579a30c9ed32dbdf3bc749c1b40 (for the discarded section cases).
The above patch may identify more bugs (linker-reachable definitions that are not exercised at runtime; made benign by lazy binding).


I have made a summary: DSO undef and non-exported definition | MaskRay

To check I understand as I got a bit confused with the three pull requests:

All three pull requests contain #a350e004480dc40f25a18186657b266df5cde584 which will prevent lld from giving an error when there is another shared-definition, but a garbage collected local definition masks it.

[ELF] Enhance --no-allow-shlib-undefined to report non-exported definition by MaskRay Ā· Pull Request #70769 Ā· llvm/llvm-project Ā· GitHub also contains a follow up change that checks whether the local definition is actually exported into the dynamic symbol and, like GNU ld, gives an error message for this case.

[ELF] Add --[no-]allow-non-exported-symbols-shared-with-dso by MaskRay Ā· Pull Request #70163 Ā· llvm/llvm-project Ā· GitHub contains a different follow up change that adds --[no-]allow-non-exported-symbols-shared-with-dso which gives an error message if lld detects a non-exported definition in the executable and a separate shared def of the same symbol, which is a, potentially benign, ODR violation.

Do you want me to concentrate on [ELF] Suppress --no-allow-shlib-undefined diagnostic when a SharedSymbol is overridden by a hidden visibility Defined which is later discarded by MaskRay Ā· Pull Request #70130 Ā· llvm/llvm-project Ā· GitHub first? Then both [ELF] Add --[no-]allow-non-exported-symbols-shared-with-dso by MaskRay Ā· Pull Request #70163 Ā· llvm/llvm-project Ā· GitHub and [ELF] Enhance --no-allow-shlib-undefined to report non-exported definition by MaskRay Ā· Pull Request #70769 Ā· llvm/llvm-project Ā· GitHub can be handled separately.

FWIW I think #70163 looks like the right semanitcs for --no-allow-shlib-undefined. Iā€™m on the fence as to whether --[no-]allow-non-exported-symbols-shared-with-dso will ever get used.

#70130 ([ELF] Suppress --no-allow-shlib-undefined diagnostic ...) is to remove an ODR violation check from --no-allow-shlib-undefined.

The lost check can be restored with #70163 ([ELF] Add --[no-]allow-non-exported-symbols-shared-with-dso), which also covers more case (even if the non-exported definitions survives GC). The issue is that if --allow-... is set to the default, the check will probably not be used. Therefore, we likely need to make --no-allow-... the default.

Personally, I wish that we wouldnā€™t add another option, but some people might argue that the ODR violation detected by the current --no-allow-shlib-undefined code is valid use case, and the check doesnā€™t fit into --no-allow-shlib-undefined, which is to simulate rtld behaviors.

For my users, I have fixed almost all issues and I can live without a turn-off option.


#70769 ([ELF] Enhance --no-allow-shlib-undefined to report non-exported definition) is an orthogonal patch to #70130 and #70163.


Based on the current information I have, I donā€™t require any of the three patches for my users. Please take your time, and donā€™t feel pressured. Thank you very much for reviewing the changes.

#70163, which adds a new option, is particularly in a no-hurry mode, as I should communicate the idea to the binutils community, but I want the idea to gain some initial consensus within the llvm-project.

@smeenai @igorkudrin @ruiu

I donā€™t think any of my users will end up needing the options either, because most of what we build is shared libraries, so --no-allow-shlib-undefined doesnā€™t come into play at all. (This is for Android, so all your native code is built as shared library thatā€™s loaded from Java or Kotlin, and we donā€™t have any standalone native executables to speak of.) That being said, Iā€™d welcome anything that can help detect potential ODR issues, because I donā€™t think we have any tooling to detect cross-library ODR violations at all right now.

Landed [ELF] Suppress --no-allow-shlib-undefined diagnostic when a SharedSymbol is overridden by a hidden visibility Defined which is later discarded by MaskRay Ā· Pull Request #70130 Ā· llvm/llvm-project Ā· GitHub, which restores the previous (behavior before release/18.x is created), suppressing a diagnostic when:

  • There is a DSO undef that has been satisfied by a definition from another DSO.
  • The SharedSymbol is overridden by a non-exported (usually of hidden visibility) definition in a relocatable object file (Defined).
  • The section containing the Defined is garbage-collected (it is not part of .dynsym and is not marked as live).

My latest thought is that we probably should not do #70163. This applies to too few scenarios. A generic ODR violation detector would be more useful.

1 Like