ld.lld "Don't let __start_/__stop_ retain C identifier name sections" && Swift

tl;dr With --gc-sections, I think the rule "__start_foo/__stop_foo references from live sections retain all non-SHF_LINK_ORDER input sections foo" does not carry its weight, so I'd like to drop it entirely in D96914 ([ELF] Add -z start-stop-gc to let __start_/__stop_ not retain C identifier name sections).

I have done a large-scale internal test covering a huge amount of OSS code and spotted two issues:

(1) Linking systemd (see systemd/bus-error.h in systemd/systemd on GitHub): there will be an `undefined symbol: __start_SYSTEMD_BUS_ERROR_MAP` error. Supposedly it can be trivially fixed by using undefined weak symbols for __start_/__stop_.
(2) Linking Swift. There will be errors like `undefined hidden symbol: __start_swift5_protocols`.
    (see swift/SwiftRT-ELF.cpp in apple/swift on GitHub)
    It seems that trivially making `extern const char __start_##name` weak does not work.
    The code relies on some `swift5_*` input sections being GC roots.
    (If someone can file an issue to Swift, I'd appreciate that.)
    (If Swift folks can fix it, I'll give my big thanks:)
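
For readers unfamiliar with the fix suggested in (1), here is a minimal sketch of what undefined weak references to __start_/__stop_ could look like, assuming GCC or Clang on an ELF target. The section name `demo_codes` and every identifier below are made up for illustration and are not taken from systemd or Swift.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical section name; any valid C identifier works. */
#define DEMO_SECTION "demo_codes"

/* Weak references: if no `demo_codes` input section survives (or none
   exists), both symbols resolve to address 0 instead of breaking the link. */
extern const int __start_demo_codes[] __attribute__((weak));
extern const int __stop_demo_codes[] __attribute__((weak));

/* Two entries placed in the section. `used` stops the compiler (not the
   linker) from discarding these otherwise unreferenced statics. */
static const int code_a __attribute__((used, section(DEMO_SECTION))) = 10;
static const int code_b __attribute__((used, section(DEMO_SECTION))) = 32;

/* Number of int entries between the two linker-defined symbols. */
size_t demo_code_count(void) {
    assert(__start_demo_codes <= __stop_demo_codes);
    return (size_t)(__stop_demo_codes - __start_demo_codes);
}

/* Walks the section as if it were one array; an empty range sums to 0. */
int demo_code_sum(void) {
    int sum = 0;
    for (const int *p = __start_demo_codes; p < __stop_demo_codes; ++p)
        sum += *p;
    return sum;
}
```

Note that the weak references only avoid the link error. Under the proposed behavior with --gc-sections, entries that nothing else references could still be discarded, in which case the loop would simply see an empty range.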

This can still potentially break some proprietary code, so I am sending this heads-up.
I'll put the rationale below (it is complicated).

The current rule is:

   __start_/__stop_ references retain all non-SHF_LINK_ORDER C identifier name sections.

After D96753 ([lld][ELF] __start_/__stop_ refs don't retain C-ident named group sections), it will become

   __start_/__stop_ references retain all non-SHF_LINK_ORDER non-SHF_GROUP C identifier name sections.

(The section group special case is to allow garbage collecting __llvm_prf_* sections for -fprofile-generate/-fprofile-instr-generate. The saving is huge.)

Personally I'd drop the rule entirely (D96914) (with support from jhenderson and phosek), i.e.

   __start_/__stop_ references do not retain C identifier name sections.

and hope folks can fix Swift/systemd to not rely on the original rule.

We rely on this rule in various places (e.g. see the "Merging JNI_OnLoad" section of https://engineering.fb.com/2018/01/23/android/android-native-library-merging/), so thank you for the heads up.

SHF_GNU_RETAIN looks like a good solution for our use cases, and is something I've wanted for ELF for a long time. If I'm understanding the current setup correctly though, we would need to apply it using a .section attribute in inline assembly, which is inconvenient. For Mach-O (and I believe for COFF as well), __attribute__((used)) instructs the linker to not garbage collect the symbol. Would we consider translating __attribute__((used)) to SHF_GNU_RETAIN for ELF as well and thereby get consistent linker behavior across platforms? That should work great for our use cases, and it'd be much cleaner and more explicit than the current linker special-casing.

> We rely on this rule in various places (e.g. see the "Merging JNI_OnLoad" section of https://engineering.fb.com/2018/01/23/android/android-native-library-merging/), so thank you for the heads up.

Thanks. Do you mind testing the patch thoroughly internally? :) It would
be good to know whether we can flip the default and whether we need an option.

Once the projects drop potential reliance on the old behavior, the new
behavior can actually be helpful, because GC was previously suppressed conservatively.

> SHF_GNU_RETAIN looks like a good solution for our use cases, and is something I've wanted for ELF for a long time. If I'm understanding the current setup correctly though, we would need to apply it using a .section attribute in inline assembly, which is inconvenient. For Mach-O (and I believe for COFF as well), __attribute__((used)) instructs the linker to not garbage collect the symbol. Would we consider translating __attribute__((used)) to SHF_GNU_RETAIN for ELF as well and thereby get consistent linker behavior across platforms? That should work great for our use cases, and it'd be much cleaner and more explicit than the current linker special-casing.

In the latest iteration, GCC folks think 'retain' and 'used' should be different: [PATCH v6] Add retain attribute to place symbols in SHF_GNU_RETAIN section (https://gcc.gnu.org/pipermail/gcc-patches/2021-February/565478.html)

One point is that 'used' can be used to enable references from inline assembly, but the code may still want GC on the 'used' section.
Combining two features into one disallows this use case.

Can you elaborate how 'used' has GC root semantics on PE-COFF and Mach-O?

I have a patch to enable
!retain (D96837, "Add !retain metadata to retain global values under linker garbage collection": https://reviews.llvm.org/D96837)
and SHF_GNU_RETAIN on the clang side (D96838, "Add GNU attribute 'retain'": https://reviews.llvm.org/D96838). (I'll need to rework the patch to use 'retain'.)

I can test it for the parts I work on (and ask other teams to do the same), but it'll take some time to get the required infrastructure set up, and I also have other commitments to get to first. What timeline did you have in mind for landing this change?

For Mach-O, __attribute__((used)) sets the N_NO_DEAD_STRIP flag on the symbol, which the linker's dead stripping takes into account. For COFF, __attribute__((used)) emits a .drectve section (linker directives) containing /INCLUDE arguments for all symbols that are __attribute__((used)), which is equivalent to -u on ELF (and therefore turns those symbols into GC roots).

> I can test it for the parts I work on (and ask other teams to do the same), but it'll take some time to get the required infrastructure set up, and I also have other commitments to get to first. What timeline did you have in mind for landing this change?

No hurry :) This is not urgent. We should also test the Linux kernel, so I've created ClangBuiltLinux/linux issue #1307, "ld.lld -z start-stop-gc (GC of C identifier name sections)": https://github.com/ClangBuiltLinux/linux/issues/1307

> For Mach-O, __attribute__((used)) sets the N_NO_DEAD_STRIP flag on the symbol, which the linker's dead stripping takes into account. For COFF, __attribute__((used)) emits a .drectve section (linker directives) containing /INCLUDE arguments for all symbols that are __attribute__((used)), which is equivalent to -u on ELF (and therefore turns those symbols into GC roots).

Thanks for the information!

The PE-COFF approach (/INCLUDE:sym) makes the whole section retained. For ELF,
GCC __attribute__((retain)) creates a separate section (even in
-fno-data-sections -fno-function-sections mode) so the monolithic section can
still be garbage collected. This made a Linux kernel use case unhappy because it
somehow assumes one section.

I’ve filed an issue in our internal tracker for us to investigate the impact of this proposal, although I don’t know when we’ll get a chance to schedule the work at this point. Also, it’s worth noting that we can’t test all downstream codebases that could potentially use this feature, so we’ll likely want some source-level mechanism or linker switch that will allow our users to keep their unused sections.

What this proposes is really at the very edge of my understanding of ELF sections, but I have a side project that makes me think the “drop the rule entirely (D96914)” part will be a problem for people.

My side project is to enhance the googletest infrastructure to detect un-executed test assertions. When using Clang as the build compiler, my tactics depend on __start/__stop references to C identifier name sections retaining everything in those sections. The data allocated to the section does not define any globals so there are no other GC roots. (I could almost get the same tactic to work with GCC as the build compiler, but there is one GCC quirk related to inline functions that got in the way, so I do something more complicated and ugly there.)

In researching how to make this work, it appears that depending on this behavior of __start/__stop is a not-uncommon tactic; it is fairly well known to work with GNU linkers and LLD. In effect you can allocate static data elements to the section at arbitrary points, and the __start/__stop symbols let you treat the entire thing as an array. It is impractical to generate unique global symbols for the data elements, and even if you do, it is not possible to generate references to those global symbols from elsewhere. And in general, you do not want the static elements to be GC’d; it defeats the purpose of allocating them in the first place. There’s no use of SHF_LINK_ORDER or SHF_GROUP here; these are normal static variables allocated to a custom section. In my case, I can’t depend on the order of elements anyway, and macro invocations can’t tell whether they’re invoked inside templates so I can’t use SHF_GROUP either. I end up sorting and deduplicating data manually when it’s time to look at everything.
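
A minimal sketch of the pattern described above, assuming GCC or Clang on an ELF target. The section name `check_sites`, the `RECORD_SITE` macro, and the struct layout are hypothetical and are not taken from googletest or from the side project; the sketch only shows how statics scattered through the code end up readable as one array.

```c
#include <assert.h>
#include <stddef.h>

/* One record per instrumented location (hypothetical layout). */
struct check_site {
    const char *file;
    int line;
};

/* Defined by the linker because `check_sites` is a valid C identifier. */
extern const struct check_site *__start_check_sites[];
extern const struct check_site *__stop_check_sites[];

/* Drops one pointer-sized slot into the section. Storing pointers rather
   than whole structs keeps the element stride equal to sizeof(void *). */
#define RECORD_SITE()                                                    \
    do {                                                                 \
        static const struct check_site site = { __FILE__, __LINE__ };    \
        static const struct check_site *slot                             \
            __attribute__((used, section("check_sites"))) = &site;       \
    } while (0)

/* Neither function has to run for its site to be recorded. */
void check_alpha(void) { RECORD_SITE(); }
void check_beta(void) { RECORD_SITE(); }

/* Number of slots between the two linker-defined symbols. */
size_t check_site_count(void) {
    return (size_t)(__stop_check_sites - __start_check_sites);
}

/* Returns the i-th recorded site; the order of slots is unspecified. */
const struct check_site *check_site_at(size_t i) {
    assert(i < check_site_count());
    return __start_check_sites[i];
}
```

Nothing references the `slot` variables by name, so the __start_/__stop_ references are the only thing that could keep them alive at link time. That is exactly the dependency on the retention rule that this thread is about.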

I see the idea for adding a new Clang attribute to “retain” something, but mainly what that does is create work for anyone depending on the historical behavior; we have to conditionalize the set of attributes based on whether Clang understands “retain” and then cross our fingers hoping we don’t end up in a situation with a pre-retain Clang and a post-retain LLD, because that will break everything.
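
The conditionalizing mentioned above could look roughly like the following sketch. It assumes the attribute ends up spelled `retain` (as in the GCC patch) and that the compiler provides `__has_attribute`; the `KEEP_IN` macro and the `kept_values` section name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#ifndef __has_attribute
#define __has_attribute(x) 0 /* very old compilers: assume no `retain` */
#endif

/* Ask for linker-level retention only when the compiler knows `retain`;
   otherwise fall back to `used` alone and rely on the historical rule. */
#if __has_attribute(retain)
#define KEEP_IN(sec) __attribute__((used, retain, section(sec)))
#else
#define KEEP_IN(sec) __attribute__((used, section(sec)))
#endif

static const int kept_a KEEP_IN("kept_values") = 1;
static const int kept_b KEEP_IN("kept_values") = 2;

/* Linker-defined bounds of the `kept_values` output section. */
extern const int __start_kept_values[];
extern const int __stop_kept_values[];

size_t kept_value_count(void) {
    assert(__start_kept_values <= __stop_kept_values);
    return (size_t)(__stop_kept_values - __start_kept_values);
}
```

This does not address the mismatch worry raised above: in the fallback branch nothing marks the sections for the linker, so a compiler without `retain` paired with a linker that no longer applies the old rule could still drop the entries under --gc-sections.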

I hope this is clear enough, let me know if my explanation doesn’t make any sense.

Thanks,

–paulr

Ah, it's interesting that __attribute__((retain)) always creates a separate section. For COFF though, note that Microsoft's compiler turns on /Gy (equivalent to -ffunction-sections) whenever you build with optimizations, so that's super common.

On https://reviews.llvm.org/D96838#2585171

Aha; attribute used by itself is not sufficient to preserve sections in the output. But the __start_/__stop_ symbols implicitly create a reference to each of the named sections, and that implicit reference can preserve them in the output (assuming gc roots etc). So, the idea is that attribute retain can be used instead of the __start_/__stop_ symbols, to preserve sections in the output (with the advantage that it will work even for sections that do not have a C-identifier name).

Thanks for helping me understand this from a user perspective. That will be important when you go to write the release note for this new attribute.

I dug up the history a bit. gold had this behavior in 2010.
GNU ld got a workaround in 2010 partly because glibc refused to fix the issue (facepalm) https://sourceware.org/bugzilla/show_bug.cgi?id=3400
Before 2015, the GNU ld behavior only applied to sections in the same .o as the __start_/__stop_ references. In 2015-10, the behavior was finally extended to other .o files.

2015-10 is relatively new, so I don’t think there are many applications depending on the behavior.
But there are indeed some applications.

I submitted a GNU ld patch for -z start-stop-gc which has been accepted by Alan Modra (https://sourceware.org/bugzilla/show_bug.cgi?id=27451).
I think at some point (14.0.0?) we can still switch the default.