__attribute__((retain)) && llvm.used/llvm.compiler.used

Currently __attribute__((used)) lowers to llvm.used.

* On Mach-O, a GlobalObject in llvm.used gets the S_ATTR_NO_DEAD_STRIP
attribute, which prevents linker GC (dead stripping).
* On COFF, a non-local-linkage GlobalObject[1] in llvm.used gets the
/INCLUDE: linker option (similar to ELF `ld -u`), which prevents
linker GC.
  It should be possible to support local-linkage GlobalObjects as
well, but that would require a complex COMDAT dance.
* On ELF, a global object in llvm.used can be discarded by
ld.bfd/gold/ld.lld --gc-sections.
  (If the section name is a valid C identifier, __start_/__stop_ relocations
from a live input section can retain the section, even if its defined
symbols are not referenced [2].
  I understand that some folks use `__attribute__((used,
section("C_ident")))` and expect such sections to behave like GC
roots. However,
  non-C-identifier section names are very common, too. They don't get the
__start_/__stop_ linker magic, so those sections can always be GCed.
  )
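
To illustrate the C-identifier case, here is a minimal sketch for ELF targets linked with GNU ld or lld. The section name my_entries, the REGISTER_ENTRY macro and the two helper functions are made up for illustration; only the __start_/__stop_ symbol convention comes from the linker.

```c
#include <stddef.h>

/* Place each entry in an ELF section whose name is a valid C identifier.
   `used` stops the compiler from dropping these otherwise unreferenced
   static variables; it says nothing to the ELF linker. */
#define REGISTER_ENTRY(name, value) \
  static const int name __attribute__((used, section("my_entries"))) = (value)

REGISTER_ENTRY(entry_a, 1);
REGISTER_ENTRY(entry_b, 2);
REGISTER_ENTRY(entry_c, 3);

/* Defined by the linker because "my_entries" is a C identifier name. */
extern const int __start_my_entries[];
extern const int __stop_my_entries[];

/* Number of registered entries found between the two linker symbols. */
size_t entry_count(void) {
  return (size_t)(__stop_my_entries - __start_my_entries);
}

/* Sum of all registered entries, walking the section like an array. */
int entry_sum(void) {
  int sum = 0;
  for (const int *p = __start_my_entries; p != __stop_my_entries; ++p)
    sum += *p;
  return sum;
}
```

In this sketch it is the reference to __start_my_entries/__stop_my_entries from live code that can keep the section under --gc-sections. A section named, say, ".my.entries" would get no such symbols and no such protection.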

In LangRef, the description of llvm.used contains:

If a symbol appears in the @llvm.used list, then the compiler, assembler, and **linker** are required to treat the symbol as if there is a reference to the symbol that it cannot see (which is why they have to be named). For example, if a variable has internal linkage and no references other than that from the @llvm.used list, it cannot be deleted. This is commonly used to represent references from inline asms and other things the compiler cannot “see”, and corresponds to “attribute((used))” in GNU C.

Note that the "linker" part does not match the reality on ELF targets.
It does match the reality on Mach-O and partially on COFF.

llvm.compiler.used:

The @llvm.compiler.used directive is the same as the @llvm.used directive, except that it only prevents the compiler from touching the symbol. On targets that support it, this allows an **intelligent linker to optimize references to the symbol without being impeded** as it would be by @llvm.used.

Note that this explicitly mentions linker GC, so this appears to be
the closest thing to __attribute__((used)) on ELF.
However, LangRef also says:

This is a rare construct that should only be used in rare circumstances, and should not be exposed to source languages.

My goal is to implement __attribute__((retain)) (which will be in GCC
11) on ELF. GCC folks think that 'used' and 'retain' are orthogonal
(see D96838 "Add GNU attribute 'retain'").
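
As a rough source-level sketch of how the two attributes combine, assuming a compiler that provides __has_attribute (the LINKER_RETAIN macro and both function names are hypothetical):

```c
/* Compilers without __has_attribute are treated as not knowing `retain`. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Expand to nothing when the compiler does not know `retain`. */
#if __has_attribute(retain)
#define LINKER_RETAIN __attribute__((retain))
#else
#define LINKER_RETAIN
#endif

/* `used` only: the compiler must emit the definition even if it sees no
   reference to it. */
__attribute__((used)) int used_only(void) { return 1; }

/* `used` plus `retain`: additionally asks the linker to keep the section
   that holds the definition when it garbage-collects sections. */
__attribute__((used)) LINKER_RETAIN int used_and_retained(void) { return 2; }
```

Whether 'retain' has any effect here depends on the compiler, assembler and linker versions in use; on toolchains that do not know it, the macro simply expands to nothing.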

Shall we

1. Lift the source language restriction on llvm.compiler.used and
change __attribute__((used)) to use llvm.compiler.used on ELF.
2. Or add new metadata (like D96837 "Add !retain metadata to retain global values under linker garbage collection")?

I lean towards option 1 to leverage the existing mechanism.
The downside is that clang codegen will have some target inconsistency
(llvm.compiler.used on ELF while llvm.used on others).

[1]: The implementation additionally allows GlobalAlias.
[2]: See the "C identifier name sections" part of "Metadata sections,
COMDAT and SHF_LINK_ORDER" (MaskRay) for details.

It is too late here and I did not think it through clearly ;-)

To clarify:

1. Lift the source language restriction on llvm.compiler.used, let llvm.used use SHF_GNU_RETAIN on ELF, and change __attribute__((used)) to use llvm.compiler.used on ELF.

__attribute__((retain)) has semantics which are not described by
llvm.used/llvm.compiler.used. To facilitate linker GC, __attribute__((retain))
causes the function/variable to be placed in a unique section. The separate
section behavior can be undesired in some cases (e.g. poorly written Linux
kernel linker scripts which expect one section per name).

So in the -fno-function-sections -fno-data-sections case, a retained
function/variable does not cause the whole .text/.data/.rodata to be retained.

The test llvm/test/CodeGen/X86/elf-retain.ll in D96837 "Add !retain metadata
to retain global values under linker garbage collection" demonstrates the
behavior. So it is not clear to me that we should use
llvm.compiler.used/llvm.used to describe __attribute__((retain)).

(best to include folks from previous conversations in threads - sometimes we can’t all keep up to date with all the threads happening - so I’ve added John McCall here, and echristo since he might have some thoughts on this too)

I’d lean towards (1) too myself - give the LLVM constructs consistent semantics, and deal with the platform differences in the frontend during the mapping down to LLVM.

I chatted with Saleem Abdulrasool, who is in favor of (1), too.

I am going to send these patches:

(a) Add CodeGenModule::addUsedOrCompilerUsedGlobal (which uses
llvm.compiler.used for ELF and llvm.used for the others). Migrate some
addUsedGlobal call sites to use addUsedOrCompilerUsedGlobal.
(b) Add __attribute__((retain))
(c) Change llvm.used to use SHF_GNU_RETAIN if the integrated assembler or
binutils >= 2.36 is used.
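
The selection rule in (a) can be sketched as follows. This is only a model of the decision, not the clang implementation; the enum and function names are hypothetical.

```c
/* Hypothetical stand-ins for the object file format and the two IR lists. */
enum object_format { FORMAT_ELF, FORMAT_MACHO, FORMAT_COFF };
enum used_list { LLVM_USED, LLVM_COMPILER_USED };

/* Rule from (a): llvm.compiler.used on ELF, llvm.used everywhere else. */
enum used_list used_list_for(enum object_format format) {
  return format == FORMAT_ELF ? LLVM_COMPILER_USED : LLVM_USED;
}

/* Name of the IR array that the chosen list corresponds to. */
const char *used_list_name(enum used_list list) {
  return list == LLVM_COMPILER_USED ? "llvm.compiler.used" : "llvm.used";
}
```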

Currently llvm.used/llvm.compiler.used should have no difference on
ELF, so (a) & (b) do not affect users who don't use 'retain'.

(c) will change the binary format representation of llvm.used, so
there is some risk if the consumer is not prepared for multiple
sections of the same name (which means they already break with
-fno-unique-section-names, but the option is rare).
On very large C/C++ projects, llvm.used usually has 0 or 1 element.
ObjC can have multiple llvm.used elements, but that should work. So if
there is risk, the risk is for other frontends.
I don't see a way to avoid that, but they can switch to llvm.compiler.used.

Non-ELF users should not observe anything different.

> (c) Change llvm.used to use SHF_GNU_RETAIN if integrated assembler or >=2.36

I am curious how you intend to check if binutils>=2.36. This is not
something you can decide when the compiler is built.
--paulr

-fintegrated-as can use SHF_GNU_RETAIN. GNU ld not recognizing
SHF_GNU_RETAIN just ignores the flag (ELF spirit: ignore what you
don't understand).
-fno-integrated-as users (rare) can use -fbinutils-version=2.36
(D85474 "Add -fbinutils-version= to gate ELF features on the specified binutils version").
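
Put as code, the gating described here might look like the sketch below. The struct and function names are hypothetical, and the version is assumed to be whatever the user passed via -fbinutils-version=.

```c
#include <stdbool.h>

/* Hypothetical representation of the value given to -fbinutils-version=. */
struct binutils_version {
  int major_num;
  int minor_num;
};

/* True if `v` is at least major_num.minor_num. */
static bool version_at_least(struct binutils_version v, int major_num,
                             int minor_num) {
  if (v.major_num != major_num)
    return v.major_num > major_num;
  return v.minor_num >= minor_num;
}

/* SHF_GNU_RETAIN may be emitted with the integrated assembler, or with an
   external assembler declared to be binutils 2.36 or newer. */
bool can_emit_gnu_retain(bool integrated_as, struct binutils_version v) {
  return integrated_as || version_at_least(v, 2, 36);
}
```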

Thanks! Was not aware of the -fbinutils-version option.
--paulr

I agree that we should go with (1) and give the LLVM construct
consistent semantics to the best of our ability.

John.

Implemented the idea. Sent:

(a) Add CodeGenModule::addUsedOrCompilerUsedGlobal (which uses
llvm.compiler.used for ELF and llvm.used for the others). Migrate some
addUsedGlobal call sites to use addUsedOrCompilerUsedGlobal.

https://reviews.llvm.org/D97446

(b) Add __attribute__((retain))

https://reviews.llvm.org/D97447

(c) Change llvm.used to use SHF_GNU_RETAIN if the integrated assembler or
binutils >= 2.36 is used.

https://reviews.llvm.org/D97448