What is an address-significant symbol?

Hello LLVM- and Clang-List,

I’m not sure if this is a subject for LLVM or Clang – but there is something I don’t understand. I wrote the following code in C++:

searchPlanschi(&__stdio_common_vswprintf);

'searchPlanschi' is a function I provide. Clang generates the following assembly code for this:

lea rcx, [rip + __stdio_common_vswprintf]

call "?searchPlanschi@@YAXPEAX@Z"

I was surprised to see the register rip there; as far as I know, this is the instruction pointer register, right? Why do I need the rip register to get the address of the function? I searched the assembly file for ‘__stdio_common_vswprintf’ to get some hints about this. The only thing I found was:

.addrsig

.addrsig_sym __stdio_common_vswprintf

So I googled “.addrsig” and found the following text:

“This section is used to mark symbols as address-significant, i.e. the address of the symbol is used in a comparison or leaks outside the translation unit. It has the same meaning as the absence of the LLVM attributes unnamed_addr and local_unnamed_addr.
Any sections referred to by symbols that are not marked as address-significant in any object file may be safely merged by a linker without breaking the address uniqueness guarantee provided by the C and C++ language standards.
The contents of the section are a sequence of ULEB128-encoded integers referring to the symbol table indexes of the address-significant symbols.”

But sadly… this is way over my head. What does that actually mean? Does that explain the code construct with the rip register? Is that a form of optimization?

Thank you in advance for any help!

Kind greetings

Björn

There is a linker optimization called identical code folding (ICF).
The details of the initial implementation in gold are described in
"Safe ICF: Pointer Safe and Unwinding Aware Identical Code Folding in
Gold" (Google Research). ICF can cause problems when the program
depends on functions having unique addresses, so both gold and LLD
have an --icf=safe mode that limits the scope of the optimization to
avoid folding sections that are "address-significant". The
implementation in gold uses linker heuristics such as relocation type
to determine address significance. The implementation in LLD uses
information generated by the compiler, which is placed in .addrsig.
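
To make the unique-address dependency a bit more concrete, here is a
minimal sketch in C++. The function names are made up for illustration
and are not taken from your code. It shows two functions with identical
bodies whose addresses are compared, which is the situation where
folding would change program behaviour:

```cpp
#include <cassert>
#include <cstdio>

// Two hypothetical functions with identical bodies. After compilation
// their machine code is typically the same, which makes them candidates
// for identical code folding in the linker.
int add_one_a(int x) { return x + 1; }
int add_one_b(int x) { return x + 1; }

using IntFn = int (*)(int);

// The C++ standard guarantees that two distinct functions compare
// unequal. Code like this therefore depends on unique addresses.
bool have_distinct_addresses(IntFn f, IntFn g) { return f != g; }

// Prints the result of the comparison. Passing &add_one_a and
// &add_one_b here takes their addresses, so I would expect the compiler
// to treat both symbols as address-significant (assumption: a Clang
// build that emits .addrsig).
void report() {
    bool distinct = have_distinct_addresses(&add_one_a, &add_one_b);
    std::printf("distinct addresses: %s\n", distinct ? "yes" : "no");
}
```

If the linker merged add_one_a and add_one_b into a single copy (as an
aggressive mode such as --icf=all is allowed to do), the comparison
could return false. As I understand it, --icf=safe is meant to leave
such functions alone, and the .addrsig information is how LLD learns
which symbols fall into that category.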

I can't answer the question about code generation off the top of my
head. My understanding is that .addrsig is primarily used for
implementing --icf=safe in linkers; I don't think it has an effect on
code generation.

Hope this is of some help

Peter