[libLTO] accessing llvm.global_ctors

Hi,
Our linker uses the libLTO interface. Given an lto_module_t, we are trying to check if it contains the @llvm.global_ctors or the @llvm.global_dtors global variables. We need to know this information in order to decide whether to include a bitcode archive member in the LTO step or not.

Typically, symbols in a module are visited using the lto_module_get_[num_symbols|symbol_name|symbol_attribute] API, but it seems that symbols starting with “llvm.” are skipped.

Is there a way to check for the existence of these ctor/dtor variables in an LTO module (through the libLTO API)?

If not, then I can think of two ways to solve it:

  1. add a new function to libLTO that returns true if either of the two variables is present. Alternatively, it can return a list of variable names (essentially the contents of @llvm.global_ctors or @llvm.global_dtors)
  2. add a new attribute to lto_symbol_attributes denoting that the given symbol is a constructor or a destructor (i.e. is present in either @llvm.global_ctors or @llvm.global_dtors)
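
To make option 1 concrete, here is a minimal sketch of what such a query could look like. It is not part of libLTO: `mock_module` and `mock_module_has_ctor_dtor` are hypothetical names, and the module is modeled as a plain list of global variable names so that the example compiles without LLVM.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for lto_module_t: only the names of the
 * module's global variables are modeled. */
struct mock_module {
    const char *const *global_names;
    size_t num_globals;
};

/* Hypothetical query in the spirit of option 1: returns true if the
 * module contains @llvm.global_ctors or @llvm.global_dtors. */
static bool mock_module_has_ctor_dtor(const struct mock_module *mod)
{
    for (size_t i = 0; i < mod->num_globals; ++i) {
        const char *name = mod->global_names[i];
        if (strcmp(name, "llvm.global_ctors") == 0 ||
            strcmp(name, "llvm.global_dtors") == 0)
            return true;
    }
    return false;
}
```

A boolean result like this would be enough for a linker that only needs a yes/no answer per archive member; returning the list of constructors would require more work on the library side.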

Thank you for your feedback

Wael Yehia
wyehia@ca.ibm.com

Hi,
Our linker uses the libLTO interface. Given an lto_module_t, we are trying to
check if it contains the @llvm.global_ctors or the @llvm.global_dtors global
variables. We need to know this information in order to decide whether to
include a bitcode archive member in the LTO step or not.

This is strange. Constructors should be orthogonal to archive member extraction.
If an archive member is unused, it is not extracted and its constructors
are suppressed.

C++ [basic.start.dynamic] makes the archive member extraction behavior
more plausible:
"It is implementation-defined whether the dynamic initialization of a
non-block non-inline variable with static storage duration is sequenced
before the first statement of main or is deferred."

Typically, symbols in a module are visited using the
`lto_module_get_[num_symbols|symbol_name|symbol_attribute]` API, but it seems
that symbols starting with "llvm." are skipped.

llvm.* symbols have the SF_FormatSpecific flag.
Such symbols are in the module symbol table but skipped by readers.
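
The effect can be illustrated with a small self-contained model. It is only a sketch: `mock_symbol`, `MOCK_SF_FORMAT_SPECIFIC`, and `count_visible_symbols` are hypothetical names standing in for the real symbol table and flag, and the skip rule is an assumption based on the behavior described above.

```c
#include <stddef.h>

/* Hypothetical flag bit modeling SF_FormatSpecific; the value is arbitrary. */
enum { MOCK_SF_FORMAT_SPECIFIC = 0x1 };

/* Hypothetical symbol table entry: a name plus a set of flag bits. */
struct mock_symbol {
    const char *name;
    unsigned flags;
};

/* Counts the symbols a reader would report if it skips every entry
 * that carries the format-specific flag. */
static size_t count_visible_symbols(const struct mock_symbol *syms, size_t n)
{
    size_t visible = 0;
    for (size_t i = 0; i < n; ++i) {
        if (syms[i].flags & MOCK_SF_FORMAT_SPECIFIC)
            continue; /* present in the table, but never reported */
        ++visible;
    }
    return visible;
}
```

In this model an entry such as `llvm.global_ctors` is stored in the table but never reaches the caller, which matches what the symbol enumeration API appears to do.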

Is there a way to check for the existence of these ctor/dtor variables in an
LTO module (through the libLTO API)?

You may need LLVMGetNamedGlobal if you are using the C API.

A certain linker flag forces the linker to include all archive members that have a constructor or destructor. I agree it's strange.

I am fine with adding a new C API to read the constructors and destructors if needed. I don't think it makes sense to encode llvm.global_ctors and llvm.global_dtors into the symbol table, since they are not symbols. It also sounds like you just need to know whether the object file has a constructor or not, and don't really care about anything more than that? Can you read the section name from the IRSymtab to figure out if that is a constructor/destructor or not?

Is it possible to disclose what linker and what platform that is? You don't have to but it would be easier to maintain if you can document this behavior that is unique to the platform you are looking at.

Thanks

Steven

-----"Steven Wu" <stevenwu@apple.com> wrote: -----

To: "Wael Yehia" <wyehia@ca.ibm.com>
From: "Steven Wu" <stevenwu@apple.com>
Date: 07/26/2021 12:22PM
Cc: "Fangrui Song" <maskray@google.com>, llvm-dev@lists.llvm.org
Subject: [EXTERNAL] Re: [llvm-dev] [libLTO] accessing llvm.global_ctors

It also sounds like you just need to know whether the object file has a constructor or not, and don't really care about anything more than that?

yes

Can you read the section name from the IRSymtab to figure out if that is a constructor/destructor or not?

Is this a suggestion of how to implement the query I'm after?
My initial thought was to simply check for the presence of the symbols `llvm.global_ctors` and `llvm.global_dtors` in the ModuleSymbolTable in the LTOModule.

The reason I don't want to do that is that they are not real symbols and linkers do not know how to resolve them. You could stamp a different attribute on them, but older linkers would not be able to understand it.

If the only thing you care about is checking for the existence of ctors and dtors, and all you need is full LTO with libLTO, reading the intrinsics is probably fine. ThinLTO and the new C++ LTO interface don't really use ModuleSymbolTable. Also, reading what the constructors are from those functions would be quite expensive, because they need to be materialized, so you don't want to do that when you are just querying the symbol names.

A certain linker flag forces the linker to include all archive members that have a constructor or destructor. I agree it's strange.

Is it possible to disclose what linker and what platform that is? You don't have to but it would be easier to maintain if you can document this behavior that is unique to the platform you are looking at.

The linker is the system linker on AIX, and the option is -bcdtors:all (documented here: IBM Documentation)
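
As a rough sketch of the decision this implies on the linker side: the struct, field, and function names below are hypothetical, and the rule is a simplification of what this thread describes, not the AIX linker's actual logic.

```c
#include <stdbool.h>

/* Hypothetical summary of what the linker knows about one archive member. */
struct archive_member {
    bool resolves_undefined_symbol; /* the usual reason to extract a member */
    bool has_ctor_or_dtor;          /* member defines llvm.global_ctors or
                                       llvm.global_dtors */
};

/* Returns true if the member should take part in the LTO step.
 * cdtors_all models an option like -bcdtors:all being in effect. */
static bool should_include_member(const struct archive_member *m,
                                  bool cdtors_all)
{
    if (m->resolves_undefined_symbol)
        return true;
    return cdtors_all && m->has_ctor_or_dtor;
}
```

Under this model, an otherwise unused bitcode member is pulled in only when the option is active and the member has a constructor or destructor, which is why the linker needs the query before the LTO step.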

Looks reasonable. Just remember to mention that in your patch.

Steven

-----"Steven Wu" <stevenwu@apple.com> wrote: -----

Do we need a new lto-c API? Can something like LLVMGetNamedGlobal be used?

That sounds fine to me.

Steven

Do we need a new lto-c API? Can something like LLVMGetNamedGlobal be used?

The libLTO.so library only exports functions from `llvm-c/lto.h`, so LLVMGetNamedGlobal won't be accessible.
