Is there some sort of "@llvm.thread_ctors."?

We know that in C++, the constructor of a static member is called when the program starts up. I checked the generated IR and found that this is implemented by defining a __cxx_global_var_init() function, placing it in the section “.text.startup”, and adding it to @llvm.global_ctors.

We also know that in C++, the constructor of a static thread_local member is not called when the program starts, but the first time the member is used. I also checked the IR: this is implemented by calling its constructor at the point of use and then calling __cxa_thread_atexit to register its destructor.

Now I want to add a thread_local feature to my own language, and I want my thread_local members to act like static members: when a new thread is created, the constructors of thread_local members should be called automatically, instead of being called on first use. Is there a way to do this?

I read the LLVM Language Reference Manual, but it doesn’t say much about this in the Thread Local Storage Models section.

We know that in C++, the constructor of a static member is called when the program starts up. I checked the generated IR and found that this is implemented by defining a __cxx_global_var_init() function, placing it in the section “.text.startup”, and adding it to @llvm.global_ctors.

The last is the important part. But that’s just an abstraction across the various global constructor mechanisms provided by the underlying platforms.

We also know that in C++, the constructor of a static thread_local member is not called when the program starts, but the first time the member is used. I also checked the IR: this is implemented by calling its constructor at the point of use and then calling __cxa_thread_atexit to register its destructor.

This behavior is not guaranteed by C++. According to the standard, it would also be acceptable to construct the thread-locals eagerly upon thread startup, as you want to do. However, yes, Clang does always initialize them lazily.

Now I want to add a thread_local feature to my own language, and I want my thread_local members to act like static members: when a new thread is created, the constructors of thread_local members should be called automatically, instead of being called on first use. Is there a way to do this?

I read the LLVM Language Reference Manual, but it doesn’t say much about this in the Thread Local Storage Models section.

No platform I know of provides such a mechanism that you (or llvm) could hook into. If you want this, you’ll need to create your own thread-startup routine that calls your own thread-local initializers.
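As a rough illustration of such a thread-startup routine, here is a minimal sketch in C on top of POSIX threads. All names in it (my_thread_create, trampoline, run_thread_local_inits, tls_value, read_tls_value) are hypothetical, and the initializer is hard-coded only to keep the example short; a real runtime would walk a list of registered initializers instead.

```c
#include <pthread.h>
#include <stdlib.h>

typedef void *(*thread_fn)(void *);

/* Carries the user's entry point and argument into the new thread. */
struct start_info {
    thread_fn fn;
    void *arg;
};

/* Hypothetical thread-local that the language wants initialized eagerly. */
static _Thread_local int tls_value;

/* Stand-in for "call every registered thread-local initializer". */
static void run_thread_local_inits(void) {
    tls_value = 42;
}

/* Runs on the new thread: initialize thread-locals, then call user code. */
static void *trampoline(void *raw) {
    struct start_info info = *(struct start_info *)raw;
    free(raw);
    run_thread_local_inits();
    return info.fn(info.arg);
}

/* Hypothetical replacement for calling pthread_create directly. */
int my_thread_create(pthread_t *thread, thread_fn fn, void *arg) {
    struct start_info *info = malloc(sizeof *info);
    if (info == NULL)
        return -1;
    info->fn = fn;
    info->arg = arg;
    int rc = pthread_create(thread, NULL, trampoline, info);
    if (rc != 0)
        free(info); /* the thread never started, so nobody else frees it */
    return rc;
}

/* Example thread body: reports the thread-local value it observes. */
void *read_tls_value(void *out) {
    *(int *)out = tls_value;
    return NULL;
}
```

This only covers threads created through the wrapper. Threads started by foreign code that calls pthread_create directly would bypass the initializers, so a language runtime would need either to own all thread creation or to fall back to lazy initialization for those threads.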

The most obvious way to do this when there are no shared libraries in the mix would be to emit the initializer-function pointers into a custom named section, and then iterate over that section in your thread startup routine. The linker automatically creates __start_SECTIONNAME and __stop_SECTIONNAME symbols, so you can just iterate over the pointers between them. (At least this works in ELF; I'm not sure if it’s the same on COFF/MachO.)
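A minimal sketch of the custom-section approach in C, assuming an ELF target, GCC or Clang, and no linker garbage collection of the section. The section name my_tls_inits and every other identifier here are made up for illustration; in a real compiler the front end would emit the equivalent globals in IR rather than going through C attributes.

```c
#include <stddef.h>

typedef void (*tls_init_fn)(void);

/* Hypothetical section name. It is a valid C identifier, so an ELF linker
 * defines __start_my_tls_inits and __stop_my_tls_inits around it. */
#define TLS_INIT_ENTRY __attribute__((used, section("my_tls_inits")))

/* Linker-provided bounds of the section (ELF only). */
extern tls_init_fn __start_my_tls_inits[];
extern tls_init_fn __stop_my_tls_inits[];

/* Two hypothetical thread-locals with their initializer functions. */
static _Thread_local int counter_a;
static _Thread_local int counter_b;

static void init_a(void) { counter_a = 1; }
static void init_b(void) { counter_b = 2; }

/* Each entry places one function pointer into the custom section. */
static tls_init_fn reg_a TLS_INIT_ENTRY = init_a;
static tls_init_fn reg_b TLS_INIT_ENTRY = init_b;

/* Call this from the thread-startup routine of every new thread. */
void run_thread_local_inits(void) {
    for (tls_init_fn *p = __start_my_tls_inits; p != __stop_my_tls_inits; ++p)
        (*p)();
}
```

The order in which entries from different object files appear in the section follows link order, so initializers that depend on each other would need some extra ordering scheme on top of this.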

If you need to also support shared libraries, then finding all constructors across all libraries gets somewhat more complex…

The __start_SECTIONNAME trick is ELF-specific. Note that the section name has to be a valid C identifier for this to work: it can only contain letters, digits (except at the start), and underscores. In particular, the section name can’t have a period (.) in it. LLD ELF’s logic for this is at https://github.com/llvm/llvm-project/blob/a506f7f9105eec4baac296d21c922457d6f4b52a/lld/ELF/Writer.cpp#L1973

In COFF, this is usually implemented using grouped sections: if you have sections in your input files named .foo$A, .foo$B, etc., the linker places them in the output section .foo but orders them according to the input section name (e.g. all .foo$B contents will end up after all .foo$A contents). You can therefore have your start of list symbol in .foo$A, your end of list symbol in .foo$Z, and your list contents in .foo$B (or any other letter that’s not A or Z), and iterate over the list that way. More details on grouped sections can be found at https://docs.microsoft.com/en-us/windows/win32/debug/pe-format#grouped-sections-object-only

I’m not familiar with how to do this on Mach-O, but I believe there are dynamic linker APIs for traversing the contents of a particular section that can be used for this purpose.

I should mention that Windows already has a section with callbacks to run on thread startup. I believe it is .CRT$XL[A-Z], and you can see it in use here:
https://reviews.llvm.org/D71786