Determining TLS memory ranges

Hi all,

LDC is an LLVM-based compiler for the D programming language, which
supports garbage collection and in which globals/statics are
thread-local by default. Thus, we need a way to determine the memory
locations used for TLS data, so that we can register them as GC root
ranges during program startup and thread initialization.
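To make the per-thread aspect concrete, here is a small C sketch. It assumes a POSIX system with pthreads, and the names `counter` and `counter_address_in_new_thread` are purely illustrative. A C11 `_Thread_local` global behaves like a default D module-level variable: every thread gets its own copy at a different address, which is why a root range has to be registered for each new thread and not just once at startup.

```c
#include <pthread.h>
#include <stdint.h>

/* Comparable to a default (thread-local) D module-level variable. */
_Thread_local int counter = 0;

/* Thread body: store this thread's address of `counter` into *out. */
static void *report_address(void *out)
{
    *(uintptr_t *)out = (uintptr_t)&counter;
    return NULL;
}

/* Hypothetical helper: returns where `counter` lives in a freshly created
   thread, or 0 if the thread could not be started. */
uintptr_t counter_address_in_new_thread(void)
{
    pthread_t t;
    uintptr_t addr = 0;
    if (pthread_create(&t, NULL, report_address, &addr) != 0)
        return 0;
    pthread_join(t, NULL);
    return addr;
}
```

Called from the main thread, `counter_address_in_new_thread()` returns an address different from `&counter` as seen by the main thread.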

We are currently using LLVM thread_local globals to represent TLS
variables, so ideally LLVM would provide intrinsics which are lowered
to the library calls, … appropriate for the target's TLS
implementation. However, as far as I know, no such functionality
exists (and most LLVM clients would probably not use it anyway), so
I'm afraid we have no choice but to get our hands dirty.

LDC currently uses vanilla LLVM (3.0-3.2), and it would be great if we
could keep it that way. Unfortunately, this makes it hard, if not
impossible, to implement the strategy the other D compilers use:
emitting special bracketing symbols before and after the TLS data and
simply using the range between them for the GC. As far as I know,
there is no way to control the order in which variables are emitted to
the segments at the IR level (we are using MC to emit object files and
the system linker to link them; please correct me if I'm wrong).

Thus, it looks like we have to implement the necessary ABI/OS-specific
functions manually in the language runtime code. Currently, the best
idea I have is to:

- On Linux, determine the size of each module's TLS segment (its
PT_TLS program header) using dl_iterate_phdr during initialization,
then register the block of that size starting at
__tls_get_addr({module, 0}) during thread initialization.

- On OS X, use something similar to the private
functions in dyld.
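A rough sketch of the Linux step above in C, covering the main executable only. It assumes glibc on x86/x86_64: the `tls_index` layout is declared by hand because glibc does not declare `__tls_get_addr` in a public header. The helpers `init_tls_info` and `get_tls_range` and the variable `example_tls_var` are hypothetical names. Shared libraries would need the same treatment per module, and some other architectures bias the offset passed to `__tls_get_addr`, so `{module, 0}` may not point at the block start there.

```c
#define _GNU_SOURCE
#include <link.h>    /* dl_iterate_phdr, struct dl_phdr_info, PT_TLS, ElfW */
#include <stddef.h>
#include <stdint.h>

/* Assumption: glibc's x86/x86_64 descriptor (module id + offset in block). */
typedef struct {
    size_t ti_module;
    size_t ti_offset;
} tls_index;

/* Exported by the glibc dynamic linker; not in any public header. */
extern void *__tls_get_addr(tls_index *ti);

/* Hypothetical runtime state, filled in once at program startup. */
static size_t main_tls_modid;
static size_t main_tls_size;

static int find_main_tls(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)data;
    /* Older C libraries may pass a shorter struct without the TLS fields. */
    if (size < offsetof(struct dl_phdr_info, dlpi_tls_modid)
                   + sizeof(info->dlpi_tls_modid))
        return 1;
    /* The first object visited by dl_iterate_phdr is the main program. */
    for (ElfW(Half) i = 0; i < info->dlpi_phnum; ++i) {
        if (info->dlpi_phdr[i].p_type == PT_TLS) {
            main_tls_modid = info->dlpi_tls_modid;
            main_tls_size = info->dlpi_phdr[i].p_memsz;
            break;
        }
    }
    return 1; /* non-zero stops the iteration after the main program */
}

/* Hypothetical: call once during runtime initialization. */
void init_tls_info(void)
{
    dl_iterate_phdr(find_main_tls, NULL);
}

/* Hypothetical: call on every thread (including the main thread) to get
   that thread's TLS block, to be registered as a GC root range. */
void get_tls_range(void **begin, size_t *len)
{
    if (main_tls_modid == 0) { /* executable has no TLS segment */
        *begin = NULL;
        *len = 0;
        return;
    }
    tls_index ti = { main_tls_modid, 0 };
    *begin = __tls_get_addr(&ti);
    *len = main_tls_size;
}

/* Example thread-local variable so the executable has a PT_TLS segment. */
_Thread_local int example_tls_var = 42;
```

The runtime would call `init_tls_info()` once at startup; each thread would then call `get_tls_range()` from its own initialization code and hand the resulting range to the GC.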

Is there a better way to do this? Ideally, the solution would be
portable across different OSes, but right now I'm happy if I get
things to work on GNU/Linux and OS X on x86/x86_64. Also, it would be
best if the solution could be easily extended to work with dynamically
loaded libraries, but as the D runtime contains a few other related
quirks anyway, this is not a priority right now.

Thanks for any suggestions,