Adding support for explicitly specified TLS models (PR9788)

Hi all,

I would like to hear your thoughts on adding support for extending the
IR to allow for explicitly selecting which model to use for
thread-local storage of a variable.

The motivation is to allow Clang to support the "tls_model" variable
attribute [1], as requested in PR9788.

The idea would be to extend the IR to allow an optional TLS-model
argument to the thread_local attribute, like this:

  @x = thread_local(initialexec) global i32 42

Just as it is illegal to specify thread_local for a target that
doesn't support it, specifying a TLS model that isn't supported by the
target would be illegal. If a model is not specified, the target gets
to choose the appropriate model, as it does today.

Clang would need to know which models are supported by which targets
and give an error (or warn and fall back to the default?) if the user
tries to use an unsupported model, just as it currently gives an error
when trying to use thread-local variables on unsupported targets.

The models in question are described in Section 4 of [2]. As far as I
understand, LLVM supports the following models:

- X86/ELF: general dynamic, initial exec (but not for PIC code?), local exec
- ARM/ELF: general dynamic, local exec
- Mips: general dynamic, local dynamic, initial exec, local exec
- X86 for Darwin and Windows, and XCore, do their own thing
- The other targets don't support thread-local storage

If this sounds good, I've got patches coming up.

Thanks,
Hans

[1] http://gcc.gnu.org/onlinedocs/gcc-4.0.4/gcc/Variable-Attributes.html#index-g_t_0040code_007btls_005fmodel_007d-attribute-1797
[2] http://www.akkadia.org/drepper/tls.pdf

For ELF, it doesn't make sense to reject one as it can always be
relaxed. An application program shouldn't have to worry about that, it
is just an implementation detail.

Joerg

I'd be fine with that.

(Clang should still worry about it though, so it can provide a "the
TLS model you selected is not supported by the target, it will fall
back to another one" warning.)

- Hans

There is no point in warning about it. Basically, anything but global
dynamic a pure performance optimisation.

Joerg

It was pointed out in #llvm that programs that mix C and assembly to
access thread-local variables would break if the TLS model isn't what
they specify, so I still think there's a case for warning about it.

- Hans

Do they? In which specific constellation? The attribute is used only by
codegen and the relocations for the different access methods can
co-exist.

My main concern is code that needs to run across a wide number of platforms.
Requesting a specific hot variable to be initial-exec can have a huge impact
on runtime on platforms that supports this model like x86. It would be
painful to have to use #if wrappers to workaround some bogus warning on
platforms where initial-exec and global-dynamic are the same.

Joerg

It was pointed out in #llvm that programs that mix C and assembly to
access thread-local variables would break if the TLS model isn't what
they specify, so I still think there's a case for warning about it.

Do they? In which specific constellation? The attribute is used only by
codegen and the relocations for the different access methods can
co-exist.

I don't know exactly what he meant. I figured that accessing a
thread-local variable via different access models (one from c code and
one from hand-written asm) would be a problem, but maybe it isn't in
practice.

My main concern is code that needs to run across a wide number of platforms.
Requesting a specific hot variable to be initial-exec can have a huge impact
on runtime on platforms that supports this model like x86. It would be
painful to have to use #if wrappers to workaround some bogus warning on
platforms where initial-exec and global-dynamic are the same.

I see your point. FWIW, gcc doesn't warn about this either, as far as
I can tell.

- Hans

Reviving this thread with a patch!

And some comments inline.

Please take a look and let me know what you think.

- Hans

Hi all,

I would like to hear your thoughts on adding support for extending the
IR to allow for explicitly selecting which model to use for
thread-local storage of a variable.

The motivation is to allow Clang to support the "tls_model" variable
attribute [1], as requested in PR9788.

This enables better performance for some code. For example, TCMalloc
[3] would like to use the initial-exec model, because it's faster.

The idea would be to extend the IR to allow an optional TLS-model
argument to the thread_local attribute, like this:

@x = thread_local(initialexec) global i32 42

Just as it is illegal to specify thread_local for a target that
doesn't support it, specifying a TLS model that isn't supported by the
target would be illegal. If a model is not specified, the target gets
to choose the appropriate model, as it does today.

As discussed on the original thread, I've changed my mind here. I
don't think there should be any error (or warning) for specifying a
model that isn't supported by a particular target. The target should
just fall back to whatever model it supports.

The models in question are described in Section 4 of [2]. As far as I
understand, LLVM supports the following models:

- X86/ELF: general dynamic, initial exec (but not for PIC code?), local exec

It now supports all four models.

- ARM/ELF: general dynamic, local exec

And also initial exec.

- Mips: general dynamic, local dynamic, initial exec, local exec
- X86 for Darwin and Windows, and XCore, do their own thing
- The other targets don't support thread-local storage

[1] http://gcc.gnu.org/onlinedocs/gcc-4.0.4/gcc/Variable-Attributes.html#index-g_t_0040code_007btls_005fmodel_007d-attribute-1797
[2] http://www.akkadia.org/drepper/tls.pdf

[3] http://code.google.com/searchframe#BGeH2W13jNw/trunk/src/thread_cache.h&l=257

tls_models.diff (35.8 KB)

Reviving this thread with a patch!

And some comments inline.

Please take a look and let me know what you think.

Just a high level comment, why do you need the 4 modes + default?
Can't clang just produce globaldynamic (or the attribute value) and
let llvm optimize it? Since in the end the linker is allowed to relax
tls access too, llvm should be allowed to make the mode more specific
even if the user added an attribute at the C/C++ level.

- Hans

Cheers,
Rafael

I thought it was a good idea to make the user's choice explicit in the
IR. If we combined the default and globaldynamic modes, LLVM wouldn't
be able to tell the difference.

It may or may not be important to be able to tell the difference, but
it would be unfortunate if we'd have to go and change the IR format
later because we limited ourselves here.

Also, my patch does make a difference between the default and
globaldynamic. If user specifies globaldynamic, LLVM will use that
model, even if some other model would be better (it even adds support
for doing globadynamic in non-PIC code). GCC does the same.

Thanks,
Hans

Ping?

tls_models.diff (35.8 KB)

I thought it was a good idea to make the user's choice explicit in the
IR. If we combined the default and globaldynamic modes, LLVM wouldn't
be able to tell the difference.

It may or may not be important to be able to tell the difference, but
it would be unfortunate if we'd have to go and change the IR format
later because we limited ourselves here.

Also, my patch does make a difference between the default and
globaldynamic. If user specifies globaldynamic, LLVM will use that
model, even if some other model would be better (it even adds support
for doing globadynamic in non-PIC code). GCC does the same.

Do you know what is the rationale for that? The static linker will
optimize it anyway (but not do as good a job as the compiler could).

If unsure I think it is safer to go without the 'default'. It is
easier to add it to IL later than to remove it if it turns out we
don't need it.

Thanks,
Hans

Cheers,
Rafael

P.S.: Sorry for being so slow on the code reviews, was really busy.

codegen can be more efficient. E.g. less or no calls to __tls_get_addr
needed.

Joerg

Do you know what is the rationale for that? The static linker will
optimize it anyway (but not do as good a job as the compiler could).

codegen can be more efficient. E.g. less or no calls to __tls_get_addr
needed.

My point also :slight_smile: What I was asking for a rationale on was *not* doing
the optimization in the compiler.

Joerg

Cheers,
Rafael

I managed to dig out the original thread for GCC:
http://gcc.gnu.org/ml/gcc-patches/2002-09/msg00668.html

It doesn't give a rationale for the case we're discussing, though :confused:

My intuition still tells me that it would be good to separate the
default and globaldynamic cases to

1) Respect the user's request: if the user went through the trouble of
specifying __attribute__((tls_model("globaldynamic"))), we should
assume there's a reason and give him what he wants, even if we think
the linker is going to optimize it
2) To match GCC's behavior.

I don't have any more compelling reasons than those, and I don't feel
super strongly about this, so I'm willing to give in if others
disagree with me :slight_smile:

Thanks,
Hans

I managed to dig out the original thread for GCC:
http://gcc.gnu.org/ml/gcc-patches/2002-09/msg00668.html

It doesn't give a rationale for the case we're discussing, though :confused:

My intuition still tells me that it would be good to separate the
default and globaldynamic cases to

1) Respect the user's request: if the user went through the trouble of
specifying __attribute__((tls_model("globaldynamic"))), we should
assume there's a reason and give him what he wants, even if we think
the linker is going to optimize it
2) To match GCC's behavior.

I don't have any more compelling reasons than those, and I don't feel
super strongly about this, so I'm willing to give in if others
disagree with me :slight_smile:

I kind of agree with the intuition, but If you don't mind I would
probably ask for something stronger before changing the IL. If we do
find a case where for some reason the compiler cannot optimize the
model, adding a 'default' to the IL should be easy.

Duncan, you know the gcc internals, any thoughts?

Thanks,
Hans

Cheers,
Rafael

Ah, sorry. Because the compiler might not know it yet. Consider linking
code compiled with -fPIC into the main binary, either using PIE or not.

Joerg

Right, but the issue we're discussing here, is why GCC chooses to not
optimize a variable like this:

  static __thread int __attribute((tls_model("global-dynamic"))) x;

even if it could (e.g. when compiled without -fpic).

We're trying to figure out if there's a good reason for this, and if
we want to mimic that behaviour in LLVM, or if we should just take the
tls_model attribute as a starting point, and choose a better model if
we can.

- Hans

On i386, non-PIC code can be used in shared libraries. If the TLS
variable itself is large, forcing global-dynamic makes sense. No idea if
anyone is insane enough to do that, but that's the case where honouring
the user request would make a difference.

Joerg

Hi Rafael,

I kind of agree with the intuition, but If you don't mind I would
probably ask for something stronger before changing the IL. If we do
find a case where for some reason the compiler cannot optimize the
model, adding a 'default' to the IL should be easy.

Duncan, you know the gcc internals, any thoughts?

I don't know anything about TLS stuff. Maybe best to ask on the gcc mailing
list why they do things this way?

Ciao, Duncan.