Adding support for explicitly specified TLS models (PR9788)

I sent an email to gcc-help:
http://gcc.gnu.org/ml/gcc-help/2012-06/msg00116.html

Let's see if that brings any more info.

Thanks,
Hans

Ah, sorry. Because the compiler might not know it yet. Consider linking
code compiled with -fPIC into the main binary, either using PIE or not.

Yes, the case that is interesting is where both can. Why should the
compiler wait for the linker?

Joerg

Cheers,
Rafael

Right, that's a good point.

On i386 Linux, the following code,

  static __thread int x[100000];
  int* get_x() { return x; }

compiled with "clang -shared a.c -o a.so" won't be possible do dlopen.

The tls_model attribute comes to the rescue:
__attribute((tls_model("global-dynamic"))) will fix this, or
specifying -ftls-model="global-dynamic" on the command-line. But this
only works if we actually give the user the requested model.

Rafael, I think your argument has been that if we're unsure if there
is any reason to strictly preserve the user's choice, we should just
let LLVM choose a better model (and not have to specify any difference
between the "default" and global-dynamic models in the IR).

I'd like to argue the other way around. We see that gcc intentionally
preserves the user's choice ("you've asked for it, therefore you get
what you've asked for"), and Joerg provided a case (although probably
not common) where it matters. I think that since we are unsure of
whether any code in the wild relies on this, we should choose the
safer and simpler semantics of using the specified model, which means
that we have to tell the difference between the "default" (LLVM gets
to choose the model) and the four specific models.

Thanks,
Hans

Right, that's a good point.

On i386 Linux, the following code,

static __thread int x[100000];
int* get_x() { return x; }

compiled with "clang -shared a.c -o a.so" won't be possible do dlopen.

I get exactly the same result with gcc 4.7 with and without
-ftls-model=global-dynamic:

movq %gs:0, %eax
addl $x@ntpoff, %eax

Rafael, I think your argument has been that if we're unsure if there
is any reason to strictly preserve the user's choice, we should just
let LLVM choose a better model (and not have to specify any difference
between the "default" and global-dynamic models in the IR).

Not quiet. My main argument is that we are not adding value, since the
linker also does optimizations and the ELF format doesn't list if the
choice of model was explicit or not.

Secondary arguments for going without a "default" value are:

* It is much easier to add it if we find that we do need it than it is
to remove it if we convince ourselves that it is redundant.
* It adds complexity. For example, LLVM would have to distinguish
being run in "compiler mode" versus "linker mode", as the optimization
is clearly valid at LTO time even if we find a reason for not doing it
during compilation.

Thanks,
Hans

Cheers,
Rafael

Right, that's a good point.

On i386 Linux, the following code,

static __thread int x[100000];
int* get_x() { return x; }

compiled with "clang -shared a.c -o a.so" won't be possible do dlopen.

I get exactly the same result with gcc 4.7 with and without
-ftls-model=global-dynamic:

movq %gs:0, %eax
addl $x@ntpoff, %eax

Thanks for bearing with me on this. I think it's important that we get it right.

I could have sworn I tried with -ftls-model yesterday, but apparently
I didn't. Seems it works differently than I assumed (I guess I should
have learnt not too assume things by now). Sorry about that.

The point still stands, though: that code requires the tls_model
attribute to be respected; if the compiler chooses local-exec instead,
it won't be dlopen-able.

The -ftls-model command-line flag has the behavior you describe: the
user sets what the default model should be (global-dynamic if nothing
else is specified), but if the compiler can choose a better model,
that will be used instead. I want to implement this flag too, but that
will be the subject of another patch.

I still think the behaviour in my patch is right. This is an
implementation of a GCC attribute, so unless we have a very good
reason not to, we should make it behave in the same way. And I think
that behaviour makes sense: the attribute is a way for the programmer
to ask for something very specific, so that's what he should get; the
attribute is more like a requirement than a suggestion. AFAIK, that's
how other attributes work too. If I specified the always_inline
attribute on a function, I'd expect the compiler to do its best to
inline it, even if it might not be profitable.

Rafael, I think your argument has been that if we're unsure if there
is any reason to strictly preserve the user's choice, we should just
let LLVM choose a better model (and not have to specify any difference
between the "default" and global-dynamic models in the IR).

Not quiet. My main argument is that we are not adding value, since the
linker also does optimizations and the ELF format doesn't list if the
choice of model was explicit or not.

I think it adds value to make the behaviour the same as GCC's. And
it's not like the difference wouldn't be detectable: the asm is
different, the .o files are different, and the code from the non-PIC
i386 shared lib example wouldn't load with dlopen.

The question is what value it adds to make it work differently from GCC.

Secondary arguments for going without a "default" value are:

* It is much easier to add it if we find that we do need it than it is
to remove it if we convince ourselves that it is redundant.

I don't think it is redundant.

(And removing the "default" value while preserving backwards
compatibility just means we've wasted one value in the bitcode
encoding.)

* It adds complexity. For example, LLVM would have to distinguish
being run in "compiler mode" versus "linker mode", as the optimization
is clearly valid at LTO time even if we find a reason for not doing it
during compilation.

I don't know enough about how the LTO code works to comment on this one.

Thanks,
Hans

The point still stands, though: that code requires the tls_model
attribute to be respected; if the compiler chooses local-exec instead,
it won't be dlopen-able.

This is a fairly contrived use case, and an extension of a gcc extension.

I think it adds value to make the behaviour the same as GCC's. And
it's not like the difference wouldn't be detectable: the asm is
different, the .o files are different, and the code from the non-PIC
i386 shared lib example wouldn't load with dlopen.

The gcc behavior is inconsistent with the linker and, as you found
out, with itself.

The question is what value it adds to make it work differently from GCC.

Secondary arguments for going without a "default" value are:

* It is much easier to add it if we find that we do need it than it is
to remove it if we convince ourselves that it is redundant.

I don't think it is redundant.

(And removing the "default" value while preserving backwards
compatibility just means we've wasted one value in the bitcode
encoding.)

No, to do the upgrade without changing the final result, the value
that "default" should map too depends a lot on the context. It depends
if we are in a "compiler context" or a "linker context". It depends on
-fPIC or not, and that information is not normally available during an
upgrade.

I don't know enough about how the LTO code works to comment on this one.

The summary is that in LTO we are part of the linker, so if the linker
can do an optimization, so can we.

Before you wrote

I don't have any more compelling reasons than those, and I don't feel
super strongly about this, so I'm willing to give in if others
disagree with me :slight_smile:

What about this:

* We implement this first without a 'default' value. This covers the
one real world use case I know of (libc knowing it is not dlopened).
* I am to blame if we missed some other use case and commit to
implementing it in the future if/when it is found :slight_smile:

Thanks,
Hans

Cheers,
Rafael

(To the list this time..)

The point still stands, though: that code requires the tls_model
attribute to be respected; if the compiler chooses local-exec instead,
it won't be dlopen-able.

This is a fairly contrived use case, and an extension of a gcc extension.

I think it adds value to make the behaviour the same as GCC's. And
it's not like the difference wouldn't be detectable: the asm is
different, the .o files are different, and the code from the non-PIC
i386 shared lib example wouldn't load with dlopen.

The gcc behavior is inconsistent with the linker and, as you found
out, with itself.

Yeah, but I've gotten more attached to the GCC behaviour the more I've
thought about it :confused:

I respect that you have more experience with both LLVM and the TLS
stuff than me, though :slight_smile:

Before you wrote

I don't have any more compelling reasons than those, and I don't feel
super strongly about this, so I'm willing to give in if others
disagree with me :slight_smile:

What about this:

* We implement this first without a 'default' value. This covers the
one real world use case I know of (libc knowing it is not dlopened).
* I am to blame if we missed some other use case and commit to
implementing it in the future if/when it is found :slight_smile:

OK, let's give it a couple more days and see if someone else chimes in
or something new comes up. In the meantime, I'll rework the patch the
way you've suggested and if nothing's changed we can land it by the
end of next week. Sounds like a plan?

Thanks,
Hans

Attaching a new patch that has the behaviour we discussed.

The "globaldynamic" and default values have been merged, and LLVM will
start off with the user-specified model, but choose a more specific
one if possible.

Please review.

Thanks,
Hans

tls_models2.diff (39.7 KB)

Attaching a new patch that has the behaviour we discussed.

The "globaldynamic" and default values have been merged, and LLVM will
start off with the user-specified model, but choose a more specific
one if possible.

Please review.

Awesome, thanks!
I will try to do a more complete review tonight or tomorrow. For now,
just two quick observations

*) This breaks the clang build, it would be nice to for the first
commit to keep the old API. It can be removed as soon as clang is
updated.
*) Please name the most general model GeneralDynamicTLSMode. Elf's
default visibility being called 'default' is already confusing enough
:slight_smile:

I was not sure how hard the first item would be, so I just gave it a
try. The resulting patch is attached.

Thanks,
Hans

Cheers,
Rafael

t.patch (37.7 KB)

Attaching a new patch that has the behaviour we discussed.

The "globaldynamic" and default values have been merged, and LLVM will
start off with the user-specified model, but choose a more specific
one if possible.

Please review.

Awesome, thanks!
I will try to do a more complete review tonight or tomorrow.

Great, looking forward to landing this :slight_smile:

For now,
just two quick observations

*) This breaks the clang build, it would be nice to for the first
commit to keep the old API. It can be removed as soon as clang is
updated.

I have the clang patch ready (it was reviewed on cfe-commits), so my
plan was just to commit them both at once.

If we don't want to do that, I think we should try to add the new
constructors while keeping the old ones around, and then delete the
old constructors once clang is updated.

*) Please name the most general model GeneralDynamicTLSMode. Elf's
default visibility being called 'default' is already confusing enough
:slight_smile:

I was thinking that calling it GeneralDynamicTLSMode wouldn't make
sense for non-ELF targets. My thinking was that DefaultTLSModel would
mean "use the default model for the target, which for ELF is general
dynamic, and on Darwin is whatever Darwin does, etc.".

I was not sure how hard the first item would be, so I just gave it a
try. The resulting patch is attached.

Your patch preserves the current constructors. Is the idea that we
wouldn't change the constructors, and clang would call
setThreadLocalMode() when it creates global variables? I would feel
better if we changed the constructors, to avoid the risk of forgetting
to do stuff like
NewGV->setThreadLocalMode(OldGV->getThreadLocalMode()) when new
variables are created based on old ones.

Thanks,
Hans

If we don't want to do that, I think we should try to add the new
constructors while keeping the old ones around, and then delete the
old constructors once clang is updated.

Yes, this would probably be the best. To make clang build I just
hacked the constructors in the patch I posted, but they should really
just forward to the new ones.

*) Please name the most general model GeneralDynamicTLSMode. Elf's
default visibility being called 'default' is already confusing enough
:slight_smile:

I was thinking that calling it GeneralDynamicTLSMode wouldn't make
sense for non-ELF targets. My thinking was that DefaultTLSModel would
mean "use the default model for the target, which for ELF is general
dynamic, and on Darwin is whatever Darwin does, etc.".

This is a good point, but it applies to the other names too, right?
For Darwin, all 4 models map to the one that Darwin has. The two
closest examples to this in llvm are linkage and visibility. For
linkage we have our own names and a superset of any object format. For
visibility we use the ELF visibility names and other targets map it to
what they have (macho has only default and hidden I think).

It should at least be self consistent I think. I am ok with 4 llvm
specific names or using the 4 names used by the tlf.pdf document. Just
make sure none of them is named 'default' :slight_smile:

Your patch preserves the current constructors. Is the idea that we
wouldn't change the constructors, and clang would call
setThreadLocalMode() when it creates global variables? I would feel
better if we changed the constructors, to avoid the risk of forgetting
to do stuff like

No, this was just a hack to get clang to build. I would suggest
changing the constructor just like your patch does, but keeping the
existing ones just forwarding to the new ones.

NewGV->setThreadLocalMode(OldGV->getThreadLocalMode()) when new
variables are created based on old ones.

The rest of the review:

+ separated copy of the variable). Optionally, a suggested TLS model may be

Not sure I would call it "suggested". What it is is a promise by the
FE/user that some restrictions apply to how the variable is used:

* localdynamic: Only used from this DSO.
* initialexec: Will not be loaded dynamically.
* localexec: Will be in the executable and is only used from it.

The restrictions should be documented too.

+ bool isThreadLocal() const { return threadLocalMode; }

Add a != NotThreadLocal to make it explicit.

+ default: // Map unknown non-zero value to default.

Why?

+/// Get the IR-specified TLS model for GV, or GeneralDynamic if no model
+/// was selected.

This comment is out of date, no?

+ return TLSModel::GeneralDynamic;

And this return is dead, you can use llvm_unreachable.

Thanks for the tests. Can you please add

* One running llvm-dis
* One targeting Darwin? Just extending X86/tls-models.ll would be fine.

Thanks,
Hans

Thanks,
Rafael

Thanks for the review! Comments inline; new patch attached.

If we don't want to do that, I think we should try to add the new
constructors while keeping the old ones around, and then delete the
old constructors once clang is updated.

Yes, this would probably be the best. To make clang build I just
hacked the constructors in the patch I posted, but they should really
just forward to the new ones.

Right. Doing that.

*) Please name the most general model GeneralDynamicTLSMode. Elf's
default visibility being called 'default' is already confusing enough
:slight_smile:

I was thinking that calling it GeneralDynamicTLSMode wouldn't make
sense for non-ELF targets. My thinking was that DefaultTLSModel would
mean "use the default model for the target, which for ELF is general
dynamic, and on Darwin is whatever Darwin does, etc.".

This is a good point, but it applies to the other names too, right?
For Darwin, all 4 models map to the one that Darwin has. The two
closest examples to this in llvm are linkage and visibility. For
linkage we have our own names and a superset of any object format. For
visibility we use the ELF visibility names and other targets map it to
what they have (macho has only default and hidden I think).

It should at least be self consistent I think. I am ok with 4 llvm
specific names or using the 4 names used by the tlf.pdf document. Just
make sure none of them is named 'default' :slight_smile:

OK, let's go with GeneralDynamicTLSModel then.

+ separated copy of the variable). Optionally, a suggested TLS model may be

Not sure I would call it "suggested". What it is is a promise by the
FE/user that some restrictions apply to how the variable is used:

* localdynamic: Only used from this DSO.
* initialexec: Will not be loaded dynamically.
* localexec: Will be in the executable and is only used from it.

The restrictions should be documented too.

I'm not sure how much detail we should go into here, because the
restrictions might vary depending on the environment. For example,
with glibc, it will be possible to use initial-exec in a .so that will
be loaded dynamically, given that the amount of tls memory is small
enough, even though this doesn't work in general.

I think it's better if we can have the docs say that these correspond
to the TLS models as described in tls.pdf. I've tried to update the
text to do that.

+ bool isThreadLocal() const { return threadLocalMode; }

Add a != NotThreadLocal to make it explicit.

Done.

+ default: // Map unknown non-zero value to default.

Why?

Lots of other functions in the file do this, for example
GetDecodedLinkage and GetDecodedVisibility.

+/// Get the IR-specified TLS model for GV, or GeneralDynamic if no model
+/// was selected.

This comment is out of date, no?

Updated.

+ return TLSModel::GeneralDynamic;

And this return is dead, you can use llvm_unreachable.

It's not dead when GV isn't a GlobalVariable. (I'm not sure that can
happen, though?)

Thanks for the tests. Can you please add

* One running llvm-dis

Done.

* One targeting Darwin? Just extending X86/tls-models.ll would be fine.

Done.

Thanks,
Hans

tls_models3.diff (43.6 KB)

OK, let's go with GeneralDynamicTLSModel then.

OK

The restrictions should be documented too.

I'm not sure how much detail we should go into here, because the
restrictions might vary depending on the environment. For example,
with glibc, it will be possible to use initial-exec in a .so that will
be loaded dynamically, given that the amount of tls memory is small
enough, even though this doesn't work in general.

I think it's better if we can have the docs say that these correspond
to the TLS models as described in tls.pdf. I've tried to update the
text to do that.

It is probably better to be more strict than any particular
implementation. The description on the IL definition is all the
optimizers have to play with. For example, documenting the possibility
of having a small amount of initial-exec in a dso that is dlopend
would be a bad idea, as it would prevent the compiler from lowering a
variable to initial-exec as that might go over the limit.

Something high level like:

* localdynamic: Only used from this DSO.
* initialexec: Will not be loaded dynamically.
* localexec: Will be in the executable and is only used from it.

is probably OK.

+ default: // Map unknown non-zero value to default.

Why?

Lots of other functions in the file do this, for example
GetDecodedLinkage and GetDecodedVisibility.

That is peculiar, but you are right, it is better be consistent.

+ return TLSModel::GeneralDynamic;

And this return is dead, you can use llvm_unreachable.

It's not dead when GV isn't a GlobalVariable. (I'm not sure that can
happen, though?)

Good point. Looks like it can also be an alias, but we were handling
that only on X86 :frowning: I have fixed it.

LGTM with the models documented.

Thanks,
Hans

Thanks,
Rafael

It is probably better to be more strict than any particular
implementation. The description on the IL definition is all the
optimizers have to play with. For example, documenting the possibility
of having a small amount of initial-exec in a dso that is dlopend
would be a bad idea, as it would prevent the compiler from lowering a
variable to initial-exec as that might go over the limit.

Something high level like:

* localdynamic: Only used from this DSO.
* initialexec: Will not be loaded dynamically.
* localexec: Will be in the executable and is only used from it.

is probably OK.

I've put that in the doc. We can tweak it more after commit if necessary.

And this return is dead, you can use llvm_unreachable.

It's not dead when GV isn't a GlobalVariable. (I'm not sure that can
happen, though?)

Good point. Looks like it can also be an alias, but we were handling
that only on X86 :frowning: I have fixed it.

Thanks!

LGTM with the models documented.

Great! Committed r159077.

Thanks again for reviewing this.

- Hans

Great! Committed r159077.

Thanks again for reviewing this.

NP, sorry for taking so long.

- Hans

Cheers,
Rafael