[RFC] Coroutine and pthread_self

Xun_Li · November 18, 2020, 10:06pm

Hi,

I would like to propose a potential solution to a bug that involves
coroutine and pthread_self().

Description of the bug can be found in
47833 – [coroutines] Clang incorrectly caches result of pthread_self() across coroutine suspend-points. Below is a summary:
pthread_self() from glibc is defined with "__attribute__
((__const__))". The const attribute tells the compiler that it does
not read nor write any global state and hence always return the same
result. Hence in the following code:

auto x1 = pthread_self();
...
auto x2 = pthread_self();

the second call to pthread_self() can be optimized out. This has been
correct until coroutines. With coroutines, we can have code like this:

auto x1 = pthread_self();
co_await ...
auto x2 = pthread_self();

Now because of the co_await, the function can suspend and resume in a
different thread, in which case the second call to pthread_self()
should return a different result than the first one. Unfortunately
LLVM will still optimize out the second call in the case of
coroutines.

I tried to just nuke all value reuse whenever a coro.suspend is seen
in all CSE-related passes (⚙ D89711 [Coroutine] Prevent value reusing across coroutine suspensions in EarlyCSE and GVN), but it
doesn't seem scalable and it puts burden on pass writers. So I would
like to propose a new solution.

Proposed Solution:
First of all, we need to update the Clang front-end to special handle
the attributes of pthread_self function: replace the ConstAttr
attribute of pthread_self with a new attribute, say "ThreadConstAttr".
Next, in the emitted IR, functions with "ThreadConstAttr" will have a
new IR attribute, say "thread_readnone".
Finally, there are two possible sub-solutions to handle this new IR attribute:
a) We add a new Pass after CoroSplitPass that changes all the
"thread_readnone" attributes back to "readnone". This will allow it to
work properly prior to CoroSplit, and still provide a chance to do CSE
after CoroSplit. This approach is simplest to implement.
b) We never remove "thread_readnone". However, we teach memory alias
analysis to understand that functions with "thread_readnone" attribute
will only interfere with coro.suspend intrinsics but nothing else.
Hopefully this will still enable CSE. Not sure how feasible this is.

Does the above solution (esp (a)) sound reasonable? Any feedback is
appreciated. Thank you!

A related issue, which may require separate solutions, is that
coroutine also does not work properly with thread local storage. This
is because access to thread local storage in LLVM IR is simply a
reference. However the address to such reference can change after a
coro.suspend. This is not taken care of today.
In this thread I would like to focus on the issue with pthread_self
first, but it's good to have context regarding the thread local
storage issue when discussing solution space.

rnk · November 20, 2020, 3:41pm

This calls to mind the MSVC /GT option for fiber-safe TLS:
https://docs.microsoft.com/en-us/cpp/build/reference/gt-support-fiber-safe-thread-local-storage?view=msvc-160
It seems reasonable to implement something similar in LLVM to solve the problem of coroutines and TLS.

For pthread_self, instead of inventing a new attribute, would it be enough for clang to ignore the attribute when this new fiber-safe TLS option is enabled?

Xun_Li · November 20, 2020, 6:36pm

Reid,

Thanks for the suggestion. That's a good idea.
One concern would be, when this new fiber-safe TLS option is enabled,
pthread_self() will not be optimized even in functions where no
coroutine is used. Do you think that would be a blocker?

rnk · November 23, 2020, 8:39pm

I don’t think it would be a blocker. You can certainly construct a test case where pthread_self is in the critical path (imagine a macro that uses pthread_self to do some sort of TLS access), but it seems unlikely. If coroutines/fibers had always existed and pthread_self was not marked const, I don’t think we would be spending the time and effort to add a new optimization attribute to optimize it.

jyknight · November 23, 2020, 10:07pm

Special handling for pthread_self doesn’t seem reasonable – all other functions marked ‘attribute((const))’ or ‘readnone’ have the exact same problem.

As you suggest, the same issue occurs with TLS variables – in particular, a normal, non-coroutine function which returns the address of a TLS variable (e.g. __thread int t; int* foo() { return &t; }) is clearly “readnone”, yet, has a different value per thread, and therefore must not be moved across a coroutine suspend point.

I think this means that LLVM just needs to be taught that certain analyses are invalidated by a coroutine suspend point – or else entirely give up on certain analyses or optimizations on functions which contains any un-lowered coroutine suspend points.

Xun_Li · November 23, 2020, 11:20pm

James,

I made a partial attempt in ⚙ D89711 [Coroutine] Prevent value reusing across coroutine suspensions in EarlyCSE and GVN, there are
some discussions there.
But it seems that we will have to teach maybe a dozen analyses that
perform some form of CSE on function calls. And any future analyses
will need to be careful about this. Would this approach be too
fragile?

Joerg_Sonnenberger · November 24, 2020, 1:12pm

I disagree. I would strongly argue that attribute const for pthread_self
is a violation of the attribute contract. As seen, it rather obviously
depends on observable state of the program. It should be pure only.

Joerg

jyknight · November 24, 2020, 9:52pm

The contract for ‘attribute((const))’ as specified in GCC’s docs may be open to your interpretation – and in fact there has been a discussion on the GCC lists in the past about this topic: <https://gcc.gnu.org/legacy-ml/gcc/2015-09/msg00354.html>. But I think it’s far too late to change to the semantics you propose. The actual semantics – both as implemented and as depended upon in practice – are that calls to an __attribute__((const)) function can be arbitrarily moved, added, or removed within one thread of computation, but not globally across the entire program.

In any case, what it lowers to in Clang is the LLVM IR function-attribute readnone, which I’d argue is even more clearly correct to use on an LLVM function returning the address of a TLS global variable. Note that llvm infers that property via the FunctionAttrs pass, e.g. for this function:
@t = thread_local global i32 0, align 4
define i32* @foo() {
ret i32* @t
}

Joerg_Sonnenberger · November 24, 2020, 11:06pm

I don't see how that is a valid transformation under the definition in
the language reference either. I also don't believe that this is the
only situation it can happen, I would expect OpenMP to expose similar
issues.

Joerg

nhaehnle · November 25, 2020, 3:13pm

> In any case, what it lowers to in Clang is the LLVM IR function-attribute
> readnone, which I'd argue is even more clearly correct to use on an LLVM
> function returning the address of a TLS global variable. Note that llvm
> infers that property via the FunctionAttrs pass, e.g. for this function:
> @t = thread_local global i32 0, align 4
> define i32* @foo() {
> ret i32* @t
> }

I don't see how that is a valid transformation under the definition in
the language reference either. I also don't believe that this is the
only situation it can happen, I would expect OpenMP to expose similar
issues.

I tend to agree with this. The LangRef definition of readnone is:

"On a function, this attribute indicates that the function computes
its result (or decides to unwind an exception) based strictly on its
arguments, without dereferencing any pointer arguments or otherwise
accessing any mutable state (e.g. memory, control registers, etc)
visible to caller functions."

The tricky part is that the "thread self identity" is a piece of state
that used to be immutable, but with the introduction of coroutines, it
has become mutable. The example @foo reads the "thread self identity"
piece of state. As long as that state was immutable, it was correct to
mark the function as readnone. Now that the state has become mutable,
this is no longer correct.

That may seem annoying, but I suspect it's best to just swallow that
pill and take it to its logical conclusion.

Doing so ends up treating pthread_self(), @foo, etc. more
conservatively, but correctly. Treating them correctly and less
conservatively is really a kind of alias analysis problem. I would
hesitate to just add a new ad-hoc attribute for it, we already have so
many of those, and would prefer adding a mechanism for this kind of
thing generically.

We faced a similar problem in defining SPIR-V / Vulkan extensions for
raytracing, where part of a "thread's self identity" can change for
very similar reasons at "invocation repack instructions".[0] So just
like @llvm.coro.suspend can change the state read by pthread_self(),
which used to be immutable before coroutines were introduced,
invocation repack instructions can change pieces of state that used to
be immutable before raytracing pipelines were introduced.[1]

It seems to me that what we have here are different kinds of
non-addressable pieces of state which most code doesn't write to, but
some code may read from. Perhaps there should be a parameterized
readnone(x) attribute, where x identifies the kind of state, and a
corresponding writes(x) attribute, which indicates that a function may
write to the state identified by x? Or perhaps we should just say that
pthread_self() and similar should be inaccessiblememonly, but that
doesn't allow us to capture the fact that most code doesn't _write_ to
the relevant "inaccessible memory"...

Cheers,
Nicolai

[0] You can search for that term in a recent version of the Vulkan
spec that includes KHR extensions, e.g.
Vulkan® 1.2.270 - A Specification (with KHR extensions).

[1] To make matters more fun, Vulkan has an analog of thread_local
variables (the Private storage class). Pointers are kind of weird in
Vulkan, but assuming that will change eventually and we will be able
to take pointers of such variables generically, then I believe unlike
@llvm.coro.suspend, invocation repack instructions would not affect
the values of such pointers. So there are really two different kinds
of hidden state in play here.

jyknight · November 25, 2020, 11:50pm

In any case, what it lowers to in Clang is the LLVM IR function-attribute
readnone, which I’d argue is even more clearly correct to use on an LLVM
function returning the address of a TLS global variable. Note that llvm
infers that property via the FunctionAttrs pass, e.g. for this function:
@t = thread_local global i32 0, align 4
define i32* @foo() {
ret i32* @t
}

I don’t see how that is a valid transformation under the definition in
the language reference either. I also don’t believe that this is the
only situation it can happen, I would expect OpenMP to expose similar
issues.

Clang/LLVM OpenMP support doesn’t expose similar issues, because during the design work in 2012, the explicit choice was made to outline the parallel regions to a separate function up-front. It does not expose an llvm IR intrinsic which switches to another (or multiple!) thread of execution. The ultimately-unsuccessful competing proposals did propose that, and were rejected because of it.

That choice greatly reduced the ability to optimize OpenMP parallel loops (e.g., no LICM, even where it’d be correct!). But, this was done precisely in order to avoid having to teach all the LLVM optimization passes about the rules for optimizing a function body running on multiple threads simultaneously.

The tricky part is that the “thread self identity” is a piece of state
that used to be immutable, but with the introduction of coroutines, it
has become mutable. The example @foo reads the “thread self identity”
piece of state. As long as that state was immutable, it was correct to
mark the function as readnone. Now that the state has become mutable,
this is no longer correct.

That may seem annoying, but I suspect it’s best to just swallow that
pill and take it to its logical conclusion.

Doing so ends up treating pthread_self(), @foo, etc. more
conservatively, but correctly. Treating them correctly and less
conservatively is really a kind of alias analysis problem. I would
hesitate to just add a new ad-hoc attribute for it, we already have so
many of those, and would prefer adding a mechanism for this kind of
thing generically.

While it is certainly important to correctly model the fiction that the coroutine intrinsics present, I do think it’s important to remember that they are a fiction within the compiler. Yes, in the high-level coroutine IR, before CoroSplit runs, thread identity acts as if it were mutable across the “llvm.coro.suspend” intrinsic. However, the suspend intrinsic is simply a placeholder for the transform which will ultimately occur, to turn the entire function body inside-out. This is an important distinction, because there remains no way to “secretly” mutate the thread identity. E.g. calling a function cannot result in the thread identity of the caller being mutated.

That we’ve introduced this set of coroutine intrinsics (and the CoroSplit pass to flip a function inside out) does not imply that we ought to change the meaning of ‘readnone’ retroactively to mean that it needs to be invariant on thread-identity. We should continue to model it as meaning “doesn’t read mutable memory or mutable state, but may depend on the thread identity”, which is what the meaning has been up until now, even if not explicitly stated. Given the existence of an intrinsic which “changes” the thread identity, we should, now, clarify the documentation.

We also do need to fix any optimization passes that run prior to the CoroSplit pass understand the effect that a coroutine suspend-point has on thread-identity. Note that we need to do much of the same work anyways, to fix access to TLS variables within a coroutine. (Or, as an alternative, we could move the CoroSplit pass first in the optimization pipeline, so that there are no such optimization passes which run before it. This would fix the correctness issues, but greatly reduce the potential for optimization – similar to OpenMP)

Finally, it turns out to be important, it may be reasonable to add a new function attribute to indicate “doesn’t depend on current thread identity”, which could unlock additional optimization potential even across a suspend-point. But, I would not suggest actually adding it, unless a real need has been identified and justified.

nhaehnle · November 26, 2020, 7:37am

The tricky part is that the "thread self identity" is a piece of state
that used to be immutable, but with the introduction of coroutines, it
has become mutable. The example @foo reads the "thread self identity"
piece of state. As long as that state was immutable, it was correct to
mark the function as readnone. Now that the state has become mutable,
this is no longer correct.

That may seem annoying, but I suspect it's best to just swallow that
pill and take it to its logical conclusion.

Doing so ends up treating pthread_self(), @foo, etc. more
conservatively, but correctly. Treating them correctly and less
conservatively is really a kind of alias analysis problem. I would
hesitate to just add a new ad-hoc attribute for it, we already have so
many of those, and would prefer adding a mechanism for this kind of
thing generically.

While it is certainly important to correctly model the fiction that the coroutine intrinsics present, I do think it's important to remember that they are a fiction within the compiler. Yes, in the high-level coroutine IR, before CoroSplit runs, thread identity acts as if it were mutable across the "llvm.coro.suspend" intrinsic. However, the suspend intrinsic is simply a placeholder for the transform which will ultimately occur, to turn the entire function body inside-out. This is an important distinction, because there remains no way to "secretly" mutate the thread identity. E.g. calling a function cannot result in the thread identity of the caller being mutated.

This is an important property in so far as it means we won't need a
`does_not_change_thread_identity` function attribute. We can instead
just hardcode that the only instruction which may change it is the
llvm.coro.suspend intrinsic, and that is really helpful. However, I
don't see how it affects the discussion around the meaning of
`readnone`.

That we've introduced this set of coroutine intrinsics (and the CoroSplit pass to flip a function inside out) does not imply that we ought to change the meaning of 'readnone' retroactively to mean that it needs to be invariant on thread-identity. We should continue to model it as meaning "doesn't read mutable memory or mutable state, but may depend on the thread identity", which is what the meaning has been up until now, even if not explicitly stated. Given the existence of an intrinsic which "changes" the thread identity, we should, now, clarify the documentation.

We also do need to fix any optimization passes that run prior to the CoroSplit pass understand the effect that a coroutine suspend-point has on thread-identity. Note that we need to do much of the same work anyways, to fix access to TLS variables within a coroutine. (Or, as an alternative, we could move the CoroSplit pass first in the optimization pipeline, so that there are no such optimization passes which run before it. This would fix the correctness issues, but greatly reduce the potential for optimization -- similar to OpenMP)

It seems to me that all of us here are trying to claim that we don't
want to change the meaning of "readnone", and yet we disagree on what
that implies. I would argue that the interpretation outlined by Joerg
and me does not require a _normative_ change to the LangRef text,
although adding a note to clarify the situation wrt coroutines would
of course help.

In practice, the main reason I want "readnone" to mean "does not
depend on thread identity" is that names matter and I don't want us to
design a long-term risk into the LangRef. The attribute is called
"readnone", not "readalmostnone". I expect that the majority of LLVM
developers don't usually consider coroutines in their work, which
means that redefining "readnone" as "there's this one piece of mutable
state which a readnone function _is_ allowed to read" is going to lead
to bugs. (Of course, renaming the attribute would address that
concern, but involves massive churn in the codebase.)

Another factor is comparing impacts on optimizations in practice. I
would expect:

1. "readnone" _may_ depend on thread identity: all functions that are
readnone today can remain readnone; _none_ of them can be optimized
across coroutine suspend points.

2. "readnone" may _not_ depend on thread identity: very few functions
that are readnone today must remove the attribute; all remaining
readnone functions (i.e., the vast majority of them) can still be
optimized across coroutine suspend points.

I acknowledge that for the C++ frontend, the situation with
__attribute__((const)) is a concern. The long-term risk of bad names
seems to me quite similar, i.e., arguably the fact that pthread_self()
is declared __attribute__((const)) is a bug in pthreads if pthreads is
meant to be compiled as C++20. C++20 is young enough that such an
argument may still win out, but I acknowledge that ecosystem
considerations may lead you to a different conclusion. For C++, the
harsh reality of code in the wild using __attribute__((const)) is
relevant, and that may lead you to define __attribute__((const)) as
"may depend on mutable thread identity". I understand that and I don't
have a stake in what's decided for C++.

By contrast, we don't have to worry about backwards compatibility with
external code in LLVM IR, and C++ is only one of many frontends. A
clean design for what makes sense within the internal logic of LLVM IR
should be higher priority.

Cheers,
Nicolai

Xun_Li · November 30, 2020, 6:06pm

To fix TLS variables in coroutines, we need a bit more than the fix
for the pthread_self issue.
For example, assuming we have a TLS variable and a function that looks
like this:

@tls_variable = internal thread_local global i32 0, align 4

define i32* @foo(){
ret i32* @tls_variable
}

Even if @foo does not have a "readnone" attribute, it will still lead
to issues during inlining in coroutines because @tls_variable is just
a value like any other value. For callers looks like this:

%tls1 = call i32* @foo()
..coro.suspend..
%tls2 = call i32* @foo()
%cond = icmp eq i32* %tls1, %tls2
...

When inlining happens, all uses of %tls1 and %tls2 will simply be
replaced with @tls_variable.

To fix that we might have to introduce a new intrinsics for accessing
TLS variables such that they don't get replaced directly. Such
intrinsics function will be similar to pthread_self: it depends on
thread identity but nothing else.
And then from there we can rely on CSE to properly handle the optimizations.

MaskRay · December 1, 2020, 11:31pm

The contract for 'attribute((const))' as specified in GCC's docs may be
open to your interpretation -- and in fact there has been a discussion on
the GCC lists in the past about this topic: <
https://gcc.gnu.org/legacy-ml/gcc/2015-09/msg00354.html>\. But I think it's
far too late to change to the semantics you propose. The actual semantics
-- both as implemented and as depended upon in practice -- are that calls
to an `__attribute__((const))` function can be arbitrarily moved, added, or
removed within one thread of computation, but *not* globally across the
entire program.

In any case, what it lowers to in Clang is the LLVM IR function-attribute
readnone, which I'd argue is even more clearly correct to use on an LLVM
function returning the address of a TLS global variable. Note that llvm
infers that property via the FunctionAttrs pass, e.g. for this function:
@t = thread_local global i32 0, align 4
define i32* @foo() {
ret i32* @t
}

According to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66512
and Szabolcs Nagy - Re: Clarifying attribute-const , in GCC there are
2 issues related to __attribute__((const)) (used by glibc and musl for pthread_self)

* whether 'const' should imply 'nothrow' (in clang, we set LLVM 'readnone' & 'nounind' attributes, but it seems 66512 – PRE fails to optimize calls to pure functions in C++, ok in C has a different interpretation)
* whether 'const' is allowed to read a thread-local const variable.

Xun_Li · December 4, 2020, 4:54pm

Hi,

To help with the discussion, I drafted two patches:
https://reviews.llvm.org/D92662 to address the pthread_self issue by
dropping the readnone attribute;
https://reviews.llvm.org/D92661 to address the TLS issue by turning
TLS access into an intrinsics.
Please take a look and let me know what you think and whether this is
along the right direction.
Thanks!

nhaehnle · December 9, 2020, 3:53pm

Hi,

The direction of adding an intrinsic for “taking the address of” a thread_local variable in https://reviews.llvm.org/D92661 seems reasonable to me. This new instruction “materializes the constant” as it were.

However, the patch does require updates to the LangRef.

Cheers,
Nicolai

jyknight · December 10, 2020, 8:00pm

I view what happened is that we’ve introduced effectively a new form of IR (pre-coroutine-split IR), yet when doing so, we failed to realize the impact that a “suspend” allowing “thread identity” to change would have on the rest of LLVM. So, oops! And now we need to design what things really should be looking like here going forward.

Let me try to make the two proposals so far more concrete.

Assumptions:

We’re not going to move CoroSplit early in the pipeline, so this issue really does need to be solved.
We should not change the meaning of the C attribute((const)) to prohibit reading the thread-identity, but, we can change what LLVM IR that transforms into.
As a prerequisite, we have already solved the TLS-address-is-a-constant issue – e.g. by moving TLS address lookup to an intrinsic (I have some other comments about that issue, separately).

Proposal 1:

The readnone attribute is redefined to prohibit depending on thread id.
Create a new attribute “readthreadidonly”, which is similar to readnone, except that it may access the current thread identity.
The new @llvm.threadlocaladdr intrinsic (or whatever it ends up as) is given this attribute.
Clang’s attribute((const)) emits this attribute, instead of readnone.
Any other frontends which emit the “readnone” attribute may also need to be updated and emit readthreadidonly instead of readonly.
Optimization passes that are oblivious to the new readthreadidonly will fail correct – they won’t misoptimize by moving such a call across an llvm.coro.suspend call (good) – but they also will regress performance of straightline code (bad).
To avoid regressing performance, we’ll need to update all optimization passes which currently handle readnone, to also handle readthreadidonly similarly.
Option 1: (quick’n’dirty) treat readthreadidonly as readnone (allowing arbitrary movement) on any function which doesn’t contain an llvm.coro.suspend, and treat readthreadidonly as inaccessiblememonly if there are any suspends.
Option 2: actually understand that readthreadid can be moved anywhere except across an llvm.coro.suspend call.

Proposal 2:

The readnone attribute (continues to) allow depending on thread id.
C frontend’s attribute((const)) continues to emit the readnone attribute.
The new @llvm.threadlocaladdr intrinsic is marked readnone.
(Optional) add a not_threadid_dependent attribute, which says that a function doesn’t care which thread it’s invoked on.
Neither clang nor other frontends need to change which IR attributes they emit.
Optimization passes must be taught not to move readnone calls across llvm.coro.suspend, or they will be incorrect.
Option 1: (quick’n’dirty) treat ‘readnone’ as ‘inaccessiblememonly’ in any function which contains a llvm.coro.suspend.
Option 2: actually understand that readnone can move anywhere in the function except across an llvm.coro.suspend call (unless not_threadid_dependent is present, in which case it can also move across that).

I’m still slightly in favor of proposal 2, but either seems OK. Both seem like they’ll be effectively the same amount of work to implement in LLVM. I feel like having the “not_threadid_dependent” attribute be separate from “readnone” (doesn’t access memory) is logical and consistent – and having them be distinct may potentially also be useful in other circumstances.

rnk · December 10, 2020, 11:20pm

I kind of like proposal 2, option 1, quick’n’dirty. Consider that Instruction::mayReadFromMemory could be taught to check if its parent function is a coroutine, and power down in those cases. Then it’s a matter of ensuring that all code motion passes use the central utility.

nhaehnle · December 14, 2020, 5:00pm

Thank you for the summary, James.

I view what happened is that we’ve introduced effectively a new form of IR (pre-coroutine-split IR), yet when doing so, we failed to realize the impact that a “suspend” allowing “thread identity” to change would have on the rest of LLVM. So, oops! And now we need to design what things really should be looking like here going forward.

Let me try to make the two proposals so far more concrete.

Assumptions:

We’re not going to move CoroSplit early in the pipeline, so this issue really does need to be solved.

We should not change the meaning of the C attribute((const)) to prohibit reading the thread-identity, but, we can change what LLVM IR that transforms into.

As a prerequisite, we have already solved the TLS-address-is-a-constant issue – e.g. by moving TLS address lookup to an intrinsic (I have some other comments about that issue, separately).

Sounds right.

Proposal 1:

The readnone attribute is redefined to prohibit depending on thread id.

As I’ve explained before, I don’t think this is a redefinition. It is the context in which the definition is interpreted that has changed, not the definition itself.

Create a new attribute “readthreadidonly”, which is similar to readnone, except that it may access the current thread identity.

The new @llvm.threadlocaladdr intrinsic (or whatever it ends up as) is given this attribute.

Clang’s attribute((const)) emits this attribute, instead of readnone.

Any other frontends which emit the “readnone” attribute may also need to be updated and emit readthreadidonly instead of readonly.

Optimization passes that are oblivious to the new readthreadidonly will fail correct – they won’t misoptimize by moving such a call across an llvm.coro.suspend call (good) – but they also will regress performance of straightline code (bad).

To avoid regressing performance, we’ll need to update all optimization passes which currently handle readnone, to also handle readthreadidonly similarly.

Option 1: (quick’n’dirty) treat readthreadidonly as readnone (allowing arbitrary movement) on any function which doesn’t contain an llvm.coro.suspend, and treat readthreadidonly as inaccessiblememonly if there are any suspends.

Option 2: actually understand that readthreadid can be moved anywhere except across an llvm.coro.suspend call.

I wonder if there’s an:

Option 3: After (or during) coroutine splitting, upgrade readthreadidonly to readnone.

This is based on the assumption that after coroutine splitting, threadid is no longer mutable state.

Doing the upgrade any earlier is incorrect because of function inlining.

This is functionally a subset of what Option 1 is doing, but requires changes to fewer transforms.

Proposal 2:

The readnone attribute (continues to) allow depending on thread id.

C frontend’s attribute((const)) continues to emit the readnone attribute.

The new @llvm.threadlocaladdr intrinsic is marked readnone.

(Optional) add a not_threadid_dependent attribute, which says that a function doesn’t care which thread it’s invoked on.

Neither clang nor other frontends need to change which IR attributes they emit.

Optimization passes must be taught not to move readnone calls across llvm.coro.suspend, or they will be incorrect.

Option 1: (quick’n’dirty) treat ‘readnone’ as ‘inaccessiblememonly’ in any function which contains a llvm.coro.suspend.

Option 2: actually understand that readnone can move anywhere in the function except across an llvm.coro.suspend call (unless not_threadid_dependent is present, in which case it can also move across that).

I’m still slightly in favor of proposal 2, but either seems OK. Both seem like they’ll be effectively the same amount of work to implement in LLVM. I feel like having the “not_threadid_dependent” attribute be separate from “readnone” (doesn’t access memory) is logical and consistent – and having them be distinct may potentially also be useful in other circumstances.

If we compare the two proposals, they ultimately offer the same level of expressiveness:

p1 readnone == p2 readnone+not_threadid_dependent
p1 readthreadidonly == p2 readnone

p1 no attribute == p2 no attribute

… and so yes, it’s plausible that they’re effectively the same amount of work in the end.

The arguments against each of the proposals as far as I’m aware:

Downside of proposal 1: it introduces an attribute “readthreadidonly”, which seems unappealing, presumably because it feels like too much special-casing of threadid.

Downside of proposal 2: it changes the meaning of “read none” from “reads no mutable state” to “reads almost no mutable state”, which is unintuitive and invites bugs due to misunderstandings.

I agree that a “readthreadidonly” seems unappealing, but the downside of proposal 2 weighs far heavier to me. It adds permanent mental baggage that everybody working with LLVM IR will have to carry around with them for the indefinite future.

Perhaps there are ways to address the downside of proposal 1? For example, could we make a parameterized version of readnone that reads like readnoneexcept(threadid)? This could also fit nicely with a readnoneexcept(inaccessiblemem).

Cheers,
Nicolai

Xun_Li · December 14, 2020, 5:45pm

Re: "Option 3: After (or during) coroutine splitting, upgrade
readthreadidonly to readnone."

CoroSplit runs as part of CGSCC passes, so are all the CSE-like passes
(Early CSE, GVN and etc.).
The upgrade of attribute can only happen after all coroutines are
split, so it cannot be part of CGSCC passes.
Hence I suspect that the current optimization pipeline will not be
able to take advantage of the attribute upgrade at all?

Topic		Replies	Views
Address thread identification problems with coroutine IR & Optimizations	58	2503	June 21, 2024
[RFC] LLVM Coroutines LLVM Dev List Archives	9	119	June 11, 2016
Misoptimization of TLS and __attibute__((const)) in stackful coroutines (-ffunction-thread-migration) IR & Optimizations coroutines	9	386	July 25, 2024
RFC: Coroutine Optimization Passes LLVM Dev List Archives	8	97	July 18, 2016
Guidance requested: Fixing missed optimization for coroutines LLVM Dev List Archives	1	93	January 23, 2021

[RFC] Coroutine and pthread_self

Related topics