The motivating issue is here: [bug] clang miscompiles coroutine awaiter, moving write across a critical section · Issue #56301 · llvm/llvm-project · GitHub. It is very long and complex to read, so I'll try to give a simple introduction at the level of the middle end.
Background for coroutines
(This section covers some background on C++20 coroutines and how LLVM transforms them. People who already know this can skip it.)
C++20 coroutines are stackless coroutines. They allow the programmer to split a function into several pieces. Semantically, though, these pieces still belong to the same function, so they need to share its local variables. C++20 coroutines therefore need the compiler to construct a struct to hold such variables.
The struct is called the coroutine frame in Clang/LLVM.
In Clang/LLVM, we decided to do this job in the middle end so that we can control which variables live in the coroutine frame at a finer granularity. Variables from the frontend may be split or optimized away in the middle end, and not every local variable needs to be live across different pieces, so building the frame late gives better performance.
    coroutine_type coro_func() {
        big_structure a = produce();
        if (some_condition()) {
            big_structure b = produce();
            consume(b);
        }
        co_await something();
        consume(a);
    }
In this example, while it is valid to put both a and b in the coroutine frame, it is better to put only a in the frame, since b is only used in one piece. So the middle end needs to analyze the lifetime of values.
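To make the size difference concrete, here is a minimal hand-written sketch of the two possible frames. This is not what LLVM actually emits: `naive_frame`, `trimmed_frame`, `resume_index`, `saved_bytes` and the 1024-byte `big_structure` are all hypothetical names and sizes chosen for illustration.

```cpp
#include <cstddef>

// Hypothetical stand-in for the post's big_structure.
struct big_structure {
    char bytes[1024];
};

// A frame that keeps every local alive across the suspension point.
struct naive_frame {
    int resume_index;   // hypothetical: which piece of the function runs next
    big_structure a;    // used after co_await, so it must be kept
    big_structure b;    // dead before co_await, so this slot is wasted
};

// A frame that keeps only the values that are live across co_await.
struct trimmed_frame {
    int resume_index;
    big_structure a;
};

// Bytes saved per coroutine instance by leaving b out of the frame.
constexpr std::size_t saved_bytes() {
    return sizeof(naive_frame) - sizeof(trimmed_frame);
}

static_assert(sizeof(trimmed_frame) < sizeof(naive_frame),
              "dropping b must shrink the frame");
```

In this sketch b can simply stay on the ordinary stack of the first piece, which is the kind of decision the lifetime analysis is meant to enable.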
The problems
The coroutine frame is allocated by malloc, and its address is a local variable at the start of the transformation, e.g.,
    coroutine_type coro_func() {
        void *frame = malloc(size);
        big_structure a = produce();
        if (some_condition()) {
            big_structure b = produce();
            consume(b);
        }
        co_await something();
        consume(a);
    }
The problem is that, before the split, the compiler can't know the relationship between the local variables and the frame pointer. For the reported issue, the pattern is:
    void *frame = malloc(size);
    ...
    bool b = init_value();
    func(frame);
    use of b ...
AA then concludes that the frame pointer cannot alias the address of b, and the optimizer sinks the definition of b below the call:
    void *frame = malloc(size);
    ...
    func(frame);
    bool b = init_value();
    use of b ...
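This is where the problem happens: if b ends up living in the coroutine frame after the split, func(frame) can reach it through the frame pointer, and the write to b has been moved across that call.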
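The hazard can be reproduced in a hand-lowered, single-threaded sketch. Everything here is an assumption made for illustration: `frame_t`, its `seen` field, the body of `func`, and the two `*_order` functions are hypothetical, and the sketch assumes that b has already been placed in the frame and that func reads it through the frame pointer.

```cpp
// Hypothetical frame layout after the split: the local b lives inside it.
struct frame_t {
    bool b = false;     // storage for the local variable b
    bool seen = false;  // records what func observed, for demonstration only
};

// Stand-in for the call in the pattern: it reaches b through the frame pointer.
void func(frame_t *frame) { frame->seen = frame->b; }

// Stand-in for the initializer in the pattern.
bool init_value() { return true; }

// The order the programmer wrote: b is initialized before the call.
bool original_order() {
    frame_t frame;
    frame.b = init_value();
    func(&frame);
    return frame.seen;  // func saw the initialized value
}

// The order after the optimizer sinks the write below the call.
bool sunk_order() {
    frame_t frame;
    func(&frame);
    frame.b = init_value();
    return frame.seen;  // func saw the stale value
}
```

Here `original_order()` returns true while `sunk_order()` returns false, so the two orderings are observably different once b is reachable through the frame. The reported issue is more involved than this sketch, but the reordering is the same.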
Possible solutions
I have two possible solutions in mind.
- Mark the frame pointer as may_alias with all the local variables in the same coroutine before the split.
- Extract the analysis part of the current CoroSplit transformation pass into a separate analysis pass. Then create a new AA pass based on the information from that analysis pass.
The first solution is simple and quick to implement, but it may hurt runtime performance. I know this is almost always what we did before when we hit issues with other optimization passes not understanding coroutines, and many people have said they don't like it.
The second solution looks better, and it has the potential to replace the previous workarounds. The downside is that it may require a lot of work. (This may be the reason I have delayed it many times.) Also, as far as I know, we haven't added many analysis passes recently.
I am quite hesitant about which path to choose, so I am posting here to ask for your thoughts on the issue.
Thanks,
Chuanqi
Update: Another concern I have about the first solution is whether AA in LLVM is transitive. That is, if both pointer a and pointer b are may_alias with the frame pointer, does that imply that pointer a would be considered may_alias with pointer b? From my reading of the code, the answer is no, but I want to be sure about it. CC @nikic
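To illustrate what "not transitive" would mean here, this is a toy model of an alias query. It is not LLVM's AA implementation or API: `ToyAlias`, `Kind`, `Ptr`, `toy_alias` and the rule set are all hypothetical, chosen only to show that such a relation can answer may_alias for (a, frame) and (b, frame) while still answering no_alias for (a, b).

```cpp
// Toy result type; only the two outcomes needed for the illustration.
enum class ToyAlias { NoAlias, MayAlias };

// Toy pointer categories: the coroutine frame pointer or a distinct local.
enum class Kind { Frame, Local };

struct Ptr {
    Kind kind;
    int id;  // identifies which local variable the pointer refers to
};

// Toy rule set (an assumption, not LLVM's logic):
//  - two pointers to different locals never alias;
//  - any query that involves the frame pointer conservatively may alias.
constexpr ToyAlias toy_alias(Ptr x, Ptr y) {
    if (x.kind == Kind::Local && y.kind == Kind::Local)
        return x.id == y.id ? ToyAlias::MayAlias : ToyAlias::NoAlias;
    return ToyAlias::MayAlias;
}

constexpr Ptr frame{Kind::Frame, 0};
constexpr Ptr a{Kind::Local, 1};
constexpr Ptr b{Kind::Local, 2};

// Both locals may alias the frame, yet they do not alias each other.
static_assert(toy_alias(a, frame) == ToyAlias::MayAlias, "a vs frame");
static_assert(toy_alias(b, frame) == ToyAlias::MayAlias, "b vs frame");
static_assert(toy_alias(a, b) == ToyAlias::NoAlias, "a vs b");
```

If LLVM's AA behaves like this model in the relevant cases, the first solution would only pessimize queries that involve the frame pointer, not queries between two locals.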