[coro] pre-split handling of the 'suspend' edge

This came up in PR #71263.

IIUC (I’m still ramping up on coroutines), the following pattern that appears in presplit coroutines:

%t = call i8 @llvm.coro.suspend(...)
switch i8 %t, label %suspend [
   i8 0, label %resume
   i8 1, label %destroy
]

really means that in the “default” (@llvm.coro.suspend “returns” -1) case, we must return immediately. The %suspend BB should have just a call to @llvm.coro.end and return. By “should” I mean that coro-aware passes like coro-split expect that.

The problem is that a naive (coro-unaware) pass or analysis could see the edge to %suspend, or the %suspend basic block, as a regular edge or basic block and do things like split the edge or insert more instructions in the basic block. In the PR example above, the edge could naively be seen as a loop exit edge (but it arguably isn’t. The loop is spread over multiple coro suspends and resumes).

@ChuanqiXu pointed out there were other cases where passes incorrectly handled the suspend intrinsic.

I wanted to start a discussion on what we could do here. For example, we could extend the IR Verifier and (maybe) make CoroSplit treat as an error invalid %suspend BBs? Any other ideas?

Could you clarify exactly what construct you can’t handle? Is the problem repairable by a later IR pass?

Is there some way we could extend existing constructs to address the issue? For example, if we “invoke” or “callbr” the suspend intrinsic, instead of "call"ing the intrinsic and branching on its result, it becomes impossible to insert anything between the intrinsic and the branch. (Not sure if this solves the issue…?)

An example is given in this PR. I don’t think repair is a way to go, because the main disconnect (IIUC) is that really nothing is supposed to happen in the suspend case, other than a return. So it’s a modelling artifact that there’s an “edge” there.

(I haven’t addressed your other point, perhaps @ChuanqiXu or others could comment)

A possible solution for this may be something similar with [Coroutines] Support `@llvm.coro.await_suspend` actually · Issue #64945 · llvm/llvm-project · GitHub (which tries to fix the issue that we may insert codes into the suspending process, which is not allowed by the language too) to introduce a new IR (a new terminator instruction) to represent the whole suspend semantics:

e.g.,

coro_call_br @llvm.coro.await_suspend(ptr %awaiter, ptr %handle,
                                                                 ptr @awaitSuspendFn) 
                                                                 label %suspend,
                                                                 label %resume,
                                                                 label %destroy

The original callbr may not be usable since the document says it can only be used to implement goto statements.

The original callbr may not be usable since the document says it can only be used to implement goto statements.

That’s the current usage, but it could be easily extended to allow intrinsics if we have a use for that. 99% of the code handling callbr doesn’t care what we’re calling.


What would happen if we defined a version of llvm.coro.suspend where the suspend edge just doesn’t exist? If the only thing that can happen on the “suspend” codepath is that we immediately return, there isn’t any obvious reason it needs to exist explicitly in the IR; we can synthesize it later, I think?

Then it can solve the issue raised. We (coroutine passes) can still recognize the suspend by the new introduced llvm.coro.await_suspend intrinsic.

What would happen if we defined a version of llvm.coro.suspend where the suspend edge just doesn’t exist? If the only thing that can happen on the “suspend” codepath is that we immediately return, there isn’t any obvious reason it needs to exist explicitly in the IR; we can synthesize it later, I think?

It sounds a good idea too. In my mind, it should be fine as long as we store the %suspend block in the earlier place in the coroutines so that we can still find it when we perform coroutine splitting.

What would we need the %suspend for though - wouldn’t it effectively amount to a ret?

I like the idea of using callbr for this. Eliminating insertion points that don’t actually exist seems like an improvement in the modelling of the program.

What would we need the %suspend for though - wouldn’t it effectively amount to a ret ?

Yes, currently, it will be lowered to a ret. But from the perspective of IR, the %suspend now contains a call to @llvm.coro.end(...) too to mark the end of the coroutine. Although it is actually a placeholder now, I think it is better to reserve the semantics.

Can’t callbr @llvm.coro.suspend include llvm.coro.end’s semantics, and still keep the ability to call @llvm.coro.end explicitly?

That should be possible too. My concern is that we may refactoring more codes in this way… especially coroutines are relatively stable nowadays.

In my mind, it may easier to introduce a new destination for callbr. Note that we still need to extend/refactor callbr to support intrinsics.

IIUC keeping a suspend BB would keep the need to potentially reason about it differently in code that doesn’t otherwise care about coroutines (like the case is with PGO instrumentation), both now and in the future. Special handling callbr llvm.coro.suspend would only affect coroutine lowering, unless I’m missing something?

but currently (and in the past), if I remember correctly, there is no need to understand the suspend BB for any other passes.

Also we have to keep the suspend BB after all. Let me talk from the frontend perspective. In the frontend perspective, the suspend BB we’re talking about is called coro.ret BB. Then for a simple C++ codes:

Task coro() {
     co_await some_thing();
}

The generated code will be something like:

Task coro() {
    auto init_awaiter = Task::promise::init_suspend();
    if (!init_awaiter.await_ready()) {
         init_awaiter.await_suspend();
         i2 %suspend = call @llvm.coro.suspend();
         switch i2 %suspend , label %coro.end [...]
    }

    ...
    
    auto some_awaiter = some_thing();
    if (!some_awaiter.await_ready()) {
         some_awaiter.await_suspend();
         i2 %suspend = call @llvm.coro.suspend();
         switch i2 %suspend , label %coro.end [...]
    }

    ...

    auto final_awaiter = Task::promise::final_suspend();
    if (!final_awaiter.await_ready()) {
         final_awaiter.await_suspend();
         i2 %suspend = call @llvm.coro.suspend();
         switch i2 %suspend , label %coro.end [...]
    }

coro.ret:
     call @llvm.coro.end();
     return void;
}

My point here is, technically, coro.ret BB is reachable without touching any @llvm.coro.suspend. In other words, we can’t get rid of coro.ret BB even if we treat callbr @llvm.coro.suspend as if it contains coro.ret BB.

So I make sure I understand: in your example, coro.ret is really coro.end, right?

Do I get this correctly: with the callbr alternative (and assuming coro.ret is really coro.end), your example would become:

Task coro() {
    auto init_awaiter = Task::promise::init_suspend();
    if (!init_awaiter.await_ready()) {
         init_awaiter.await_suspend();
         callbr @llvm.coro.suspend() to label %resume [label %destroy]
    }

    ...
    
    auto some_awaiter = some_thing();
    if (!some_awaiter.await_ready()) {
         some_awaiter.await_suspend();
         callbr @llvm.coro.suspend() to label %resume [label %destroy];
    }

    ...

    auto final_awaiter = Task::promise::final_suspend();
    if (!final_awaiter.await_ready()) {
         final_awaiter.await_suspend();
         i2 %suspend = call @llvm.coro.suspend();
         callbr @llvm.coro.suspend() to label %resume [label %destroy];
    }

coro.end:
     call @llvm.coro.end();
     return void;
}

where the resume and destroy labels are the ones that would have been in your original example, too.

I’m not sure that callbr can solve the issue, since it would still be possible to sink an instruction from before callbr into the destinations of the callbr.

I don’t understand why there are branches from the suspend points to the return block in the pre-split representation. Shouldn’t a pre-split coroutine be considered temporarily-paused at the llvm.coro.suspend, until it either continues execution (%resume) or gets destroyed (%destroy)? What is the third option of branching to %suspend for?

It seems to me that removing that branch-edge would fix the issue without needing callbr. What would it break to do so?

In the post-split representation, of course you do have a return at each suspend, but that’s a different matter.

Yes. Also we need to store label %coro.end somewhere to avoid it to be eliminated. Also all the %resume and %destroy labels should refer to different blocks actually.

I’m not sure that callbr can solve the issue, since it would still be possible to sink an instruction from before callbr into the destinations of the callbr.

The optimizer may not be able to do so since the optimizer don’t know if @llvm.coro.suspend may change the state of the program.

What is the third option of branching to %suspend for?

Currently, the %suspend block (or coro.end block in the above example) will contain a call to @llvm.coro.end(). The call to @llvm.coro.end will be lowered to empty. But it can represent the end of the coroutine statically.

It should be possible to treat @llvm.coro.suspend() as if it is @llvm.coro.end() too. My concern is that we may refactoring more codes in this way… especially coroutines are relatively stable nowadays.

Won’t %coro.end be br-ed to on the fallthrough case, i.e. for instance the last if’s else branches to %coro.end, correct? I.e. IIUC there is regular control flow that will reference %coro.end.

It is possible and common that all the await_ready() function returns false all the time. Then after they get inlined, the %coro.end block won’t be referenced and it may be optimized out if the %coro.suspend block doesn’t refer to %coro.end explicitly.

Ah, ok; and if @llvm.coro.suspend() implied @llvm.coro.end(), that’d be fine, correct?

I struggle to understand how the callbr alternative where @llvm.coro.suspend() does not imply @llvm.coro.end() would look like, and how it’d be any different from today’s call + switch - i.e. how it would solve the problem we started the thread with: avoiding a non-coro aware CFG-modifying pass from knowing it needs to treat the suspend edge and basic block carefully?