[RFC] Could we mark coroutines as unreleased now?

A year ago I sent D115692 ([docs] Mark the support for coroutine as unreleased) to mark ‘coroutines’ as unreleased. But according to the discussion, we felt it might not be the time, since there was a severe bug: we couldn’t resume coroutines in a different thread. That bug has now been fixed, so I am wondering if we can mark ‘coroutines’ as unreleased now.

Note that there are still coroutine bugs, but I don’t feel they are blocking. On the one hand, almost all of the bugs are related to optimizations or sanitizers; I don’t recall any existing issues related to semantics. On the other hand, GCC has coroutine bugs too, yet GCC marked ‘coroutines’ as completed a long time ago.

(One of the motivations for this is that I have heard people say that Clang is behind GCC now, since Clang hasn’t completed the big four C++20 features. Their proof is the status table on the web.)

CC people who got involved or used coroutines: @jyknight @bcardosolopes @apolloww @iains
(I meant to CC Lewis Baker but he is not here. I’ll try to ping him somewhere else)

CC @AaronBallman since this relates to the policy to mark a language feature as complete.

Thank you for raising the question!

On our cxx_status page, we list coroutines as partial. Part of the reasoning given is “This feature requires further analysis of the C++ Standard to determine what work is necessary for conformance.” – do you have confidence that this analysis has been done and that we have sufficient test coverage to convince you that we’re conforming?

More broadly related to policy, I don’t think it’s practical to wait until a feature has no open issues before claiming we support the feature. However, the feature should be reasonably functional on all the targets we support (at least, the targets for which any given feature makes sense). So an issue like “wrong code due to incompatibility between C++20 coroutines and 32-bit Windows x86 ABI” (llvm/llvm-project issue #59382) does concern me – we support Windows and clang gets integrated into Visual Studio, so it’s not an uncommon target. So the final comment about coroutines having significant issues on Windows makes me think we’re not quite there yet for marking support as full. That said, perhaps it would be appropriate to update the Partial status to say it’s supported fully everywhere but on Windows targets?

Actually, I have no strong opinion here - missing features and/or target support do seem relevant.

I can only really comment from the perspective of the GCC impl. For the record, I think that is complete across the supported targets (I’ve even seen questions from folks who have been using them in very small CPU embedded cases). However, of course, unfixed bugs exist.

I know we (all) have outstanding questions about the intent of coroutine continuations and heap allocation elision - but neither of those is specifically part of the C++20/23 specs.

“This feature requires further analysis of the C++ Standard to determine what work is necessary for conformance.” – do you have confidence that this analysis has been done and that we have sufficient test coverage to convince you that we’re conforming?

Personally, I think so. From the users’ perspective, Lee Howes from Meta said:

I think we are broadly supportive, we use coroutines in critical situations with pretty high confidence, though it’s not completely bug free.

However, the feature should be reasonably functional on all the targets we support (at least, the targets for which any given feature makes sense). So an issue like “wrong code due to incompatibility between C++20 coroutines and 32-bit Windows x86 ABI” (llvm/llvm-project issue #59382) does concern me – we support Windows and clang gets integrated into Visual Studio, so it’s not an uncommon target. So the final comment about coroutines having significant issues on Windows makes me think we’re not quite there yet for marking support as full. That said, perhaps it would be appropriate to update the Partial status to say it’s supported fully everywhere but on Windows targets?

My bad. I nearly forgot this since I haven’t worked on Windows for a long time. So it is definitely not appropriate to mark coroutines as full now. But I think it would be very helpful to say “it is not supported well on Windows” on the page.

No worries, there’s a lot of moving parts! Let’s update the status page for now so that the details are more accurate.

In terms of getting things working on Windows, perhaps we can ask for some help from @Gor_Nishanov1? He’s well-versed in the feature and has access to a lot of Windows-related expertise, so he (or someone he knows/works with at Microsoft) might be able to help debug some of these last few issues to get coroutines over the finish line.

My take is similar to Lee’s - despite known optimizer bugs and whatnots, we’ve been extensively using it in production for some years now, so from our perspective it’s effectively “unreleased”. The argument made by Aaron about Windows feels important though.

Is this open bug about non-guaranteed symmetric transfer on all platforms (in particular arm8) still broken? That seems like a significant blocker to calling the feature ready, rather than just being a “run of the mill” codegen bug. Symmetric transfer is now considered a best practice when designing most types of coroutines, rather than being an advanced feature that will be rarely used.

There are two factors that IMO make this bug more severe than many, since it is likely to slip through testing and make it into production as a potentially rare and hard-to-debug crash. 1) This is a platform-specific bug, so anyone who tests on one platform and deploys to another won’t catch it. 2) This will often be used with variable numbers of transfers, e.g., one transfer per RPC done over a connection. If your testing uses short-lived connections then you may not have enough symmetric transfers to blow out the stack, and will only experience the crash in production with a sufficiently long-lived connection. Similarly, the number of transfers could be timing or data dependent.

For context: We have been monitoring the bug situation and have sadly been avoiding coroutines in production so far because our anticipated usage patterns would trigger show-stopper bugs. The recent change in LLVM 16 to avoid caching errno addresses and other readnone calls across suspend points addressed our biggest concern, so we are now reevaluating if coroutines are usable for us at this point.

Is this open bug about non-guaranteed symmetric transfer on all platforms (in particular arm8) still broken?

That bug was not closed because, until yesterday, it only had a godbolt reproducer. We have an AArch64 environment downstream, and it shows that symmetric transfer works on AArch64. Besides AArch64, I know that symmetric transfer doesn’t work on WebAssembly, but that is not a usual hardware target. I also remember that symmetric transfer doesn’t work on one other unusual hardware target. That is all I know, so I think this may not be a blocker.

BTW, I didn’t look at some pretty old bug reports that lack a reduced reproducer. So (although it is bad) an issue report that is still open does not necessarily mean the bug is unfixed.

so we are now reevaluating if coroutines are usable for us at this point.

Great to hear that!