I recently looked at the C++ standard for coroutine use, and IMO the people who developed that standard did not understand what coroutines should be.
Subroutines are implemented with a stack of activation records. Coroutines do not generate an activation record, but they do have local data. Therefore, the C++20 standard has created a coroutine thing that should not exist IMO. It puts an activation record on the heap and suspends execution to go to another piece of code. I believe that is a guess at what coroutines were initially created to be due to an effort to implement what http://home.claranet.nl/users/mhx/Forth_Bell.pdf described. Bell’s description IMO was ment to impress his contemporaries but he did a poor job. Bell described the activation record in terms that were contemporary. Compilers were still in their adolescent years, and this might have been a reasonable attempt to describe coroutines in those contemporary terms.
In my opinion, we should go back and study coroutines in an application that actually uses them correctly before we try to reintroduce them. Moving Forth: Part 1 (see: The Threading Technique) This is a use of coroutines the way they were originally designed.
To describe coroutines to an LLVM savy audience, I would simply say a coroutine is subordinate to a subroutine (function). The IR for a subroutine contains Basic Blocks (BBs) that are separated by flow of control (FoC) actions. A Region of BBs always has one entry point. A coroutine is simply a BB or a BB Region subordinate to a subroutine. It has an entry point name and can jump on exit to any other coroutine that is subordinate to the same parent subroutine or it can exit the parent subroutine. The coroutine is “called” by a coroutine jump rather than a subroutine call. I am going to call the coroutine jump “goco” which means “goto coroutine.” Therefore, all BB Regions of the parent subroutine after the entry block could be coroutines. All F0C between coroutines using the goco instruction must be to coroutines subordinate of the same parent subroutine. Any use of a “return” statement will exit the coroutine and the parent subroutine.
The activation record is associated with the parent subroutine. The set of coroutines under that parent, all use the parent subourtine’s activation record. A coroutine is only visible to be called by the parent subroutine and coroutines within the set of couroutines under that parent subroutine.
Although coroutines do not usually have arguments, an argument list could be added to assign values like phi functions.
All of the above coroutine rules follow directly from the concept that a coroutine is a BB (or a Region of BBs) of a parent suboutine.
LLVM should provide a capability to create a set of coroutines within a subroutine that can mutually jump to other coroutines within the same set under the same parent subroutine.
I’m not sure I totally understand your description of how coroutine should work, in particular how they would be different than a “basic block”?
I don’t know if you saw this already, but @rjmccall presented on this topic a couple of years back at the LLVM Dev Conference: 2018 LLVM Developers’ Meeting: J. McCall “Coroutine Representations and ABIs in LLVM” - YouTube ; maybe you can formulate your thoughts around the use-cases and various tradeoffs John exposed?
Thanks for you quick reply.
McCall’s presentation treated coroutines as if they must have an activation record. My description above, does not require any activation record separate from the parent subroutine’s activation record. In other words the parent subroutine and all the coroutines in the set under that parent use the same activation record. Therefore most of what McCall was trying to work around does not need to exist at all.
As for being different from a Basic Block, coroutines can be compiled separately from their parent subroutine, but Basic Blocks cannot. The reference above to Moving Forth Threading Techniques is an example that allows code to be compiled JIT into an REPL style VM to be both compiled and tested interactively. Then when it is working to satisfaction, a release version of the application can be compiled and optimized down into an executable which if opened would appear to be one large function inside with hundreds of BBs and very little overhead.
Sorry, I didn’t put your ping in my response.
One additional point that is implied but should be made concrete, is that the set of coroutines associated with a given parent subroutine can grow JIT after the parent subroutine is compiled and linked and running. Any coroutines that are added can be entered into a hash table visible to the parent subroutine to be accessed indirectly during calls to the parent subroutine. That is how languages like FORTH which is the subject of the Moving Forth web page referenced above, operate in their Virtual Machines.
I would like to know if this style of coroutine is of interest to the LLVM community. It may need to be called by a different name since “coroutine” is already in use, maybe “bbcoroutine.”
If you’re interested in implementing this, I would start with an RFC on llvm-dev@ mailing list, going in more details in what it would solve that LLVM can’t solve now, so that it’d motivate the feature.
Then it’s a matter of discussing how to fit this into the LLVM infra (what kind changes are required to the IR and to the codegen?)
@mehdi_amini after hearing McCall’s presentation, I believe that the coroutines that I am describing are entirely different from what he describes. I believe that this feature would have to be entirely separate. That is why I commented in my prior post that a name change may be of value. So starting with this post, I am calling this feature bbcoroutines.
For Instance, the current coroutines need to be packaged in a “ramp function” that sets up the activation record separate from the stack. Then the significant part of the coroutine is put in a loop with a co_yeild and a routine with a co_await jumps to the coroutine until it is no longer needed. Then a clean-up function calls a co_return and the coroutine ramp function’s activation record is dismissed. All of that programmer written overhead is totally unnecessary. It is making the programmer subordinate to the compiler which is backward.
A bbcoroutine is just the significant part of the coroutine described above. It never has its own activation record. It is subordinate to the parent subroutine which provides the local data space in its own activation record. A compiled bbcoroutine exists in memory in the same way a function exists except it is never called. All access to the bbcoroutine is by jumping to the entry point. Almost all exits from the bbcoroutine are by jumping to another bbcoroutine in the same set under the same parent subroutine using the same activation record. When the last bbcoroutine executes that is necessary for the functionality of the parent subroutine the last bbcoroutine performs a return which causes the parent subroutine to pop its activation record and return to its caller. All of the local data for all of the coroutines in the set under that parent subroutine is popped with the activation record of that subroutine. It cannot get any simpler. A bbcoroutine is simply part of the parent subroutine in the same way a range of basic blocks is part of that subroutine, except for when it is compiled.
The language providing the bbcoroutine feature must have a keyword (eg. bbco) to specify that a subroutine (function) will have a set of bbcoroutines subordinate to it. The language will also need to have a jump to bbcoroutine keyword (eg. goco). Every bbcoroutine would have a profile like a subroutine profile with the parent subroutine name followed by a separator symbol followed by the bbcoroutine name followed by the argument list if necessary for the phi routine values.
When the parent subroutine is called, the activation record is created and an appropriate bbcoroutine is jumpped to using the goco keyword and the bbcoroutine name or address. When the bbcoroutine completes it jumps to the next bbcoroutine etc untill the return. If the parent subroutine has a return value, the bbcoroutines that contain return statements should return a value.
The LLVM IR will need some modification to allow separately compiled bbcoroutines to be treated like ranges of Basic Blocks. Once the bbcoroutines are compiled their addresses will all need to be linked in the same way that a normal subroutine address is handled. The terminal expressions of basic blocks will need to be augmented to include a jump to one of the bbcoroutines. In the IR this will need to be a jump to a function address similar to the function call, and it will only appear as a terminal statement of a basic block. (That will be true in both the parent subroutine and in bbcoroutines.) The activation record and the local and stack variable names are shared with the parent subroutine. I believe the execution mechanics will be straight forward.
@rjmccall John, I would appreciate your input on the topic I presented in this thread before I start promoting it to the whole llvm-dev community.
I think Mehdi is right. Let’s call your new concept a “gadget” so that we don’t come in with preconceptions of it. A gadget is a distinct set of basic blocks with a single entry point, which presumably dominates all the blocks in the gadget. You enter a gadget by branching to its entry point. You exit the gadget by either branching to the start of a different gadget — which then can’t access values from the current gadget unless they dominate its entry block — or by returning from the enclosing function. The thing is, you can completely erase the concept of a gadget from your proposal, and you just get normal control flow within the enclosing function between blocks with different dominating sets.
Now, if you can properly call a gadget, such that the gadget can return back to the execution point that originally called, that’s legitimately different from anything we can currently express in LLVM. The usual term I’ve seen for this is a “sub-function”. There’s a unique instruction in the outer function that defines the sub-function, and all the blocks in the sub-function are dominated by the point immediately before that introduction. That implicitly means that the sub-function can’t refer to itself, and so it can’t call itself, which means that its activation record can be merged into the activation record of the parent function without any special consideration. All that you need extra is somewhere to save the local return address, if the sub-function is called from multiple places.
Formally, that’s still the standard subroutine model, just with an optimization for non-recursive nested functions to allow them to more closely intertwine with their outer function. To get something truly coroutine-like, it would need to be possible to suspend a sub-function and go back to the outer function without actually abandoning the sub-function’s state, thus allowing it to be resumed later. That should still be possible to implement without separate activation records as long as you can’t call a sub-function again while it’s already in flight. But I think it would be very tricky to describe this in an IR without making compilation quite difficult because of the potential for interaction between sub-functions as they got suspended and resumed.
John, I want to thank you for your willingness to read and comment on this thread. Let me try to overlay my understanding of the coroutines of 1967-68, my bbcoroutines, and your comments. I will use your term gadget temporarily.
My version of these gadgets is that they are separately compilable, but only visible to the enclosing function and the other gadgets in the set under that enclosing function. Only the enclosing function has an activation record. All the gadgets in the set under that enclosing function use that activation record.
Once the enclosing function jumps to a gadget, there is no way back into the enclosing function. The thread of control enters a gadget and jumps from gadget to gadget until one of them invokes a return. That return causes the activation record of the enclosing function to be popped from the stack and the thread of control to return to the routine that originally called the enclosing function. When the activation record is popped, all temporary variables for all of the gadgets in the set under the enclosing function are popped with it.
Gadgets cannot be called in the same way a function is called. The flow of control jumps to gadgets in the same way it jumps from a Basic Block (BB) to another BB within the same function. This is not like the enclosed sub-functions of ALGOL60. But, the gadgets are subordinate to the enclosing function. A return invoked in any gadget will use the return address of the enclosing function. All gadgets are dominated by the enclosing function, although the enclosing function could jump to different gadgets from different points in the enclosing function. Once within the set of gadgets, the flow of control could jump to the same gadget that is used by the enclosing function as an initial jump target on a separate invocation. Similarly, a gadget could invoke itself like a recursive function.
I see no reason that a multi-threaded process could not call the enclosing function in different threads and thereby jump into the code for same gadget since the separate threads each have their own stack of activation records and therefore the gadget from different threads would be putting their local data into different stacks handling different activation records for the same enclosing function…
John, you said that “To get something truly coroutine-like, it would need to be possible to suspend a sub-function and go back to the outer function without actually abandoning the sub-function’s state.” This is the crux of the matter. Gadgets must not have private local data that persists without invoking some static data capability. Rather, the local data in gadgets is volatile in the sense that it can only persist up to the point of a flow of control action that takes it out of the immediate gadget. That means that something like a loop index within a gadget will function normally since the index only persists while the loop persists and that would all be within a gadget. But, any other type of data structure would need to be declared in the enclosing function. For instance a local accumulator stack for the ALU, similar to those used in Java or Forth, would need to be declared by the enclosing function. Those declarations would also need to be made visible to the set of gadgets under that function. Essentially every gadget is implicitly enclosed by a compound statement scope.
I hope that clarifies my concept of these things.
The first link in the first post of this thread was entered incorrectly it should be http://melconway.com/Home/pdf/compiler.pdf by Melvin Conway rather than the article by Bell.
Yeah, you’re describing something isomorphic to basic blocks. I don’t intend to keep up with this thread; good luck with your investigations.
Thanks John, I do appreciate you insight.
After working with this form of coroutine, I am going to take your advice and start calling them “Super Basic Blocks” because they are essentially just Basic Blocks with internal Flow of Control.