Hi Eli:
>> coro.barrier() doesn't work: if the address of the alloca doesn't
escape,
>> alias analysis will assume the barrier can't read or write the value of
>> the alloca, so the barrier doesn't actually block code movement.
Got it. I am new to this and learning a lot over the course
of this thread. Thank you for being patient with me.
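To make sure I am following, the problem is roughly this (a sketch only; coro.barrier is the hypothetical intrinsic from above, and the value names are just for illustration):

entry:
  %flag = alloca i1                ; address never escapes
  store i1 false, i1* %flag
  call void @coro.barrier()        ; alias analysis concludes this call cannot
                                   ; read or write %flag, since it never escapes
  %v = load i1, i1* %flag          ; so nothing stops this load (or the store
                                   ; above) from being moved across the barrier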
Two questions and one clarification:
Q1: Do we have to have a load here?
>> block1:
>> %first_time = load... <--- What are we loading here?
Just a local alloca, initialized to false, and changed to true in the
return block.
>> br i1 %first_time, label %return, label %suspend1
>>
>> suspend1:
>> %0 = coro.suspend()
>> switch %0 (resume1, destroy1)
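Spelled out with the alloca, the whole thing would be roughly this (a sketch, just filling in the pieces described above; the switch case values are illustrative):

entry:
  %first_time = alloca i1
  store i1 false, i1* %first_time            ; initialized to false
  ...
block1:
  %ft = load i1, i1* %first_time
  br i1 %ft, label %return, label %suspend1

suspend1:
  %0 = call i8 coro.suspend()
  switch i8 %0, label %resume1 [i8 1, label %destroy1]

return:
  store i1 true, i1* %first_time             ; changed to true in the return block
  ...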
Can we use a three-way coro.suspend instead?
block1:
%0 = call i8 coro.suspend()
switch i8 %0, label %suspend1 [i8 0, label %return] ; or icmp + br i1
suspend1:
switch i8 %0, label %resume1 [i8 1, label %destroy1] ; or icmp + br i1
This doesn't look right: intuitively the suspend happens after the return
block runs.
One problem I can see is that a pass could merge the two branches / switches
into a single switch, and then we are back where we were.
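For example, both switches test the same value, so a CFG cleanup pass is free to fold them into a single dispatch, roughly:

block1:
  %0 = call i8 coro.suspend()
  switch i8 %0, label %resume1 [i8 0, label %return
                                i8 1, label %destroy1]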
I guess what you meant by the load is to call some coro.is.first.time()
intrinsic.
So it looks like:
>> block1:
>> %first_time = call i1 coro.is.first.time()
>> br i1 %first_time, label %return, label %suspend1
>>
>> suspend1:
>> %0 = coro.suspend()
>> switch %0 (resume1, destroy1)
This looks fine; there may be more uses for this intrinsic in the frontend.
Killing two birds with one stone. Good.
It doesn't really matter whether the bit gets tracked in an alloca or
through intrinsics.
Question 2: Why the switch in the return block?
I would think that the **pre-split** return block would be simply:
return:
<run dtors for parameters, if required>
<conversion ops for ret value, if required>
<ret void> or <ret whatever>
Where and why should I put the switch that you mentioned in this return
block?
BTW, I am speaking of the return block as if it is one block,
but it could be a dominating block over all the blocks that together
run the destructors, do return value conversion, etc.
The best way to be sure the compiler will understand the control flow is if
the coroutine acts like a normal function. Another way to put it is that
it should be possible to lower a coroutine to a thread rather than
performing the state machine transformation.
The switch answers the question of where the control flow actually goes
after the return block runs. Under normal function semantics, the "return"
block doesn't actually return: it just performs the one-time operations,
then jumps back to the suspend call. Therefore, you can't use "ret" there;
you have to connect the control flow back to the correct suspend call. The
switch makes that connection. So the return block looks like this:
<run dtors for parameters, if required>
<conversion ops for ret value, if required>
call coro.first_time_ret_value(value) ; CoroSplit replaces this with a ret
switch ... ; jump to suspend; this is always dead in the lowered version
The dead switch is there so the optimizer will understand the control flow.
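Putting the pieces together, the pre-split coroutine would look roughly like this (a sketch; the intrinsic names are the placeholders used in this thread, and with a single suspend point the dead "switch" degenerates to a plain branch):

block1:
  %first_time = call i1 coro.is.first.time()
  br i1 %first_time, label %return, label %suspend1

suspend1:
  %0 = call i8 coro.suspend()
  switch i8 %0, label %resume1 [i8 1, label %destroy1]

return:
  <run dtors for parameters, if required>
  <conversion ops for ret value, if required>
  call coro.first_time_ret_value(value) ; CoroSplit replaces this with a ret
  br label %suspend1                    ; dead after lowering; with several
                                        ; suspend points this becomes the switch
                                        ; that picks the right one to go back to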
And yes, this would be much more straightforward with a two-function
approach.
Clarification:
==============
>> Also, if some non-C++ language wants to generate coroutines,
>> it might not have to generate the return block at all.
C++ coroutines are flexible. The semantics of a coroutine are defined via
traits, so you may define a coroutine that returns void. It does not have
to return a coroutine handle or some struct that wraps the coroutine handle.
Oh, okay. I haven't actually looked at the spec; I'm mostly just going off
your description of what it does.
-Eli