Right. I would say that there are three interesting levels of closure-capture optimization:
1. We don't know when the inner function will be called, so the outer function's frame may no longer be intact. This is generally true of blocks and lambdas, because they can be copied off the stack. If the block/lambda captures a reference to something in the outer function, undefined behavior might let us assume that the frame still exists, but I think only if the block/lambda actually uses that captured reference. Richard, let us know if there's some special rule that would help us here.
It's hard to see what optimizations would be possible here. Maybe, if we controlled the copying process, we could delay copying captured values into the block/lambda temporary and instead access them directly in the inner function? But then the inner function would have to be compiled two ways: one for when the variables have been copied and one for when they're still in place. That's a lot of extra complexity.
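To make case 1 concrete, here is a small C++ sketch (the function name is illustrative only) of a lambda that outlives the frame that created it. Since the compiler can't know when the closure will run, the capture has to be copied into the closure object; a by-reference capture of the local would dangle.

```cpp
#include <cassert>
#include <functional>

// Case 1 sketch: the closure escapes make_counter, so nothing guarantees the
// outer frame is alive when the inner function runs. The capture must be
// copied into the closure object itself.
std::function<int()> make_counter(int start) {
    int local = start;  // lives in make_counter's frame
    // Capturing `local` by reference ([&local]) would dangle once
    // make_counter returns; capturing by value copies it into the closure.
    return [local]() mutable { return ++local; };
}
```

Each call mutates the closure's private copy, never the (now dead) frame slot.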
2. We don't know exactly when or how often the inner function will be called, but there's a definite point in the outer function past which the inner function will no longer be used. This is true of some OpenMP features, "noescape" blocks in ObjC, and noescape blocks/closures in Swift.
We can optimize this by eliminating copies and indirections: instead of aggressively copying values and local addresses into the block/lambda, we can just store the enclosing frame pointer. If a capture is semantically by-reference, i.e. changes to it in the inner function are reflected in the outer function and vice versa, then we have to temporarily pin it into memory and treat its address as having escaped; the inner function can then access it relative to the captured frame pointer. If a capture is semantically by-value, i.e. it copies the value of a variable at a specific point, then we just have to keep that value live in the outer frame until the point past which the inner function is no longer used; in many cases, we can prove that neither function will modify it, so this can safely degrade to the by-reference implementation.
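A rough C++ model of the case-2 lowering, assuming a hypothetical noescape callee `for_each_noescape` that calls its function argument only before it returns. The enclosing frame is simplified to a struct holding just the captured locals; a real compiler would use the actual frame pointer. `sum` plays the by-reference capture, and `scale` is a by-value capture that neither function modifies, so it is read in place.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical "frame" record: the outer function's locals that the inner
// function needs, reachable through one pointer instead of per-variable copies.
struct OuterFrame {
    int sum;    // by-reference capture: pinned in memory, updates visible to outer
    int scale;  // by-value capture that nobody modifies: read in place
};

// Hypothetical noescape callee: it may call fn any number of times, but never
// after it returns, so the outer frame is guaranteed to be live during calls.
static void for_each_noescape(const int* data, std::size_t n,
                              void (*fn)(void* frame, int value), void* frame) {
    for (std::size_t i = 0; i < n; ++i) fn(frame, data[i]);
}

// The inner function, lowered to take the enclosing frame pointer.
static void accumulate(void* frame_ptr, int value) {
    auto* frame = static_cast<OuterFrame*>(frame_ptr);
    frame->sum += value * frame->scale;
}

int scaled_sum(const int* data, std::size_t n, int scale) {
    OuterFrame frame{0, scale};
    for_each_noescape(data, n, accumulate, &frame);
    // Past this point the inner function can no longer be called, so the
    // locals in `frame` no longer need to stay pinned in memory.
    return frame.sum;
}
```

No closure object is allocated and nothing is copied; the only cost is that `frame` must live in memory for the duration of the call.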
3. We know exactly when the inner function will be called and it can be fit into the outer function's CFG. This is true of some OpenMP features and some very specific Swift features.
We can still do normal data-flow and control-flow analyses between the outer and inner functions; the only constraint is that we (probably) have to treat the beginning and end of the inner function as opaque barriers. So the unlowered function looks something like:
%initial_result = call %initial_result_t @begin_subfunction(%initial_args_t %initial_args)
; subfunction goes here
%final_result = call %final_result_t @end_subfunction(%final_args_t %final_args)
and eventually that gets lowered into something like:
%final_result = call %final_result_t @do_with_subfunction(%initial_args_t %initial_args, ptr @subfunction)
define %final_args_t @subfunction(%initial_result_t %initial_result) {
  ; subfunction goes here
  ret %final_args_t %final_args
}
with data flow in and out handled much as in case #2. This is one of the things people want to be able to do with the coroutine proposal.
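A C++ analogue of the lowered form, for illustration only: the struct types stand in for `%initial_args_t` and friends, and the setup/teardown arithmetic in `do_with_subfunction` is invented. It shows the shape of the transformation: the code that sat between `begin_subfunction` and `end_subfunction` is outlined into `subfunction`, which is called exactly once at a known point. Passing a frame pointer for case-2-style data flow is omitted.

```cpp
#include <cassert>

// Hypothetical stand-ins for %initial_args_t, %initial_result_t, etc.
struct InitialArgs   { int base; };
struct InitialResult { int value; };
struct FinalArgs     { int value; };
struct FinalResult   { int value; };

// Hypothetical routine standing in for @do_with_subfunction: it does some
// setup, calls the subfunction exactly once, then does some teardown.
FinalResult do_with_subfunction(InitialArgs args,
                                FinalArgs (*sub)(InitialResult)) {
    InitialResult initial_result{args.base * 10};  // plays begin_subfunction
    FinalArgs final_args = sub(initial_result);    // the outlined body
    return FinalResult{final_args.value + 1};      // plays end_subfunction
}

// The body that sat between begin_subfunction and end_subfunction,
// outlined into its own function.
FinalArgs subfunction(InitialResult initial_result) {
    return FinalArgs{initial_result.value + 5};
}
```

Before lowering, an optimizer could treat the two opaque calls as barriers and analyze the body inline; only afterwards does it become a separate function.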
John.