[RFC] Adding CPS call support

Here is an example of emulating a CPS call using LLVM Coroutines:

https://gist.github.com/GorNishanov/d7caa7697681b96180e4648ed9408bc9

It corresponds to the following C code:

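void partA(int);
void partB(int);
void partC(int);
int cps_call_bar(int val);

void f(int val) {
  partA(val);
  int result = cps_call_bar(41); // f is suspended until bar "returns"
  partB(result);
}

int cps_call_bar(int val) {
  partC(val);
  return 42;
}

int main() {
  f(40);
}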

Run: opt -enable-coroutines -O2 cps.ll -S

After optimization, cps_call_bar will look like:

define void @cps_call_bar(i8* %caller, i32* nocapture %retval.addr, i32 %n) local_unnamed_addr {
  tail call void @partC(i32 %n)
  store i32 42, i32* %retval.addr, align 4
  %1 = bitcast i8* %caller to { i8*, i8* }*
  %2 = getelementptr inbounds { i8*, i8* }, { i8*, i8* }* %1, i32 0, i32 0
  %3 = load i8*, i8** %2
  %4 = bitcast i8* %3 to void (i8*)*
  tail call fastcc void %4(i8* %caller)
  ret void
}

main will get optimized to:

define i32 @main() local_unnamed_addr {
entry:
  %alloc.i = tail call i8* @malloc(i32 24)
  tail call void @partA(i32 40)
  tail call void @partC(i32 41)
  tail call void @partB(i32 42)
  tail call void @free(i8* nonnull %alloc.i)
  ret i32 0
}

and the resume part of f will be:

define internal fastcc void @f.resume(%f.Frame* %FramePtr) {
entry.resume:
  %vFrame = bitcast %f.Frame* %FramePtr to i8*
  tail call void @partB(i32 42)
  tail call void @free(i8* %vFrame)
  ret void
}

Note that this is using LLVM coroutines as is. To support CPS we would need to add a couple of intrinsics along the lines I mentioned in my earlier post.