PSA: Perfectly forwarding thunks can now be expressed in LLVM IR with musttail and varargs

I needed this functionality to solve http://llvm.org/PR20653, but it obviously has far more general applications.

You can do it like this:

define i32 @my_forwarding_thunk(i32 %arg1, i8* %arg2, ...) {
  … ; define %new_arg1 and %new_arg2
  %r = musttail call i32 (i32, i8*, ...)* @the_true_target(i32 %new_arg1, i8* %new_arg2, ...)
  ret i32 %r
}

declare i32 @the_true_target(i32, i8*, ...)

The varargs convention (usually) matches the standard function call convention, and everything will line up if you do an indirect call like this:

declare i32 @the_true_target(%struct.Foo* %this, i64, i8)
define i32 @my_forwarding_thunk(%struct.Foo* %this, ...) {
  %fptr = … ; Compute %fptr by bitcasting @the_true_target to the varargs type
  %r = musttail call i32 (%struct.Foo*, ...)* %fptr(%struct.Foo* %this, ...)
  ret i32 %r
}

Currently this functionality is only implemented for x86 in the absence of inreg and for x86_64 in the general case, but I’d like to see it implemented for the other CPU backends. I’m happy to do some of them, but I don’t have the time to do all of them.

Alternatively, it would be great if we could handle forwarding of unused register parameters in variadic functions in a general way. Perhaps CCState should surface this information.

Thoughts? This seemed like a reasonable way to represent such thunks, but I’d like to know if there are objections.

+David, Doug, Tim

I’ve implemented this forwarding for x86 / x86_64.

The one target I know about where varargs are passed differently from normal arguments is aarch64-apple-ios/macosx. After thinking a bit more, I think this forwarding thunk representation works fine even on that target. Typically a forwarding thunk is called indirectly, or at least through a bitcast, so the LLVM IR call site would look like:

define i32 @forwarding_thunk_caller() {
  %r = call i32 (%struct.Foo*, i64, i8)* bitcast (void (i8*, ...)* @adjustor_thunk to i32 (%struct.Foo*, i64, i8)*) (%struct.Foo* null, i64 42, i8 13)
  ret i32 %r
}

The thunk will have a varargs prototype, but it will also arrange to forward the unconsumed register parameters through to the musttail call site. I haven’t implemented this yet for non-x86 architectures, but I plan to soon.

Does anyone object to this representation?

MIPS also has some subtle (and annoying) differences between variadic and non-variadic function calls. Most notably (on o32 only), the stack pointer will be in a different place in the callee for variadic and non-variadic calls.

The variadic calling convention that we're using for our extension currently requires all variadic arguments to be spilled to the stack. We did this originally for correctness and simplicity, but found that it also improves performance very slightly: most things that use variadic functions just call a function that takes a va_list argument, and you can construct the va_list by simply saving the stack pointer.
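
For readers less familiar with the pattern referred to here, a variadic function that does nothing but hand its arguments to a va_list-taking function looks roughly like the sketch below. The names format_msg and vformat_msg and the 128-byte buffer are made up for illustration; only the <cstdarg> macros and std::vsnprintf are standard.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical worker: does the real formatting and takes a va_list.
std::string vformat_msg(const char *fmt, va_list ap) {
    char buf[128]; // arbitrary size, chosen only for this sketch
    std::vsnprintf(buf, sizeof buf, fmt, ap);
    return std::string(buf);
}

// Hypothetical variadic front end: captures its variadic arguments in a
// va_list and delegates to the worker without inspecting them itself.
std::string format_msg(const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    std::string out = vformat_msg(fmt, ap);
    va_end(ap);
    return out;
}
```

Calling format_msg("%d-%s", 42, "x") goes through va_start once and leaves all argument decoding to vformat_msg, which is the case where a stack-only variadic convention makes building the va_list cheap.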

Implementing perfect forwarding would require that the caller calls the thunk with the non-variadic calling convention and that the thunk is aware that it must (potentially) preserve some more registers. In particular, it mustn't touch any of the argument registers for the specified calling convention, unless explicitly modifying those arguments.

David

This sounds similar to the aarch64 case. We just need to copy all
unconsumed physical register parameters to virtual registers in the
prologue, and copy them back into physical registers before a variadic
musttail call.

As long as the thunk does not intend to read and modify arguments in the
va_list, this approach should work. It just means we'll preserve more
registers than we need to if the thunk target is actually a variadic
function.

Again, my use case is C++ adjustor thunks, where the 'this' parameter is
always "prototyped" and never part of the va_list.
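
To make the use case concrete, here is a source-level sketch of what an adjustor thunk does. It is a hand-written model under assumed names (A, B, C, get, get_adjustor_thunk), not what any compiler actually emits: a real thunk adjusts 'this' and tail-calls the target without knowing the remaining parameter types, which is exactly what the musttail plus varargs form is meant to express.

```cpp
struct A {
    virtual ~A() = default;
    int a = 1;
};

struct B {
    virtual ~B() = default;
    virtual int get(int x, char c) = 0;
};

// C has two polymorphic bases, so its B subobject typically sits at a
// nonzero offset inside a C object.
struct C : A, B {
    int bias = 100;
    int get(int x, char c) override { return bias + x + c; }
};

// Hand-written stand-in for the thunk a compiler would place in C's
// vtable for B: only 'this' is examined and changed.
int get_adjustor_thunk(B *b_this, int x, char c) {
    C *adjusted = static_cast<C *>(b_this); // the 'this' adjustment
    return adjusted->C::get(x, c);          // other arguments pass through untouched
}
```

Unlike this model, the IR thunk never names x or c at all, so the same thunk shape works for any signature whose first parameter is 'this'.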

That's more or less the same approach taken by MIPS. Our specific problem was that MIPS n64 passes the first 8 doublewords of arguments in integer registers, but passing capabilities in integer registers is not possible, so we had to tweak the variadic convention a bit.

David