Proposal: Add a guaranteed tail call marker

Some frontends for LLVM require that LLVM perform tail call optimization (TCO) for correctness. Internally, LLVM refers to TCO of a non-recursive tail call as sibling call optimization, but I’m going to refer to that generically as TCO. Often, functional languages like Scheme have a language-level requirement that TCO occurs for any call in the tail position, and this is usually why users of LLVM have asked for guaranteed TCO functionality.

In my case, to implement vtable thunks and virtual member pointers in the IA32 Microsoft C++ ABI, I cannot simply forward the arguments to the callee without calling the copy constructor. If I can use a guaranteed tail call, I don’t have to emit copy constructor calls, and things are much easier.

Currently, in order to get guaranteed TCO, frontends have to enable the GuaranteedTailCall codegen option and obey a narrow set of rules, which includes always using fastcc. This is fairly awkward and doesn’t solve my use case, since the ABI requires a particular convention.

Instead, I propose that we add a new tail call marker, ‘musttail’, that guarantees that TCO will occur. I’m open to other naming suggestions. Some strawmen are ‘tailonly’ or ‘guaranteedtail’. Along with it, I propose a set conservative of verifier enforced rules to follow to ensure that most reasonable backends will be able to perform TCO. It also ensures that optimizations, like tail merging, don’t accidentally move a call out of tail position.

First, the prototype of the caller and the callee must be “congruent”. Two prototypes are congruent if all return and parameter types match except for the pointer types. The pointer types can have different pointee types, but they must have the same address space. In addition, all the ABI impacting attributes on the parameters must be the same, meaning byval, inalloca, inreg, etc, must all match up.

Second, the call must be in tail position. The call must be immediately followed by a bitcast or a ret, both of which must use the result of the call. If there is a bitcast, it must be followed by a ret which uses the bitcast.

Importantly, LLVM must perform TCO regardless of the computation necessary to compute the arguments for the callee, which is not the case today.

I sent a patch to llvm-commits, but I’d like to hear high-level feedback on llvmdev:
http://llvm-reviews.chandlerc.com/D3240

Thanks!

It would even be invalid to introduce calls to the copy constructor,
no? I don't think the thunks exist at the c++ standard level, so we
cannot create extra copy constructor calls for them. In other words,
we need something like what you are proposing.

Having this also puts us on the path to improve itanium codegen for

struct A {
  virtual void f(int x, ...);
};
struct B {
  virtual void f(int x, ...);
};
struct C : A, B {
  virtual void c();
  virtual void f(int x, ...);
};
void C::f(int x, ...) { }

Currently we have clang duplicate the body of C::f. We cannot create
a normal copy since we cannot represent the copy of the varargs, even
when we want llvm to eliminate those copies. We could use a inalloca
argument for the rest of the frame instead of a "...". Since we have a
guarantee that no copy will be done, we don't have to know the size.

Cheers,
Rafael

Yep! Fixing this problem is definitely doable with this feature. Rather
than using inalloca, I'd add an Ellipsis Value so we can write:
musttail call void @fn(i8* %this_adj, ...)

You wouldn't be able to place '...' anywhere other than as the last
argument of a musttail call, since we can't really lower it otherwise.