Experience with [[clang::musttail]]

Hi, I implemented [[clang::musttail]] and wrote the blog posts Chandler referenced above.

Overall I love the idea of standardizing this feature. As Chandler mentioned, it’s been a crucial tool for optimizing a lot of performance-critical code.

In my view, the trickiest issue is that whether a given tail call can be optimized is platform-dependent. For example, a call that works fine on x86-64 (which passes the first several arguments in registers) might be impossible to turn into a tail call on 32-bit x86 (which passes arguments on the stack).

The existing [[clang::musttail]] attribute tries to solve this problem by defining a set of rules about how the caller and callee function signatures must match. This set of rules tries to provide a “portable” guarantee:

The target function must have the same number of arguments as the caller. The types of the return value and all arguments must be similar according to C++ rules (differing only in cv qualifiers or array size), including the implicit “this” argument, if any. Any variables in scope, including all arguments to the function and the return value must be trivially destructible. The calling convention of the caller and callee must match, and they must not be variadic functions or have old style K&R C function declarations.

[…]

clang::musttail provides assurances that the tail call can be optimized on all targets, not just one.
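
To make the quoted rules concrete, here is a minimal sketch of a call that satisfies them: caller and callee have the same number of arguments, the same types, and no non-trivially-destructible variables in scope. The `EXAMPLE_MUSTTAIL` macro and the `is_even` / `is_odd` functions are names invented for this illustration; the macro expands to nothing on compilers that do not report the attribute, in which case these are ordinary recursive calls.

```cpp
#include <cassert>

// EXAMPLE_MUSTTAIL is a hypothetical helper macro for this sketch only.
// It expands to [[clang::musttail]] when the compiler reports support,
// and to nothing otherwise.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::musttail)
#    define EXAMPLE_MUSTTAIL [[clang::musttail]]
#  endif
#endif
#ifndef EXAMPLE_MUSTTAIL
#  define EXAMPLE_MUSTTAIL
#endif

bool is_odd(unsigned n);

// Caller and callee both have the signature bool(unsigned), so the
// "same number of arguments, similar types" rule is satisfied.
bool is_even(unsigned n) {
  if (n == 0) return true;
  EXAMPLE_MUSTTAIL return is_odd(n - 1);
}

bool is_odd(unsigned n) {
  if (n == 0) return false;
  EXAMPLE_MUSTTAIL return is_even(n - 1);
}

// By contrast, the quoted rules reject a call like the following, because
// the caller takes one argument and the callee takes two:
//
//   bool is_even_counted(unsigned n, unsigned steps);
//   bool start(unsigned n) {
//     [[clang::musttail]] return is_even_counted(n, 0);  // rejected
//   }

// Small usage example.
void example_usage() {
  assert(is_even(10));
  assert(is_odd(7));
}
```

The commented-out case is the kind of call that users tend to complain about: on a register-based calling convention it is often perfectly possible to optimize, yet the portable rules forbid it everywhere.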

I would argue that this design has not worked well in practice. After this attribute landed in Clang, two main things happened:

  1. Some backends failed to optimize certain tail calls, even though they followed the “portable” rules. This manifested as compiler crashes in the backend (1, 2, 3, 4, etc).
  2. Users complained that the compiler rejected [[clang::musttail]] even on calls that were clearly possible to optimize on the current platform.

In other words, the current set of constraints is simultaneously too strict and not strict enough. It’s not strict enough to provide the guarantees it wants to provide, but it is also too strict compared to what users want.

I think the best solution to this conundrum is to do what the proposed C standard does: just make it completely architecture-dependent whether a given musttail is accepted or not. (Incidentally, I also like the return goto syntax proposed for C.)

In practice, this will mean that each project has to manually put #ifdef around any code that uses tail calls, and manually manage which platforms use the return goto path. This is what Protobuf (for example) does now.
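
As a rough sketch of what that #ifdef pattern can look like (this is not Protobuf's actual code), the example below selects a tail-call path on an allowlist of platforms and falls back to a plain loop elsewhere. `HAVE_MUSTTAIL`, `checksum`, and `checksum_step` are hypothetical names, and the particular allowlist of x86-64 and AArch64 is an assumption made only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// HAVE_MUSTTAIL is a hypothetical project-local macro, not something the
// compiler defines. The platform allowlist is an assumption for this sketch;
// a real project would maintain its own.
#if defined(__clang__) && defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::musttail) && \
      (defined(__x86_64__) || defined(__aarch64__))
#    define HAVE_MUSTTAIL 1
#  endif
#endif
#ifndef HAVE_MUSTTAIL
#  define HAVE_MUSTTAIL 0
#endif

#if HAVE_MUSTTAIL

// Tail-call path: each step hands control to the next step without
// growing the stack.
static std::uint32_t checksum_step(const unsigned char* p,
                                   const unsigned char* end,
                                   std::uint32_t acc) {
  if (p == end) return acc;
  [[clang::musttail]] return checksum_step(p + 1, end, acc * 31u + *p);
}

std::uint32_t checksum(const unsigned char* data, std::size_t len) {
  assert(data != nullptr || len == 0);
  return checksum_step(data, data + len, 0);
}

#else

// Fallback path: the same computation written as a loop, for platforms
// where the tail call is not guaranteed.
std::uint32_t checksum(const unsigned char* data, std::size_t len) {
  assert(data != nullptr || len == 0);
  std::uint32_t acc = 0;
  for (std::size_t i = 0; i < len; ++i) acc = acc * 31u + data[i];
  return acc;
}

#endif
```

Both paths compute the same result, so callers only ever see `checksum`; the cost is that the project has to keep the two implementations and the platform list in sync by hand.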

This is not the most elegant solution, but it is simple and transparent. It gives the user the full capabilities of the current platform, while empowering them to write a fallback path that does not require tail calls.

If we were trying to do the most elegant thing, we might wish for a constexpr function like constexpr bool std::can_tail_call<From, To>(), so that you could use if constexpr () or templates to precisely target the set of platforms where a given tail call will be possible. But this would require that target-specific information be fed into the Clang frontend for performing semantic analysis. This would introduce coupling between Clang and LLVM, and would cause divergence from C (which would still need to use #ifdef). Overall, I think the complexity of this would make it far more difficult to implement, and would make it less likely that compilers would support it even if it were standardized.

So overall, I propose to relax the constraints of musttail, at both the Clang and LLVM level. The backend can issue an error diagnostic if it finds that the given tail call is not possible on the current platform.
