Musttail generating incorrect output for function with pass by value arguments

When a function with pass by value arguments is marked as musttail, incorrect code is generated resulting in a crash.
I have previously filed a bug at 56891 and attempted to patch it, but apparently it’s more complicated than I thought. If there is a permutation of arguments between the caller and callee, for example, F(a, b, c) tail calls G(b, c, a) where a, b and c have the same type, currently in X86ISelLowering::LowerCall there is no code handling such situation, which requires cyclic-swapping with a temp register.
I want to check the commit log of LowerCall to see how it’s originally designed and if the case with structs were omitted or intentionally ignored, but it seems this piece of code is quite old and I couldn’t find any info, so I am asking here if anybody is familiar with it.

musttail was added to LLVM IR relatively recently. So until recently, probably nobody realized it was an issue.

In general, to lower correctly, load all the arguments into temporaries, then store them all to the correct stack slots, I think. (It’s impossible to prove if a given pointer aliases a given argument slot, so you have to do all the loads before the stores.) See also ⚙ D85614 [TRE] Reland: allow TRE for non-capturing calls. . Not sure if there’s any implementation in SelectionDAG for another target you could refer to.

Creating temporaries for all arguments is inefficient, because they will be spilled on the stack anyways, so it is better to push them on the stack in the correct order for the callee, and perform a normal call instead of tail call.

To make tail call efficient when permuting arguments, we need to use only one temp to store the first argument on the permutation chain, and then replace it with the second one, the second with the third and so on, and finally replace the last with temp. And to prevent spilling, the permutation should be performed piecewise so that temp can fit in available registers.

I wouldn’t be too concerned about optimizing edge cases. If someone is using musttail, they should be aware of the performance implications.

If you can prove the relationship between the arguments to the call and the arguments to the function, sure, you can swap around the arguments, or something like that. In general, though, you have to deal with pointers you can’t analyze.

The main issue is that currently LLVM generates code that crash in this situation. Before a correct implementation is achieved, which one is a more reasonable behavior for LLVM?
(1) Leave it as-is, let the output program crash despite the source code being valid.
(2) Reject the code, or throw unimplemented error, so that the user is notified, and removes musttail at such usages. This effectively narrows the scope of applicability of musttail keyword.
(3) Ignore the musttail attribute, and emit a regular call instead, this is known to be running correctly. However the user will be unaware unless inspecting the disassembly, since musttail is a requirement, rathen than a suggestion to the compiler.

For (2), not sure how you plan to detect the cases that would miscompile. If you can do it somewhat reliably, sure, rejecting would be an improvement.

Ignoring musttail doesn’t really make sense.

Currently miscompile happens whenever a musttail function has a byval struct argument passed on the stack. I had a patch that partially fixes it so that the output is correct if there is no shuffling of argument is needed. ⚙ D131034 [Backend][X86] Improved tail call optimization for functions marked as musttail