Variadic functions are an ABI concern handled between clang and target backends. They currently block various optimisations (e.g inlining) and require target specific lowering. They also interfere with some analysis, e.g. stack size required by call trees. I remember writing the backend call lowering for a target which only really wanted a functional printf and am presently trying to avoid writing variadic call lowering in SDag and GlobalISel for amdgpu.
For targets which do not have an ABI locking them into an existing variadic calling convention, a reasonably simple target independent pass can rewrite all variadic functions to ones that take a void* instead of … and patch up the call sites to match. This could look like the work in progress D158246. With something equivalent to that in tree, targets that want working variadics and aren’t too worried about the details can cross them off their todo list in two lines of code. That would have brought me joy in a past role.
This RFC is because an emergent feature of this transform is unblocking all the optimisations that are currently thwarted by the variadic type. I’m interested in feedback on the current patch (rewriting the function has more edge cases than I expected) and in thoughts on changing how variadics are currently handled on other targets, and on whether this is indeed worth generalising as an optimisation on arbitrary targets.
Plan of record:
- I fix remaining handling in D158246
- Ship as a codegen pass for amdgpu, rewriting all functions in the module
- Optionally ship likewise on nvptx (would be helpful for libc testing)
- Escape analysis on va_list values to enable the same rewrite on some functions in other targets
- Evaluate abi-preserving translation on call sites on other targets
- See whether some existing backend lowering could be replaced with this sort of pass
- Refactor pass based on which parts generalised poorly
- When satisfied, move pass from codegen to transforms and enable it for all targets
In particular, I’d like minimal wrapping functions around vprintf that are popular for logging to inline and then constant fold, even on targets that use va_list representations other than void*.
The pass structure of is-safe-to-transform separated from apply-transform lends itself well to calling either from codegen to transform everything or from opt to transform only functions which are safe (don’t escape) and profitable (likely to be inline afterwards).
The general premise of replacing a variadic call with stashing the arguments on the caller stack and then branching to a modified callee which expects a void * is sound. Using it as the ABI on amdgpu unblocks libc and an internal ticket and gives a testing ground to get the details of the transform right before applying it more generally.