[RFC] Desugar variadics. Codegen for new targets, optimisation for existing

Variadic functions are an ABI concern handled between clang and target backends. They currently block various optimisations (e.g inlining) and require target specific lowering. They also interfere with some analysis, e.g. stack size required by call trees. I remember writing the backend call lowering for a target which only really wanted a functional printf and am presently trying to avoid writing variadic call lowering in SDag and GlobalISel for amdgpu.

For targets which do not have an ABI locking them into an existing variadic calling convention, a reasonably simple target independent pass can rewrite all variadic functions to ones that take a void* instead of … and patch up the call sites to match. This could look like the work in progress D158246. With something equivalent to that in tree, targets that want working variadics and aren’t too worried about the details can cross them off their todo list in two lines of code. That would have brought me joy in a past role.

This RFC is because an emergent feature of this transform is unblocking all the optimisations that are currently thwarted by the variadic type. I’m interested in feedback on the current patch (rewriting the function has more edge cases than I expected) and in thoughts on changing how variadics are currently handled on other targets, and on whether this is indeed worth generalising as an optimisation on arbitrary targets.

Plan of record:

  • I fix remaining handling in D158246
  • Ship as a codegen pass for amdgpu, rewriting all functions in the module
  • Optionally ship likewise on nvptx (would be helpful for libc testing)
  • Escape analysis on va_list values to enable the same rewrite on some functions in other targets
  • Evaluate abi-preserving translation on call sites on other targets
  • See whether some existing backend lowering could be replaced with this sort of pass
  • Refactor pass based on which parts generalised poorly
  • When satisfied, move pass from codegen to transforms and enable it for all targets

In particular, I’d like minimal wrapping functions around vprintf that are popular for logging to inline and then constant fold, even on targets that use va_list representations other than void*.

The pass structure of is-safe-to-transform separated from apply-transform lends itself well to calling either from codegen to transform everything or from opt to transform only functions which are safe (don’t escape) and profitable (likely to be inline afterwards).

The general premise of replacing a variadic call with stashing the arguments on the caller stack and then branching to a modified callee which expects a void * is sound. Using it as the ABI on amdgpu unblocks libc and an internal ticket and gives a testing ground to get the details of the transform right before applying it more generally.

Thanks!

1 Like

Thanks for working on this. We really want the functionality in the middle end, our current lack of optimizations for varargs is scary.

In general, the layout of the variadic argument buffer is target-specific, so converting the variadic argument list to a buffer needs to be target-specific. Given that, generalizing the transform is going to be complicated. But having the functionality would probably be useful.

I think there are two reasons for optimism in the face of target specific and unaltered layout.

One is that some functions can be rewritten anyway, because the va_list structure doesn’t escape. E.g. any leaf function.

Two is that lots of targets use a void* to an array of ints as the representation, and only vary in whether some int slots are skipped on alignment grounds.

Unfortunately x64 is one of the platforms I’m interested in and that structure is not simple. Still thinking about that one. A target specific hook that puts N values in a struct that is ABI compatible with a va_list might be reasonable, especially if the default works for most targets.

I don’t have line of sight on the optimal design choices yet. The space looks tractable though.