Optimizing sret on caller side

In C/C++, when a big struct is returned from a function, the calling convention requires the caller to allocate space for the object and pass a pointer to that space as the function's hidden first parameter. In LLVM IR this is represented with the sret attribute.

struct Big {
    long v[100];
};

Big f() noexcept;

void caller(Big* out) noexcept {
    *out = f();
}

In the example, the caller first allocates space on the stack and passes it to f(). It then has to copy the Big object to *out. Can this be optimized so that the caller passes out to f() directly?
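To make the question concrete, here is a hand-written sketch of roughly what the caller does today next to the form being asked for. The names f_sret, caller_today and caller_wanted are hypothetical, and the explicit pointer parameter only imitates the hidden sret argument; the real lowering is done by the compiler, not in source.

```cpp
#include <cassert>

struct Big {
    long v[100];
};

// Hypothetical stand-in for the lowered form of `Big f() noexcept`:
// the result is written through an explicit pointer that imitates sret.
void f_sret(Big* ret) noexcept {
    *ret = Big{};
    ret->v[0] = 42;  // arbitrary value so the result is observable
}

// Roughly what happens today: temporary return slot, then a copy.
void caller_today(Big* out) noexcept {
    Big tmp;        // caller-allocated space for the return value
    f_sret(&tmp);   // pointer passed as the hidden first argument
    *out = tmp;     // the extra copy the question wants to avoid
}

// The form being asked for: `out` itself serves as the return slot.
void caller_wanted(Big* out) noexcept {
    f_sret(out);
}

void demo() {
    Big a, b;
    caller_today(&a);
    caller_wanted(&b);
    // Both forms produce the same result when f_sret cannot reach `out`.
    assert(a.v[0] == 42 && b.v[0] == 42);
    assert(a.v[99] == 0 && b.v[99] == 0);
}
```

The two callers agree here only because f_sret has no other way to reach the destination object.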

Not in general. If f has independent access to out (e.g. if it’s a pointer to a global), then passing it as the sret parameter would let f overwrite the value before the point where it should change (i.e. when f returns).
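A minimal sketch of that hazard, assuming a hypothetical global g and an f that reads it while building its result. f_into is a hand-written imitation of f writing straight into a caller-chosen slot; it is not what any compiler emits, only an illustration of the observable difference.

```cpp
#include <cassert>

struct Big {
    long v[100];
};

Big g{};  // hypothetical global that f can reach independently

// Source-level f: reads g while building its result.
Big f() noexcept {
    Big r{};
    r.v[0] = 1;
    r.v[1] = g.v[0] + 1;  // must see g as it was before the call
    return r;
}

void caller(Big* out) noexcept {
    *out = f();
}

// Hand-written imitation of f writing directly into the destination.
void f_into(Big* ret) noexcept {
    *ret = Big{};
    ret->v[0] = 1;
    ret->v[1] = g.v[0] + 1;  // if ret == &g, this reads the new v[0]
}

void demo() {
    g = Big{};
    caller(&g);          // correct semantics: g changes only after f returns
    assert(g.v[1] == 1);

    g = Big{};
    f_into(&g);          // the unsafe rewrite: g is clobbered mid-call
    assert(g.v[1] == 2);
}
```

The two calls leave different values in g.v[1], which is why the caller cannot pass out as the return slot unless it knows f cannot observe it.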

LLVM doesn’t do the optimization even in cases where it could though (for example if it can see f doesn’t do anything like that).

@TNorthover Could you please elaborate on the cases LLVM doesn’t optimize?

Probably most/all. I replaced f with

__attribute__((noinline)) Big f() noexcept {
  return {0};
}

which to me seems about the most obviously safe function possible and LLVM didn’t do the optimization (even if I hacked argmemonly into the IR for f).

It’d need some special knowledge of sret semantics, which probably just hasn’t been implemented.

LLVM does support this optimization in general (this is the “call slot optimization” performed by the MemCpyOpt pass), but it does have some pretty steep preconditions. The bit missing in your case is that the pointer needs to be dereferenceable, aligned and noalias. Here is a working variant with a restrict reference: Compiler Explorer
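The linked example is not reproduced in this thread, so the following is only a sketch of what such a variant might look like, reusing the Big and f from earlier posts. It assumes GCC or Clang, since __restrict on a reference and __attribute__((noinline)) are compiler extensions. Whether the copy actually disappears depends on the compiler version and optimization level, so the generated IR or assembly is worth checking.

```cpp
#include <cassert>

struct Big {
    long v[100];
};

__attribute__((noinline)) Big f() noexcept {
    return {0};
}

// A reference tells the compiler the destination is dereferenceable and
// aligned; __restrict adds the promise that nothing else aliases it.
void caller(Big& __restrict out) noexcept {
    out = f();
}

void demo() {
    Big b;
    b.v[0] = 7;
    b.v[99] = 7;
    caller(b);
    // The observable result is the same with or without the optimization.
    assert(b.v[0] == 0 && b.v[99] == 0);
}
```

The __restrict qualifier is a promise made by the programmer: calling caller with an object that f can also reach would break it.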

In C++ this would work for forwarding between sret parameters (which are dereferenceable, aligned and noalias), but that particular case is also covered by NRVO at the language level, so it’s not particularly relevant.