Understanding LLVM's codegen for function forwarding

When compiling this LLVM IR with -O0 (no optimizations):

define internal fastcc void @bar2(%Bar* nonnull sret) unnamed_addr #2 !dbg !74 {
  call fastcc void @bar(%Bar* sret %0), !dbg !79
  ret void, !dbg !81
}

why does this generate this?

0000000000000090 <bar2>:
  90: 55 push %rbp
  91: 48 89 e5 mov %rsp,%rbp
  94: 48 83 ec 10 sub $0x10,%rsp
  98: 48 89 f8 mov %rdi,%rax
  9b: 48 89 45 f8 mov %rax,-0x8(%rbp)
  9f: e8 0c 00 00 00 callq b0 <bar>
  a4: 48 8b 45 f8 mov -0x8(%rbp),%rax
  a8: 48 83 c4 10 add $0x10,%rsp
  ac: 5d pop %rbp
  ad: c3 retq
  ae: 66 90 xchg %ax,%ax

instead of something like this?

0000000000000090 <bar2>:
  9f: e8 0c 00 00 00 callq b0 <bar>
  ad: c3 retq

when I add `musttail` to the IR it gives me this assembly:

00000000000000a0 <bar2>:
  a0: 55 push %rbp
  a1: 48 89 e5 mov %rsp,%rbp
  a4: 48 83 ec 10 sub $0x10,%rsp
  a8: 48 89 f8 mov %rdi,%rax
  ab: 48 89 45 f8 mov %rax,-0x8(%rbp)
  af: 48 83 c4 10 add $0x10,%rsp
  b3: 5d pop %rbp
  b4: e9 07 00 00 00 jmpq c0 <bar>
  b9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)

which does not have a call instruction, but it still has a prologue
that I would not expect.

What's going on here? Is this something that cannot really be
improved without optimization passes?

I'd assume -fomit-frame-pointer would make a difference.


Re-adding llvm-dev – silly phones not defaulting to reply-all…

There are several things here. The first is that -fno-omit-frame-pointer is causing the generation of “push %rbp ; mov %rsp, %rbp”. The frame pointer is required for accurate stack traces, so we can’t simplify to just “call / ret” as you suggest without changing that option.

The less obvious one is the spilling of RDI to stack memory and reloading it into RAX, which is what I was raising. The Sys V ABI requires that the address of a struct returned by pointer be returned in RAX, and LLVM complies. It looks like I misremembered. We’ve always returned RDI in RAX for sret functions, since 2008 / r50075. However, we never did the right thing in 32-bit. I fixed that in https://bugs.llvm.org/show_bug.cgi?id=23491 / r237639. We don’t yet implement the general optimization of avoiding such spills by reusing the value returned in RAX, which is why we don’t get the simple “call / ret” code you suggest.
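To make the sret point concrete at the source level, here is a minimal C sketch. It assumes an x86-64 System V target where long is 8 bytes; the struct layout and the function bodies are hypothetical stand-ins for %Bar, @bar and @bar2, since the original IR does not show them and may not have come from C at all.

```c
#include <assert.h>

/* Hypothetical stand-in for %Bar. At 32 bytes it is too large to be
   returned in registers under the x86-64 System V ABI, so the caller
   passes a hidden pointer to the result slot (the sret argument). */
struct Bar { long a, b, c, d; };

/* Plays the role of @bar: writes its result through the hidden pointer. */
struct Bar bar(void) {
    struct Bar r = {1, 2, 3, 4};
    return r;
}

/* Plays the role of @bar2: forwards the same result slot to bar().
   The ABI still requires bar2 to return that slot's address in RAX. */
struct Bar bar2(void) {
    return bar();
}

/* Checks only the observable behaviour; the generated assembly has to
   be inspected separately (for example with objdump -d). */
void check_forwarding(void) {
    struct Bar b = bar2();
    assert(b.a == 1 && b.b == 2 && b.c == 3 && b.d == 4);
}
```

Calling check_forwarding() from a small main confirms the forwarding behaves as expected. Disassembling bar2 from an unoptimized build may show a similar save and reload of the hidden pointer, though the exact instructions depend on the compiler version and flags.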

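Finally, we miss the tail call opportunity because today we just give up if sret is present on either the caller or the callee. I think we could refine that check: if both the caller and the callee use sret, and the caller passes its own sret parameter through as the callee’s sret argument, the tail call should still be possible.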