structure-return tailcall

I'm working on pr51000, and thinking about the case of large structures returned by artificial sret pointer parm.

I have questions.

The itanium ABI requires functions that return a large struct this way to *also* return that pointer as their scalar return value. (Let's not get into the pros and cons of that; it is what it is. I'm looking at x86_64 primarily, but I understand other ISAs have similar ABIs.)

Anyway, to do that requires some data-flow work, and being a newbie to llvm IR, I can see two ways to do this. It is not clear to me which is the easiest or best. Plus I find discrepancies between documentation, tests and implementation!

Consider:

struct Big { int ary[50]; Big (); };

Big Foo ();

Big Bar () { return Foo (); }

Here's the IR:

define dso_local void @_Z3Barv(%struct.Big* noalias sret(%struct.Big) align 4 %0) local_unnamed_addr #0 {
   tail call void @_Z3Foov(%struct.Big* sret(%struct.Big) align 4 %0)
   ret void
}

I.e. the middle end figures this is tail callable, but I don't think it knows about the pointer return requirement (see below for evidence).
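To make the pointer return requirement concrete, here is a hand-lowered sketch of the Foo/Bar example in plain C++. It is only a model of the calling convention described above, not compiler output: the names Foo_lowered, Bar_lowered and checkForwarding are hypothetical, the constructor is dropped so the snippet links on its own, and the store into ary[0] is just a stand-in for building the result in place.

```cpp
#include <cassert>

struct Big { int ary[50]; };  // constructor omitted to keep the sketch self-contained

// Hypothetical lowered form of `Big Foo ()`: the caller passes the result
// slot, and the function hands the same pointer back as its scalar return
// value.
Big *Foo_lowered(Big *sret) {
    sret->ary[0] = 42;  // stand-in for constructing the result in place
    return sret;
}

// Hypothetical lowered form of `Big Bar () { return Foo (); }`.
// Because the incoming slot is forwarded unchanged, the pointer returned by
// Foo_lowered is already the pointer Bar_lowered has to return, so no work
// remains after the call.
Big *Bar_lowered(Big *sret) {
    return Foo_lowered(sret);
}

// Small self-check: the pointer handed back is the caller's own slot.
void checkForwarding() {
    Big slot{};
    assert(Bar_lowered(&slot) == &slot);
    assert(slot.ary[0] == 42);
}
```

In this model the tail call in Bar is harmless only because the forwarded pointer and the returned pointer are the same value; a callee that returned something else, or nothing, would leave the wrong value as Bar's scalar result.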

Test/documentation mismatch: The tailcall documentation says: (https://llvm.org/docs/LangRef.html#call-instruction)
   'Both markers imply that the callee does not access allocas
    from the caller.'

However, the X86 sibcall test (llvm/test/CodeGen/X86/sibcall.ll) seems to break that. Specifically:

define fastcc void @t21_sret_to_sret_alloca(%struct.foo* noalias sret(%struct.foo) %agg.result) nounwind {
   %a = alloca %struct.foo, align 8
   tail call fastcc void @t21_f_sret(%struct.foo* noalias sret(%struct.foo) %a) nounwind
   ret void
}

That call to t21_f_sret is referencing the frame-allocated %a object.

Question: Is sibcall.ll correct or not?

Implementation/documentation mismatch: I also note that the tail marker can appear even when the call is NOT the last (real) instruction in the function. That seems strange.

The documentation says:
   'The optional tail and musttail markers indicate that the
    optimizers should perform tail call optimization.'

Consider:
struct Big { int ary[50]; Big (); };

void Frob ();

Big Baz () { Big b; Frob (); return b; }

this generates:

define dso_local void @_Z3Bazv(%struct.Big* noalias nonnull sret(%struct.Big) align 4 %0) local_unnamed_addr #0 {
   tail call void @_ZN3BigC1Ev(%struct.Big* nonnull dereferenceable(200) %0)
   tail call void @_Z4Frobv()
   ret void
}

We can tail call Frob, but not Big's constructor. Why is the ctor marked as tailcallable?

[as an aside, if the middle end knew about the sret pointer return requirement, it wouldn't have marked Frob as tailcallable, right?]

Question: should the ctor not be marked tail call, or should the documentation be adjusted to at least mention this behaviour?

Anyway, the backend code-generator checks additional constraints before performing the tailcall.

a) Should the x86 backend track where it assigned the incoming sret pointer and see if that's being passed to the tail call? (I've not figured out how to do that yet).

b) or should the middle end annotate that tail call as passing the incoming sret? (metadata? new marker? something else?) This would seem to avoid having to implement #a for each backend that has this requirement.

Question: any insights as to whether #a or #b is the better direction?

nathan

    I’m working on pr51000, and thinking about the case of large structures
    returned by artificial sret pointer parm.

    I have questions.

    The itanium ABI requires functions that return a large struct this way,
    to also return that pointer as their scalar return value. (Let’s not
    get into the pros and cons of that, it is what it is. I’m looking at
    x86_64 primarily, but I understand other ISAs have similar ABIs.)

Super nit: it’s the psABI that controls how large objects are passed, the Itanium (C++) ABI only cares about the C++-y aspects of a struct. Apologies for the pedantry.

    Anyway, to do that requires some data-flow work, and being a newbie to
    llvm IR, I can see two ways to do this. It is not clear to me which is
    the easiest or best. Plus I find discrepancies between documentation,
    tests and implementation!

    Consider:

    struct Big { int ary[50]; Big (); };

    Big Foo ();

    Big Bar () { return Foo (); }

    Here’s the IR:

    define dso_local void @_Z3Barv(%struct.Big* noalias sret(%struct.Big) align 4 %0) local_unnamed_addr #0 {
       tail call void @_Z3Foov(%struct.Big* sret(%struct.Big) align 4 %0)
       ret void
    }

    I.e. the middle end figures this is tail callable, but I don’t think it
    knows about the pointer return requirement (see below for evidence).

    Test/documentation mismatch: The tailcall documentation says:
    (https://llvm.org/docs/LangRef.html#call-instruction)
       ‘Both markers imply that the callee does not access allocas
        from the caller.’

    However, the X86 sibcall test (llvm/test/CodeGen/X86/sibcall.ll) seems
    to break that. Specifically:

    define fastcc void @t21_sret_to_sret_alloca(%struct.foo* noalias sret(%struct.foo) %agg.result) nounwind {
       %a = alloca %struct.foo, align 8
       tail call fastcc void @t21_f_sret(%struct.foo* noalias sret(%struct.foo) %a) nounwind
       ret void
    }

    That call to t21_f_sret is referencing the frame-allocated %a object.

    Question: Is sibcall.ll correct or not?

I think you are correct: the IR will exhibit UB.

But, that doesn’t mean the test case isn’t useful. I haven’t looked further, but maybe the test is meant to illustrate what would happen if a user did the wrong thing by adding the tail marker when they shouldn’t have.

    Implementation/documentation mismatch: I also note that the tail marker
    can appear even when the call is NOT the last (real) instruction in the
    function. That seems strange.

This is true. The tail marker doesn’t really mark call sites in tail positions, it’s a statement about aliasing. It simply marks call sites that do not reference stack objects from the current frame. If the call happens to be in the tail position later during codegen, it can become a TCO candidate.
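A toy model may help separate the two ideas in that explanation: the marker is a property of the call, while tail position is a property of where the call sits. The sketch below does not use LLVM's data structures; Kind, Inst, isTcoCandidate and checkBaz are hypothetical names, and the position rule is simplified to "the next instruction is the return".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model, not LLVM's classes: one entry per instruction in a function body.
enum class Kind { Call, Ret, Other };

struct Inst {
    Kind kind;
    bool tailMarker;  // models the IR-level `tail` marker on a call
};

// A call is a candidate only if it carries the marker *and* the very next
// instruction is the return. The marker alone says nothing about position.
bool isTcoCandidate(const std::vector<Inst> &body, std::size_t i) {
    if (i + 1 >= body.size()) return false;
    const Inst &inst = body[i];
    return inst.kind == Kind::Call && inst.tailMarker &&
           body[i + 1].kind == Kind::Ret;
}

// Shape of @_Z3Bazv from the thread: ctor call, Frob call, ret.
// Both calls carry the marker, but only the second is in tail position.
void checkBaz() {
    std::vector<Inst> baz = {
        {Kind::Call, true},   // @_ZN3BigC1Ev
        {Kind::Call, true},   // @_Z4Frobv
        {Kind::Ret, false},
    };
    assert(!isTcoCandidate(baz, 0));
    assert(isTcoCandidate(baz, 1));
}
```

Under this reading, the marker on the constructor call is not a contradiction: it is simply never in a position where the backend would act on it.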

    The documentation says:
       ‘The optional tail and musttail markers indicate that the
        optimizers should perform tail call optimization.’

    Consider:
    struct Big { int ary[50]; Big (); };

    void Frob ();

    Big Baz () { Big b; Frob (); return b; }

    this generates:

    define dso_local void @_Z3Bazv(%struct.Big* noalias nonnull sret(%struct.Big) align 4 %0) local_unnamed_addr #0 {
       tail call void @_ZN3BigC1Ev(%struct.Big* nonnull dereferenceable(200) %0)
       tail call void @_Z4Frobv()
       ret void
    }

    We can tail call Frob, but not Big’s constructor. Why is the ctor
    marked as tailcallable?

I guess the documentation is too simplistic. The tail marker is really a way to pass AA knowledge to the backend. It doesn’t really indicate that the backend “should” perform TCO, it’s just passing down info from the middle-end.

    [as an aside, if the middle end knew about the sret pointer return
    requirement, it wouldn’t have marked Frob as tailcallable, right?]

    Question: should the ctor not be marked tail call, or should the
    documentation be adjusted to at least mention this behaviour?

    Anyway, the backend code-generator checks additional constraints before
    performing the tailcall.

    a) Should the x86 backend track where it assigned the incoming sret
    pointer and see if that’s being passed to the tail call? (I’ve not
    figured out how to do that yet).

    b) or should the middle end annotate that tail call as passing the
    incoming sret? (metadata? new marker? something else?) This would seem
    to avoid having to implement #a for each backend that has this requirement.

    Question: any insights as to whether #a or #b is the better direction?

This feels like a target-specific constraint, so I feel like #a is better. You could peek at the IR to make this easy, though.
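As a rough illustration of what "peeking at the IR" could decide, the following toy check models the question from #a: does the call hand the caller's own incoming sret pointer to an sret callee? It is a sketch under simplifying assumptions and does not use any LLVM API; ValueKind, CallSite, forwardsIncomingSret and checkExamples are hypothetical names.

```cpp
#include <cassert>

// Toy model of where the sret operand of a call comes from; not LLVM's classes.
enum class ValueKind { IncomingSretArg, Alloca, OtherValue };

struct CallSite {
    bool calleeHasSret;     // callee takes an sret pointer
    ValueKind sretOperand;  // what the caller passes in that slot
};

// For a caller that itself returns through sret: the callee leaves the right
// pointer as its scalar return value only if it is an sret function that was
// handed the caller's own incoming sret pointer.
bool forwardsIncomingSret(const CallSite &cs) {
    return cs.calleeHasSret && cs.sretOperand == ValueKind::IncomingSretArg;
}

// The three call sites discussed in the thread, in this toy encoding.
void checkExamples() {
    CallSite barToFoo  = {true, ValueKind::IncomingSretArg};  // Bar -> Foo
    CallSite bazToFrob = {false, ValueKind::OtherValue};      // Baz -> Frob
    CallSite t21       = {true, ValueKind::Alloca};           // sibcall.ll t21
    assert(forwardsIncomingSret(barToFoo));
    assert(!forwardsIncomingSret(bazToFrob));
    assert(!forwardsIncomingSret(t21));
}
```

A real backend check would presumably have to look through casts and copies of the pointer as well; the model ignores all of that and only captures the yes/no decision.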

    I'm working on pr51000, and thinking about the case of large structures
    returned by artificial sret pointer parm.

    I have questions.

    The itanium ABI requires functions that return a large struct this way,
    to *also* return that pointer as their scalar return value. (Let's not
    get into the pros and cons of that, it is what it is. I'm looking at
    x86_64 primarily, but I understand other ISAs have similar ABIs.)

Super nit: it's the psABI that controls how large objects are passed, the Itanium (C++) ABI only cares about the C++-y aspects of a struct. Apologies for the pedantry.

Yeah, I realized that shortly after sending -- I do know these things, honest :-)

    However, the X86 sibcall test (llvm/test/CodeGen/X86/sibcall.ll) seems
    to break that. Specifically:

    define fastcc void @t21_sret_to_sret_alloca(%struct.foo* noalias sret(%struct.foo) %agg.result) nounwind {
      %a = alloca %struct.foo, align 8
      tail call fastcc void @t21_f_sret(%struct.foo* noalias sret(%struct.foo) %a) nounwind
      ret void
    }

    That call to t21_f_sret is referencing the frame-allocated %a object.

    Question: Is sibcall.ll correct or not?

I think you are correct: the IR will exhibit UB.

But, that doesn't mean the test case isn't useful. I haven't looked further, but maybe the test is meant to illustrate what would happen if a user did the wrong thing by adding the tail marker when they shouldn't have.

Thanks for confirming my suspicion, I'll add a comment there, when I next touch that testcase. Testing for UB seems misguided IMHO (unless you're testing for a diagnostic telling you *UB ALERT*). My concern was what if I change the behavior of that particular case in fixing 51000.

    Implementation/documentation mismatch: I also note that the tail marker
    can appear even when the call is NOT the last (real) instruction in the
    function. That seems strange.

This is true. The tail marker doesn't really mark call sites in tail positions, it's a statement about aliasing. It simply marks call sites that do not reference stack objects from the current frame. If the call happens to be in the tail position later during codegen, it can become a TCO candidate.

Thanks. Perhaps I'll be enthused to clarify the documentation.

    Anyway, the backend code-generator checks additional constraints before
    performing the tailcall.

    a) Should the x86 backend track where it assigned the incoming sret
    pointer and see if that's being passed to the tail call? (I've not
    figured out how to do that yet).

This feels like a target-specific constraint, so I feel like #a is better. You could peek at the IR to make this easy, though.

This is indeed the way I am going -- learning all the DAG-related structures. It seems productive.

thank you for your comments.

nathan