[PATCH] D12923: Add support for function attribute "notail"

+llvm-dev

Can you give a bit of background on what you're trying to address here? After reading through the discussion and seeing that this is a best effort flag, I'm not sure that a function attribute is the best way to describe this. I'm open to being convinced it is, but I'd like to hear a bit more about the use case and get broader visibility on the proposal first.

Philip

Several users have been asking for this function attribute to prevent losing the calling functions’s information in the backtrace. If we attach the attribute to a function, ideally we would want to prevent tail call optimization on all call sites that call the function. However, the compiler cannot always tell which function is called from a call site if it’s an indirect call, so it’s fine if an indirect call to the marked function ends up being tail-call optimized. For direct calls, we want the function attribute to prevent tail call 100% of the time.

We can also use a “notail” marker on the call instruction instead of using a function attribute. The only downside of using a marker is that we probably will never be able to prevent tail call optimization on indirect calls even when the compiler can turn it into a direct call (for example, via inlining). I’m not sure at the moment how important this is.

To be clear, this is a debuging aid only? It’s not something required for correctness? I’m somewhat bothered by that because it seems like it would be a useful implementation tool for higher level languages.

A couple of thoughts in no particular order:

  1. Can we always annotate the call site rather than the function? That removes the unpredictability due to optimization.
  2. Calling it something like “no-direct-tail-call” or “prefer-no-tail” would remove some of the confusion value. When I see “notail”, I expect that to always be respected; the best effort semantics come as a bit of a surprise.
  3. This seems analogous to the “tail” marker in that it indicates a preference/option. Whatever we end up with, it needs to be a verifier option to have a “tail” or “musttail” call site which is also “notail”. It also needs to be an error to have a mustail callsite to a notail function (if such ends up existing.)
  4. It somewhat feels like there are two concepts being intermixed here. 1) A call site which will never be a tail call. 2) A function which we prefer not to tail call to. Does it make sense to separate them?

Philip

To be clear, this is a debuging aid only? It's not something required for
correctness? I'm somewhat bothered by that because it seems like it would
be a useful implementation tool for higher level languages.

It's not purely a debugging aid that helps when you are using the debugger.
There are projects (that are not debuggers) that rely on not missing frames
to produce results that are useful.

A couple of thoughts in no particular order:
1) Can we always annotate the call site rather than the function? That
removes the unpredictability due to optimization.

Annotating the call site should be fine. For the use cases that we care
about, it probably isn't important to prevent tail calls on indirect calls.

2) Calling it something like "no-direct-tail-call" or "prefer-no-tail"
would remove some of the confusion value. When I see "notail", I expect
that to always be respected; the best effort semantics come as a bit of a
surprise.

I agree. A name that indicates it's only a best effort option or it's an
option that affects only direct calls would be good.

3) This seems analogous to the "tail" marker in that it indicates a
preference/option. Whatever we end up with, it needs to be a verifier
option to have a "tail" or "musttail" call site which is also "notail". It
also needs to be an error to have a mustail callsite to a notail function
(if such ends up existing.)

If we are going to annotate the function, I think we should have the
verifier catch incompatibilities between the markers on the call sites and
the function attribute on the called functions.

If we are annotating the call site, the verifier check isn't needed since
the tail-call related markers are enums that are mutually exclusive.

4) It somewhat feels like there are two concepts being intermixed here.
1) A call site which will never be a tail call. 2) A function which we
prefer not to tail call to. Does it make sense to separate them?

Yes, it makes sense to separate them. For the use case we care about,
either 1) or 2) will do. We don't have to have support for both.

Philip

If it’s not simply best effort, that constrains our choices. Given this, I would lean towards a notail value being added as an alternative to “tail” and “musttail”. This seems to fit the existing uses, doesn’t have any obvious loop holes or best effort semantics, and solves the problem at hand. (This only applies if we’re talking about a function annotation. The call site annotation applies to both direct and indirect calls.) Yep. I would lean toward doing (1) for now. We can come back and implement (2) at a later time if we find it’s needed. After (1), each call site will have four states: - “notail” - Can not be a tail call. - “” - May be a tail call if analysis finds it legal, profitable, and desirable* - “tail” - May be a tail call, profitability hinted - “musttail” - Must be a tail call. * (2) would basically just change the desirability of moving from “” to “tail”.

I think option #1 would be useful for “correctly” emitting calls to dladdr; it wants to be able to walk up the stack to find what shared object it was called from.

To be clear, this is a debuging aid only? It's not something required
for correctness? I'm somewhat bothered by that because it seems like it
would be a useful implementation tool for higher level languages.

It's not purely a debugging aid that helps when you are using the
debugger. There are projects (that are not debuggers) that rely on not
missing frames to produce results that are useful.

If it's not simply best effort, that constrains our choices.

A couple of thoughts in no particular order:
1) Can we always annotate the call site rather than the function? That
removes the unpredictability due to optimization.

Annotating the call site should be fine. For the use cases that we care
about, it probably isn't important to prevent tail calls on indirect calls.

Given this, I would lean towards a notail value being added as an
alternative to "tail" and "musttail". This seems to fit the existing uses,
doesn't have any obvious loop holes or best effort semantics, and solves
the problem at hand.

2) Calling it something like "no-direct-tail-call" or "prefer-no-tail"
would remove some of the confusion value. When I see "notail", I expect
that to always be respected; the best effort semantics come as a bit of a
surprise.

I agree. A name that indicates it's only a best effort option or it's an
option that affects only direct calls would be good.

(This only applies if we're talking about a function annotation. The call
site annotation applies to both direct and indirect calls.)

3) This seems analogous to the "tail" marker in that it indicates a
preference/option. Whatever we end up with, it needs to be a verifier
option to have a "tail" or "musttail" call site which is also "notail". It
also needs to be an error to have a mustail callsite to a notail function
(if such ends up existing.)

If we are going to annotate the function, I think we should have the
verifier catch incompatibilities between the markers on the call sites and
the function attribute on the called functions.

If we are annotating the call site, the verifier check isn't needed since
the tail-call related markers are enums that are mutually exclusive.

Yep.

4) It somewhat feels like there are two concepts being intermixed here.
1) A call site which will never be a tail call. 2) A function which we
prefer not to tail call to. Does it make sense to separate them?

Yes, it makes sense to separate them. For the use case we care about,
either 1) or 2) will do. We don't have to have support for both.

I would lean toward doing (1) for now. We can come back and implement (2)
at a later time if we find it's needed. After (1), each call site will
have four states:
- "notail" - Can not be a tail call.
- "" - May be a tail call if analysis finds it legal, profitable, and
desirable*
- "tail" - May be a tail call, profitability hinted
- "musttail" - Must be a tail call.

* (2) would basically just change the desirability of moving from "" to
"tail".

OK. I'm considering changing the direction of this patch and marking the
call site instead of the called function.

We should also discuss what kinds of source level attributes we'll need. My
plan is to attach an attribute that indicates notail (something like
no_direct_tail) to the called function declaration and definition and then
mark all the direct call sites in the IR that call the function as notaill.
In addition to that, it seems like we want to have a way to attach the
attribute directly to the call site:

void (*indirectCall)(int, int, int);

void foo1(int a, int b) {
(*indirectCall)(a, b, c) __attribute__((notail));
}

I think you’re going to want to have the same dichotomy between (1) and (2) at the source level as in the IR. The same issues apply in both cases.

To be clear, this is a debuging aid only? It's not something required
for correctness? I'm somewhat bothered by that because it seems like it
would be a useful implementation tool for higher level languages.

It's not purely a debugging aid that helps when you are using the
debugger. There are projects (that are not debuggers) that rely on not
missing frames to produce results that are useful.

If it's not simply best effort, that constrains our choices.

A couple of thoughts in no particular order:
1) Can we always annotate the call site rather than the function? That
removes the unpredictability due to optimization.

Annotating the call site should be fine. For the use cases that we care
about, it probably isn't important to prevent tail calls on indirect calls.

Given this, I would lean towards a notail value being added as an
alternative to "tail" and "musttail". This seems to fit the existing uses,
doesn't have any obvious loop holes or best effort semantics, and solves
the problem at hand.

2) Calling it something like "no-direct-tail-call" or "prefer-no-tail"
would remove some of the confusion value. When I see "notail", I expect
that to always be respected; the best effort semantics come as a bit of a
surprise.

I agree. A name that indicates it's only a best effort option or it's an
option that affects only direct calls would be good.

(This only applies if we're talking about a function annotation. The
call site annotation applies to both direct and indirect calls.)

3) This seems analogous to the "tail" marker in that it indicates a
preference/option. Whatever we end up with, it needs to be a verifier
option to have a "tail" or "musttail" call site which is also "notail". It
also needs to be an error to have a mustail callsite to a notail function
(if such ends up existing.)

If we are going to annotate the function, I think we should have the
verifier catch incompatibilities between the markers on the call sites and
the function attribute on the called functions.

If we are annotating the call site, the verifier check isn't needed since
the tail-call related markers are enums that are mutually exclusive.

Yep.

4) It somewhat feels like there are two concepts being intermixed here.
1) A call site which will never be a tail call. 2) A function which we
prefer not to tail call to. Does it make sense to separate them?

Yes, it makes sense to separate them. For the use case we care about,
either 1) or 2) will do. We don't have to have support for both.

I would lean toward doing (1) for now. We can come back and implement
(2) at a later time if we find it's needed. After (1), each call site will
have four states:
- "notail" - Can not be a tail call.
- "" - May be a tail call if analysis finds it legal, profitable, and
desirable*
- "tail" - May be a tail call, profitability hinted
- "musttail" - Must be a tail call.

* (2) would basically just change the desirability of moving from "" to
"tail".

OK. I'm considering changing the direction of this patch and marking the
call site instead of the called function.

We should also discuss what kinds of source level attributes we'll need.
My plan is to attach an attribute that indicates notail (something like
no_direct_tail) to the called function declaration and definition and then
mark all the direct call sites in the IR that call the function as notaill.
In addition to that, it seems like we want to have a way to attach the
attribute directly to the call site:

void (*indirectCall)(int, int, int);

void foo1(int a, int b) {
(*indirectCall)(a, b, c) __attribute__((notail));
}

I think you're going to want to have the same dichotomy between (1) and
(2) at the source level as in the IR. The same issues apply in both
cases.

I think you were suggesting we always annotate the call site rather than
the function. Are you suggesting we do the same thing at the source level,
i.e. allow marking the call site but not the function? I have to confirm,
but I believe the people who requested this feature wanted to use a
function attribute rather than marking each call site that calls the target
function.

What I’m suggesting is that the function attribute should be a best effort semantic. If we can tell something is a direct call, we should avoid the tail call, but we don’t guarantee to never emit a tail call to the function in question. If you need an absolutely guarantee that a particular call (even if indirect) will not be a tail call, you’d need to mark the call site. (In practice, the “prefer_no_tail” function attribute will probably work pretty reliability, but I’m concerned about “promising” that it will work. Doing so for a subset of calls (i.e. “statically direct calls”) and may work for others is somewhat defensible, but I’ve learned not to make promises the optimizer has a hard time keeping. Does the distinction make sense?)

To be clear, this is a debuging aid only? It's not something required
for correctness? I'm somewhat bothered by that because it seems like it
would be a useful implementation tool for higher level languages.

It's not purely a debugging aid that helps when you are using the
debugger. There are projects (that are not debuggers) that rely on not
missing frames to produce results that are useful.

If it's not simply best effort, that constrains our choices.

A couple of thoughts in no particular order:
1) Can we always annotate the call site rather than the function? That
removes the unpredictability due to optimization.

Annotating the call site should be fine. For the use cases that we care
about, it probably isn't important to prevent tail calls on indirect calls.

Given this, I would lean towards a notail value being added as an
alternative to "tail" and "musttail". This seems to fit the existing uses,
doesn't have any obvious loop holes or best effort semantics, and solves
the problem at hand.

2) Calling it something like "no-direct-tail-call" or "prefer-no-tail"
would remove some of the confusion value. When I see "notail", I expect
that to always be respected; the best effort semantics come as a bit of a
surprise.

I agree. A name that indicates it's only a best effort option or it's an
option that affects only direct calls would be good.

(This only applies if we're talking about a function annotation. The
call site annotation applies to both direct and indirect calls.)

3) This seems analogous to the "tail" marker in that it indicates a
preference/option. Whatever we end up with, it needs to be a verifier
option to have a "tail" or "musttail" call site which is also "notail". It
also needs to be an error to have a mustail callsite to a notail function
(if such ends up existing.)

If we are going to annotate the function, I think we should have the
verifier catch incompatibilities between the markers on the call sites and
the function attribute on the called functions.

If we are annotating the call site, the verifier check isn't needed
since the tail-call related markers are enums that are mutually exclusive.

Yep.

4) It somewhat feels like there are two concepts being intermixed
here. 1) A call site which will never be a tail call. 2) A function which
we prefer not to tail call to. Does it make sense to separate them?

Yes, it makes sense to separate them. For the use case we care about,
either 1) or 2) will do. We don't have to have support for both.

I would lean toward doing (1) for now. We can come back and implement
(2) at a later time if we find it's needed. After (1), each call site will
have four states:
- "notail" - Can not be a tail call.
- "" - May be a tail call if analysis finds it legal, profitable, and
desirable*
- "tail" - May be a tail call, profitability hinted
- "musttail" - Must be a tail call.

* (2) would basically just change the desirability of moving from "" to
"tail".

OK. I'm considering changing the direction of this patch and marking the
call site instead of the called function.

We should also discuss what kinds of source level attributes we'll need.
My plan is to attach an attribute that indicates notail (something like
no_direct_tail) to the called function declaration and definition and then
mark all the direct call sites in the IR that call the function as notaill.
In addition to that, it seems like we want to have a way to attach the
attribute directly to the call site:

void (*indirectCall)(int, int, int);

void foo1(int a, int b) {
(*indirectCall)(a, b, c) __attribute__((notail));
}

I think you're going to want to have the same dichotomy between (1) and
(2) at the source level as in the IR. The same issues apply in both
cases.

I think you were suggesting we always annotate the call site rather than
the function. Are you suggesting we do the same thing at the source level,
i.e. allow marking the call site but not the function? I have to confirm,
but I believe the people who requested this feature wanted to use a
function attribute rather than marking each call site that calls the target
function.

What I'm suggesting is that the function attribute should be a best effort
semantic. If we can tell something is a direct call, we should avoid the
tail call, but we don't guarantee to never emit a tail call to the function
in question. If you need an absolutely guarantee that a particular call
(even if indirect) will not be a tail call, you'd need to mark the call
site.

(In practice, the "prefer_no_tail" function attribute will probably work
pretty reliability, but I'm concerned about "promising" that it will work.
Doing so for a subset of calls (i.e. "statically direct calls") and may
work for others is somewhat defensible, but I've learned not to make
promises the optimizer has a hard time keeping. Does the distinction make
sense?)

That was what I had in mind: the function attribute blocks tail call for
statically direct calls but doesn't promise anything (in fact, does
nothing) for indirect calls.

Do you think we shouldn't make any promises for statically direct calls
either? I don't see why it's hard to keep the promise that direct tail
calls will be blocked. Do you have a particular optimization in mind that
would be difficult or impossible to implement if we promised to block
direct calls?

That was what I had in mind: the function attribute blocks tail call for statically direct calls but doesn't promise

> anything (in fact, does nothing) for indirect calls.
>
> Do you think we shouldn't make any promises for statically direct calls either? I don't see why it's hard to keep the
> promise that direct tail calls will be blocked. Do you have a particular optimization in mind that would be difficult or
> impossible to implement if we promised to block direct calls?

I think in this scheme we'll have problems around devirtualization.
For instance, you could start with a indirect call to a notail target
that would get TCO'ed (as an indirect call), after which
devirtualization or load/store motion would turn the indirect call
into a direct call; and you'd end up with a direct tail call to a
notail target.

I don't know what your requirements are, but if I were you I'd design
the `notail` marker to be logically part of a function's calling
convention. That way functions marked `notail` get to do things that
are legal only if the call to them wasn't a tail call, because all
well-defined calls constrain the caller to match callee's expectation
with respect to tail calls. In the CFE, you could make the `notail`
attribute have the same restrictions / conventions as things like
`fastcall`.

-- Sanjoy

> That was what I had in mind: the function attribute blocks tail call for statically direct calls but doesn't promise
> anything (in fact, does nothing) for indirect calls.
>
> Do you think we shouldn't make any promises for statically direct calls either? I don't see why it's hard to keep the
> promise that direct tail calls will be blocked. Do you have a particular optimization in mind that would be difficult or
> impossible to implement if we promised to block direct calls?

I think in this scheme we'll have problems around devirtualization.
For instance, you could start with a indirect call to a notail target
that would get TCO'ed (as an indirect call), after which
devirtualization or load/store motion would turn the indirect call
into a direct call; and you'd end up with a direct tail call to a
notail target.

Sanjoy hits on some of the same concerns I had with this. In particular, there's a distinction between what the language specification defines as a direct call and what the optimizer/compiler might turn into a direct call. At the source layer, we can *only* make guarantees about the former. Any specification we write has to account for the fact the later are an implementation detail entirely under the control of the compiler. i.e. optimizations can not break the semantics. You have to ensure this when defining the semantics.

I don't know what your requirements are, but if I were you I'd design
the `notail` marker to be logically part of a function's calling
convention. That way functions marked `notail` get to do things that
are legal only if the call to them wasn't a tail call, because all
well-defined calls constrain the caller to match callee's expectation
with respect to tail calls. In the CFE, you could make the `notail`
attribute have the same restrictions / conventions as things like
`fastcall`.

I disagree with this slightly. As I've stated previously, I think there are two distinct semantics here: a) this call must be a notail call, and b) we'd "like" this to be a no tail, but it's optional. We need to not mix them.

> That was what I had in mind: the function attribute blocks tail call
for statically direct calls but doesn't promise
> anything (in fact, does nothing) for indirect calls.
>
> Do you think we shouldn't make any promises for statically direct calls
either? I don't see why it's hard to keep the
> promise that direct tail calls will be blocked. Do you have a
particular optimization in mind that would be difficult or
> impossible to implement if we promised to block direct calls?

I think in this scheme we'll have problems around devirtualization.
For instance, you could start with a indirect call to a notail target
that would get TCO'ed (as an indirect call), after which
devirtualization or load/store motion would turn the indirect call
into a direct call; and you'd end up with a direct tail call to a
notail target.

Sanjoy hits on some of the same concerns I had with this. In particular,
there's a distinction between what the language specification defines as a
direct call and what the optimizer/compiler might turn into a direct call.
At the source layer, we can *only* make guarantees about the former. Any
specification we write has to account for the fact the later are an
implementation detail entirely under the control of the compiler. i.e.
optimizations can not break the semantics. You have to ensure this when
defining the semantics.

By "statically direct calls", I meant the set of calls that can be
determined to be direct at the source level, which is normally a subset of
the calls that end up being direct calls after all the optimization passes
are run. I agree that we can probably promise or guarantee that tail call
will be blocked for direct calls at the source level. We don't guarantee
anything for indirect calls at the source level.

I don't know what your requirements are, but if I were you I'd design
the `notail` marker to be logically part of a function's calling
convention. That way functions marked `notail` get to do things that
are legal only if the call to them wasn't a tail call, because all
well-defined calls constrain the caller to match callee's expectation
with respect to tail calls. In the CFE, you could make the `notail`
attribute have the same restrictions / conventions as things like
`fastcall`.

I disagree with this slightly. As I've stated previously, I think there
are two distinct semantics here: a) this call must be a notail call, and b)
we'd "like" this to be a no tail, but it's optional. We need to not mix
them.

The semantics of the tail call marker I'm looking for is close to b), but
differs in whether we say it's optional or we guarantee that we block tail
call for a certain subset of calls (direct calls at the source level). I'm
inclined to have the tail call marker mean "notail is guaranteed if the
call is direct at the source level", but I wonder if it'll come back to
bite us.

Also, this tail call marker won't be used to enable optimizations on the
marked function that would be available only if it were known the call to
the function isn't TCO'ed. That might be an interesting idea to other
people, but it's not something that is needed for our use case.

I think this is probably fine, but you need someone with clang/C++ spec knowledge that I don’t have to help with the definition.

I’ve been discussing the clang-side patch and making changes based on the feedback I got for the last few weeks. Aaron has reviewed the patch and he thinks it’s OK now.

http://reviews.llvm.org/D12922

Do you have further comments on the llvm-side patch or the semantics of the function attribute? Since the last time we discussed on the list, I’ve made changes to disallow virtual functions and objective-c methods to be marked not_tail_called, which further reduces unpredictability due to optimization.

I have no further comments. All of my points from previous discussion stand, but it sounds like you have addressed those.

Philip

OK, I’ll wait a few more days so that others can look at and comment on the patch.