Noncapture use of locals disabling TailRecursionElimination

Hi,

I was looking into the implementation of TailRecursionElimination, and
noticed that we have the constrain that if any call uses a local, even
though it doesn't capture the local, it would still prohibit TCE. This
contain seems unnecessary and overly limiting? Relevant code is here:

Looking through the tests that cover this scenario
(llvm-project/basic.ll at e29874eaa04d24b8c67776bf5729474d671a58f6 · llvm/llvm-project · GitHub),
I found it referring to rdar://14324281 and PR962. What are they?
These haven't been updated since 2014, so I wonder what is the latest
state and have them been resolved?

cc nicholas who authored the original code.

Thanks!

Hi,

I was looking into the implementation of TailRecursionElimination, and
noticed that we have the constrain that if any call uses a local, even
though it doesn't capture the local, it would still prohibit TCE. This
contain seems unnecessary and overly limiting?

I think it's a necessary limitation. The idea is that something marked 'tail' could be emitted as a jump instruction instead of a call instruction. In order to do that, the jumpee has to 'ret' to the same place our function would have returned to. Since the return value is the current top-of-the-stack value when returning[1], the jumper must remove any local variables -- in LLVM, allocas -- from the stack before jumping. This implies that the jumpee may not access our locals at all, it is not sufficient for the callee to merely not capture them.

Further to that point, we have an optimization in BasicAA where we conclude that a call marked "tail" does not alias any allocas of the calling function.

There's a missed optimization here regarding "notail". Because "tail" can be used as an optimization for BasicAA, we should actually permit functions to be marked "tail" (not accessing any of the callers stack) and still be marked "notail" (the machine code must be a call, not a jump). These aren't actually mutually exclusive, even though they sound like they are.

Nick

[1] - not true on Windows, but even there the number of bytes deep the return value is, is a fixed constant that is determined entirely by the callee. The jumper can't have extra bytes on the stack at the moment of the jump.

Are you specifically asking about AllCallsAreTailCalls? I agree that's overly limiting.

If you're going to propose a patch to change it, you probably want to be careful that we don't introduce dynamic allocas.

(PR962 is https://bugs.llvm.org/show_bug.cgi?id=962 . rdar is a reference into the Apple internal bugtracker.)

-Eli

Eli,
Yes I was referring to AllCallsAreTailCalls. I will take a look at how
to improve this.

Nick,
Thanks. I agree that's the proper constrain to mark a call as
tailcall, however not being able to mark a call as tailcall shouldn't
completely kill TCE. (i.e. AllCallsAreTailCalls seems overly
limiting).

Specifically, I have been looking into an issue where Clang cannot TCE
the following code:

class Foo {
public:
  __attribute__((noinline))
  ~Foo() {}
};

void callee() {
  Foo f;
}

void funcRecurse() {
  callee();
  funcRecurse();
}

int main() {
  funcRecurse();
}

In the above code, callee will be inlined into funcRecurse, generating
calls to ~Foo() along with lifetime management intrinsics, all using
locals, disabling TCE. Clang can successfully TCE if we force callee
to not inline.

Eli,
Yes I was referring to AllCallsAreTailCalls. I will take a look at how
to improve this.

Nick,
Thanks. I agree that's the proper constrain to mark a call as
tailcall, however not being able to mark a call as tailcall shouldn't
completely kill TCE. (i.e. AllCallsAreTailCalls seems overly
limiting).

I get it now. TailRecursionElimination.cpp does two optimizations, marking of tail and converting recursion to loops. I thought you were proposing a change to the marking of tail. Thanks for the example!

"PR962" refers to "problem report 962" or https://llvm.org/PR962 . "rdar" is Apple's "radar" bug tracking system, these are generally internal to their company but sometimes they're available in whole or in part on https://openradar.appspot.com/ .

Nick

I see.
Thank you Nick for the detailed explanations! I understand it now.