Understanding tail call

Hi friendly LLVM Devs,

I’m trying to understand the technical details of tail call optimization, but unfortunately I hit some issues that I couldn’t figure out myself.

So I tried the following two really simple functions:

extern void g(int*);
void f1() {
int x;
g(&x);
}
void f2(int* x) {
g(x);
}

It turns out that ‘f1’ does not have tail call optimization, but ‘f2’ does.

My current really naive understanding is that a call happening as the last statement of the function can be optimized as a tail call. However, why cannot ‘f1’ do this?

Is there any recommended articles that I should read if I were to understand the internals of tail call optimization in LLVM, or more generally, performance optimization on reducing function calling overhead? I know there are a bunch of different calling conventions, optimized for different use cases, but I don’t know any of the details. Can someone suggest a good introductory material that I should read to understand what are the pros and cons of each calling convention, and what should be the best calling convention for a given use case?

Thanks!

Best,
Haoran

Hi Haoran,

non-cdecl calling conventions are highly architecture dependent. You are likely better off instructing LLVM to inline functions by choosing a higher optimization level (-O3) and Link Time Optimizations (-flto=thin)

Hi Charlotte,

Thanks for your reply!

There are some more restrictions to tail calls, namely it needs to be possible to restructure the function in such a way that the function epilogue has already been run. Local variables are scoped and go out of scope at the end of the function. The function epilogue is responsible for restoring the stack and (in C++) running the destructors of stack variables.

I understand it now. So basically the restriction is that f() needs to be able to restore the stack to “as if” it has finished running, so “g”'s epilogue could clean up as if “f” has finished running, which means, no variables living on the stack can survive at the time the tail call to ‘g’ happens.

non-cdecl calling conventions are highly architecture dependent. You are likely better off instructing LLVM to inline functions by choosing a higher optimization level (-O3) and Link Time Optimizations (-flto=thin)

What you said is definitely true, but my use case is kind of special so inlining is not possible and also the functions are pretty small so likely the calling conventions are taking a significant amount of time, which is why I would like to understand better on the possible alternative options (and which is also why I haven’t digged into it before, because what you said is for sure the solution for 99% use cases :smile:). I have read the FTL javascript JIT was using non-cdecl calling convention for performance, which is why I wanted to understand more. Is there any good introduction article you could recommend?

Thanks!

Best,
Haoran

Charlotte Delenk via llvm-dev <llvm-dev@lists.llvm.org> 于2020年9月25日周五 上午4:06写道: