Hi all,
I've added tail call optimization to x86. This is different from what -tailcallopt does, which forces fastcc function to be tail callable. My changes detect opportunities to do tail call without having to change the ABI.
I've looked at the codegen of -tailcallopt and it doesn't look all that good. Running it as a llcbeta option shows it significantly pessimize code in most cases.
As far as I can tell only PPC and X86 targets are supporting this option. Does anyone actually using it? I'd prefer to just remove it to clean up the implementation if no one has any objections.
Evan
Right now, -tailcallopt is the way to get guaranteed tail calls.
For projects which need this, it's not an optimization; it's a
correctness requirement.
Dan
Does anyone actually using it?
Yes, many LLVM-based projects rely upon TCO to work correctly.
I'd prefer to just remove it to clean up the implementation if no one has
any objections.
Are you saying that you want to remove LLVM's working TCO and replace it with
something that is faster but broken?
I think you may have misunderstood what TCO is and why users want it. TCO
allows an unbounded number of calls in tail position to use only a bounded
amount of stack space. Replacing branches within functions with tail calls
generalizes loops and makes them arbitrarily extensible. Consequently, many
language implementations (particularly functional languages) rely upon TCO to
implement loops. So breaking TCO literally means breaking loops for all of
those projects.
You can imagine the reaction you would get if you went to the Clang guys and
advocated a new "for" loop because it was faster but segfaulted every 10,000
iterations!
Moreover, working TCO is one of the features that sets LLVM apart from its
competitors. The JVM doesn't provide TCO. Mono's implementation of TCO is
broken. Microsoft's .NET is one of the only alternatives to provide working
TCO and even it has more restrictions than LLVM does.
The performance of tail calls on LLVM is a minor concern for me and I would
appreciate it being optimized but certainly not at the cost of correctness.
Does anyone actually using it?
Yes, many LLVM-based projects rely upon TCO to work correctly.
Ok, that's all I need to know.
I'd prefer to just remove it to clean up the implementation if no one has
any objections.
Are you saying that you want to remove LLVM's working TCO and replace it with
something that is faster but broken?
No, I'd rather have something that's working and helps performance.
Evan
Evan Cheng wrote:
As far as I can tell only PPC and X86 targets are supporting this option. Does anyone actually using it? I'd prefer to just remove it to clean up the implementation if no one has any objections.
Don't know whether that is the same, but my Pure compiler sets
llvm::PerformTailCallOpt. Pure needs TCO because it doesn't have any
built-in looping constructs. In fact, most functional language
implementations will rely on TCO. If you can make this work reliably on
all supported platforms without needing special flags, that would be
welcome. Otherwise please keep the flag.
(That said, last time I tried LLVM 2.7svn, TCO was broken in the JIT, at
least on x86_64. It works fine up to LLVM 2.6 for me, though.)
Albert
I should also mention that tail call optimization is an optimization in space
and not time: it typically degrades performance (e.g. 2x slower on .NET).
If you want to improve the performance of tail calls you could either hack the
existing TCO implementation to make it generate more efficient code, or
circumvent it when you spot a special case (e.g. direct tail call to self)
that can be optimized.
I've written a new back-end for the GHC Haskell compiler which is at
the stage of being able to bootstrap GHC itself. I use tailcallopt for
all code compiled and am strongly against removing it.
Thanks for improving this! Since no good deed can go unpunished, could
you also update the LangRef and codegen documents to describe the new
state of the world? It might also be nice to mention there that tail
calls can be needed for correctness in addition to performance and how
users can guarantee that a call will be transformed, since both if us
have forgotten recently. I can take a stab at the second one if you
don't have time.
Thanks,
Jeffrey
I am somewhat surprised people are actually using TCO. I had to fixed a number of subtle bugs to get it working and even now I am not too happy with it. My focus was on finding non-ABI changing automatic tail call cases (aka gcc's sibcall). It's now done so I'll leave -tailcallopt alone for now.
I'll run -tailcallopt as x86 llcbeta to see if JIT is indeed broken.
Evan
Thanks for improving this! Since no good deed can go unpunished, could
you also update the LangRef and codegen documents to describe the new
state of the world? It might also be nice to mention there that tail
I plan to do this part before 2.7.
calls can be needed for correctness in addition to performance and how
users can guarantee that a call will be transformed, since both if us
have forgotten recently. I can take a stab at the second one if you
don't have time.
I don't really know enough about how TCO is used by Pure, Haskell, etc. So if you could write this part, I'd appreciate it.
Evan
Basically, these languages need to be able to make guarantees that all calls that can be TCO’d will be TCO’d. Consider that in most functional languages, the standard way to iterate over the list is a tail-recursive function. In order to avoid stack overflows in almost trivial programs, they must guarantee that TCO will be applied as aggressively as possible.
–Owen
Evan Cheng wrote:
I'll run -tailcallopt as x86 llcbeta to see if JIT is indeed broken.
Jeffrey apparently fixed that now, I'm going to test against trunk asap.
Albert
My tests show x86_64 -tailcallopt JIT working fine on Mac OS X.
Evan
Evan Cheng wrote:
My tests show x86_64 -tailcallopt JIT working fine on Mac OS X.
Linux x86_64 works for me, too.
I've added tail call optimization to x86. This is different from what -tailcallopt does, which forces fastcc function to be tail callable. My changes detect opportunities to do tail call without having to change the ABI.
+1 for opportunistically optimizing tail calls in the "ccc" convention during x86 assembly generation. This is a desirable feature.
Does anyone actually using it? I'd prefer to just remove it to clean up the implementation if no one has any objections.
-1 for removing the guaranteed tail call feature of the "fastcc" convention. As others have noted, there are several projects (mine included) that require the "tail" call site qualifier in the IR language.