Setting llvm::TargetOptions::GuaranteedTailCallOpt in LTO Code Generation

Hi,

I am the lead developer of a project that is using LLVM to implement an ahead-of-time compiled functional language. We use llc -tailcallopt to ensure that functions that end in a tail call are compiled to a tail call at the machine level, because we have a number of cases in our interpreter where functions with different function signatures call one another in deeply nested recursive calls. We can’t use musttail because the callee and caller often have different signatures.

We would like to support link time optimization in our programming language, because performance is important to us. However, there is no clang flag to enable the GuaranteedTailCallOpt flag, and the only way to pass target options to the lto plugin currently is via an unsupported API that parses those flags to static variables. This works on Linux, but the Mac OS linker does not actually initialize the TargetOptions that it passes as an llvm::lto::Config based on the parsed static variables, and Apple is uninterested in spending time supporting an unsupported LLVM API like -mllvm (understandably).

Is there a change to LLVM that you would be willing to accept, and that we could write a patch for, that would let us enable link time optimization for our programming language on Mac OS, at least in the future? If so, could someone give me pointers on how to proceed? I’m a relative novice studying this code and I’m not really sure how all the components fit together at a high level, and thus what the correct design for something like this would be.

Or am I going to have to resign myself to waiting until lld has solid support for linking Mach-O files?

Thanks,

Hi Dwight,

Welcome to LLVM-dev! A few comments below. Cc’ing a few people who hopefully can add info on some of the specific issues here.

Teresa

Hi,

I am the lead developer of a project that is using LLVM to implement an ahead-of-time compiled functional language. We use llc -tailcallopt to ensure that functions that end in a tail call are compiled to a tail call at the machine level, because we have a number of cases in our interpreter where functions with different function signatures call one another in deeply nested recursive calls.

Maybe a naive question - would that be fixable?

We can’t use musttail because the callee and caller often have different signatures.

We would like to support link time optimization in our programming language, because performance is important to us. However, there is no clang flag to enable the GuaranteedTailCallOpt flag, and the only way to pass target options to the lto plugin currently is via an unsupported API that parses those flags to static variables.

I assume you mean passing internal options via -mllvm through the linker?

This works on Linux, but the Mac OS linker does not actually initialize the TargetOptions that it passes as an llvm::lto::Config based on the parsed static variables, and Apple is uninterested in spending time supporting an unsupported LLVM API like -mllvm (understandably).

lto::Config is part of the new LTO API. For the most part ld64 uses the old legacy LTO API, and therefore does not even use llvm::lto::Config (the one exception is to share the code for computing a cache key). But it doesn’t use this when invoking the code generation passes. I’m surprised that ld64 would not have a way to pass through internal llvm options - presumably that is necessary for debugging and tuning. +Steven Wu to give more info here (I work on Linux code and therefore have only directly used gold and lld, which both use the new LTO API).

Is there a change to LLVM that you would be willing to accept, and that we could write a patch for, that would let us enable link time optimization for our programming language on Mac OS, at least in the future? If so, could someone give me pointers on how to proceed? I’m a relative novice studying this code and I’m not really sure how all the components fit together at a high level, and thus what the correct design for something like this would be.

I guess the question is what interface would work for you. Would passing an internal option like what works on lld or what you are doing with llc be acceptable?

If you need a more officially supported mechanism, IMO the best way is probably to create a new function attribute (e.g. ‘forcetailcall’ or something equivalent to what GuaranteedTailCallOpt implies). That would be completely linker agnostic and also not rely on internal options.

Or am I going to have to resign myself to waiting until lld has solid support for linking Mach-O files?

+Rui Ueyama and +Eric Christopher to comment on lld Mach-O support.

Teresa

Maybe a naive question - would that be fixable?

I doubt that we can get around this easily. It’s a programming language compiler, so the guarantee that if the user writes a tail call in their code, they will get a tail call at the machine level is pretty important. Restricting that guarantee to only functions that call themselves would probably cause a lot of problems, including stack overflows, for code written in our programming language. Recursion is basically the only way to loop in most functional languages; that’s why the tailcallopt flag was created.
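
To make the signature mismatch concrete, here is a minimal C++ sketch of the pattern being described. It is not taken from the project; the function names and the word-scanning task are invented for illustration. Two functions act as the states of a small state machine, each state carries a different set of values, and every transition is a call in tail position. C++ gives no tail call guarantee, so each transition may consume a stack frame unless the optimizer happens to turn it into a jump, which is the situation a guaranteed tail call mode is meant to rule out.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical example: names and task are illustrative only.
// State "in a word": carries the best length so far AND the current length.
std::size_t scanWord(const char *p, std::size_t best, std::size_t cur);

// State "between words": carries only the best length so far.
std::size_t scanSpace(const char *p, std::size_t best) {
  if (*p == '\0')
    return best;
  if (*p == ' ')
    return scanSpace(p + 1, best);   // tail call, same signature
  return scanWord(p + 1, best, 1);   // tail call, 2 params -> 3 params
}

std::size_t scanWord(const char *p, std::size_t best, std::size_t cur) {
  const std::size_t newBest = std::max(best, cur);
  if (*p == '\0')
    return newBest;
  if (*p == ' ')
    return scanSpace(p + 1, newBest);      // tail call, 3 params -> 2 params
  return scanWord(p + 1, newBest, cur + 1); // tail call, same signature
}

// Length of the longest space-separated word in s.
std::size_t longestWord(const char *s) { return scanSpace(s, 0); }
```

Calling `longestWord("tail call optimization")` walks the string one character per call. On a long enough input, whether this runs in constant stack space depends entirely on whether the calls between `scanSpace` and `scanWord` are lowered as jumps.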

I assume you mean passing internal options via -mllvm through the linker?

Yes that’s correct, we are passing -mllvm -tailcallopt to lld on Linux.

ld64 does have an -mllvm flag. When you pass -mllvm -tailcallopt, it happily parses the flag but ignores the resulting value when initializing the code generator. When I reported the issue to Apple, they said they would not fix it because -mllvm is not an officially supported API.
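
The failure mode described here can be sketched with a small self-contained C++ model. This is not code from ld64 or LLVM: `TargetOptionsModel`, `ParsedTailCallOpt`, and the three functions are hypothetical stand-ins, used only to show how a flag can be parsed into static state successfully and still have no effect if the code that builds the target options never reads that state.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::TargetOptions, reduced to the one
// field discussed in this thread.
struct TargetOptionsModel {
  bool GuaranteedTailCallOpt = false;
};

// Hypothetical stand-in for the static variable that option parsing
// writes to.
static bool ParsedTailCallOpt = false;

// Models parsing of internal options: the flag is recognized and
// recorded in static state.
void parseInternalOptions(const std::vector<std::string> &args) {
  for (const std::string &arg : args)
    if (arg == "-tailcallopt")
      ParsedTailCallOpt = true;
}

// Models the behavior reported for Mac OS: options are built from
// defaults and the parsed value is never consulted.
TargetOptionsModel initIgnoringParsedFlags() { return TargetOptionsModel{}; }

// Models the behavior that works on Linux: options are initialized
// from the parsed static state.
TargetOptionsModel initFromParsedFlags() {
  TargetOptionsModel opts;
  opts.GuaranteedTailCallOpt = ParsedTailCallOpt;
  return opts;
}
```

In this model, after `parseInternalOptions({"-tailcallopt"})` the first initializer still yields `GuaranteedTailCallOpt == false` while the second yields `true`, even though parsing succeeded in both cases.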

I guess the question is what interface would work for you. Would passing an internal option like what works on lld or what you are doing with llc be acceptable?

Yes, this would be fine with us, if it’s possible. How would I go about making this happen?

Hi Dwight

Thanks for the feedback. For the issue you reported, there have been a few reviews trying to tweak the -mllvm option when using the legacy LTO interfaces (mine included), but none of them got enough traction to move forward. Note that because -tailcallopt is implemented as an -mllvm flag, it is a debug option and probably not well tested. The option is also not stable, which means it can be renamed without notice.

I also feel that passing -tailcallopt at the link stage is kind of fragile. It would be better to create an attribute (on the function or the CallInst) to force tailcallopt, plus some compiler flag to generate that attribute during IRGen.

Steven

Digging out the patch Teresa once pointed out to me: https://reviews.llvm.org/D19015
It is really a two line change. If you build your own libLTO, it is a very easy patch to maintain downstream.

I think I missed your comments about musttail. Do you have an example that shows why musttail doesn’t work for you? Is there anything we can do to make it work so we don’t need to rely on -mllvm options?

Steven

Ah - I had completely forgotten about that - directly related to this situation.

This could be a good short term change, but I think as Steven and I both mentioned, the right long term approach is to use function attributes. Steven raises a good question about this below. I am not that familiar with the tail call optimization pass so can’t add much to that part of the discussion.

Teresa

Thanks for linking this to me! I will try and see if I can get this to work because it might be the simplest short term solution so we can have something working. It is not ideal as a long term solution because I really don’t want us to have to maintain a build of any of the components of llvm ourselves though. So I will respond below to your question and we can figure out the best long term solution.

The problem with musttail is that the IR fails verification if it contains a musttail call where the caller and callee have different numbers of arguments or otherwise differ in certain respects. However, we need guaranteed tail calls for mutually recursive functions, which obviously may not have the same signature. I would love to be able to use musttail though. Maybe you could make musttail functions with a compatible calling convention use the same codepath as -tailcallopt, and then loosen the restrictions? I’m not really sure. I can foresee there might be problems with calling convention if the function was externally visible, but for our use case it should be fine if the looser musttail attribute only worked for functions local to a module, I think…

We could extend the circumstances under which we allow musttail, sure. But we could only allow arbitrary mismatched argument lists with specific calling conventions where we know it’s actually possible to lower them. And currently, there is no such convention. (Well, technically there’s x86_stdcall, but that only works on 32-bit x86. And the GHC and HiPE calling conventions are weird in other ways.)

We could add a new calling convention that’s equivalent to “fastcc with GuaranteedTailCallOpt”, though, and give it special musttail rules. Maybe call it “tailcc”.

-Eli
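
The difference between the current musttail restriction and the relaxation proposed above can be written down as two small predicates. The C++ below is a deliberately simplified model, not the LLVM verifier: `Signature`, `strictMusttailOk`, and `relaxedMusttailOk` are hypothetical names, types are compared as plain strings, and the assumption that the relaxed rule would still require matching return types is mine, not something stated in the thread.

```cpp
#include <string>
#include <vector>

// Tail stands for the proposed "tailcc" convention.
enum class CallConv { C, Fast, Tail };

// Hypothetical, simplified function signature.
struct Signature {
  CallConv cc;
  std::string returnType;
  std::vector<std::string> paramTypes;
};

// Simplified model of the existing restriction: calling convention,
// return type, and parameter list must all match.
bool strictMusttailOk(const Signature &caller, const Signature &callee) {
  return caller.cc == callee.cc && caller.returnType == callee.returnType &&
         caller.paramTypes == callee.paramTypes;
}

// Simplified model of the proposed relaxation: when both sides use the
// tail convention, parameter lists may differ. Requiring the return
// types to match is an assumption made for this sketch.
bool relaxedMusttailOk(const Signature &caller, const Signature &callee) {
  if (caller.cc == CallConv::Tail && callee.cc == CallConv::Tail)
    return caller.returnType == callee.returnType;
  return strictMusttailOk(caller, callee);
}
```

Under this model a one-argument fastcc caller and a two-argument fastcc callee are rejected by both predicates, while the same pair marked with the tail convention passes the relaxed check. The relaxation is tied to the convention because, as noted above, mismatched argument lists are only safe where the lowering is known to be possible.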

What you suggest (a new tail-callable calling convention equivalent to fastcc with GuaranteedTailCallOpt) sounds like a solution that would work for us. But it also sounds like something significant enough in scope that it might be difficult for me to complete myself, when I am barely familiar with the codebase. I feel confident I could probably add a new function attribute that is equivalent to specifying GuaranteedTailCallOpt=true if it is present on both the callee and caller functions, but I doubt I have the skills necessary to create an entirely new calling convention AND modify the musttail semantics to have knowledge of it. Is this something someone would actually be willing to work on or at the very least help me figure out? Or is the function attribute approach also considered viable?

Adding a function attribute that changes the meaning of fastcc would produce a consistent result, but it’s less than ideal in other respects. It works badly if you try to mix functions with different target attributes. (See https://reviews.llvm.org/D53554 for the sort of issue I’m talking about.) You lose access to the normal fastcc, which is generally faster for non-tail calls. And it becomes impractical to relax musttail checking the way I proposed, so you’re depending on a best-effort lowering; certain transforms will clear the “tail” flag, like ArgumentPromotion.
Adding a new calling convention isn’t really as hard as it sounds; it’ll mostly be reusing existing code. Try to grep for, for example, X86_ThisCall. You basically only need to modify IR serialization and deserialization, and a few places in target-specific code for each target that supports the convention.
-Eli
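
As a rough picture of what "IR serialization and deserialization" means for a calling convention, the sketch below round-trips a convention between an enum value and a textual keyword. It is a self-contained model and not LLVM source: the enum, `printCallConv`, and `parseCallConv` are hypothetical. The keywords `ccc`, `fastcc`, and `x86_thiscallcc` follow LLVM IR spelling, and `tailcc` is the name floated earlier in this thread.

```cpp
#include <string>

// Hypothetical enum; Tail is the proposed new convention.
enum class CallConv { C, Fast, X86ThisCall, Tail };

// "Serialization": map a convention to its textual keyword.
std::string printCallConv(CallConv cc) {
  switch (cc) {
  case CallConv::C:
    return "ccc";
  case CallConv::Fast:
    return "fastcc";
  case CallConv::X86ThisCall:
    return "x86_thiscallcc";
  case CallConv::Tail:
    return "tailcc";
  }
  return "";
}

// "Deserialization": map a keyword back to a convention.
// Returns false and leaves `out` untouched for unknown keywords.
bool parseCallConv(const std::string &keyword, CallConv &out) {
  const CallConv all[] = {CallConv::C, CallConv::Fast, CallConv::X86ThisCall,
                          CallConv::Tail};
  for (CallConv cc : all) {
    if (printCallConv(cc) == keyword) {
      out = cc;
      return true;
    }
  }
  return false;
}
```

In this model, adding a convention means one new enumerator and one new `case`. That is the shape of the work suggested above, with the target-specific lowering as the remaining piece.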