should AlwaysInliner inline this case?

%294 = call float bitcast (float (float, float*)* @__gpu_modff to float (float, i64)*)(float %293, i64 %preg.212.addr.0)

as you may know, some gpu backends don’t support function call. we need to make sure to inline all functions here. however, Inliner can not figure out that this is a valid callsite in this form. actually, it is. in C words, cast a function and then call should be treat as callsite, right?

thanks,

–lx

%294 = call float bitcast (float (float, float*)* @__gpu_modff to float (float, i64)*)(float %293, i64 %preg.212.addr.0)

as you may know, some gpu backends don’t support function call. we need to make sure to inline all functions here. however, Inliner can not figure out that this is a valid callsite in this form. actually, it is. in C words, cast a function and then call should be treat as callsite, right?

LLVM’s inliner does not support indirect calls. I have encountered the same problem before, but instead of adding support for indirect call, it might be easier to modify the code to generate the accurate function signature and thus avoid the explicit cast.

thanks,
chen

Generally, the inliner doesn't do much with indirect calls, but given there is no simpler canonical case here, I expect we'll have to.

Its possible we might even want to define this as a direct call. I'm not sure what the expectations are with regards to the type of the function being called and the type of the callsite. I suspect a lot of code would get confused if getCalledFunction returned __gpu_modff with it's unbitcast type. That's possibly something we should fix though.

We'll want to get other folks input here, but a small patch to the inliner to handle this case would seem reasonable to me.

Philip

Did you intend to reply only to me on this?

Oops.

Maybe we can say that casting the arguments should work, or else UB. Note that we’ve had problems here in the past with byval.

I’m not quite sure what you meant by this. Can you clarify specifically what “should work” means?

Just that bitcasting the caller arg types to the callee’s parameter types and removing the indirect call should be semantically equivalent.

For that matter, it’d be good to understand if this call by itself is UB, or if similarly casting an i32 to float this way is valid.

It is definitely legal to express such a cast in the IR. Are you suggesting that the cast should be of the argument, not of the function pointer type? Disallowing function pointer casts would break a lot of idiomatic C code. (I’m purposely not making any claim about the UB vs non-UB nature of such idiomatic code.)

By “valid” I just mean “has defined behavior”. I’m pretty sure C says that regardless of the casts you do, you ultimately have to call using a matching prototype. The initial IR looks like a violation of that rule.

Philip,

I post here because I think AlwaysInliner should inline it. I want to detect the indirect calls for Inliner, and I want to hear inputs.

let me define indirect call first in my idea. In one single expression, one function may be subject to bitcast more than one time. we can detect this situation and treat it as a regular call of last function, is that okay?

thanks,
–lx

Hi lx, Philip

I’ve seen an instcombine which helps with this situation. It fires when the function types on both sides of the bitcast have the same number of operands and compatible types. It then adds bitcasts on the arguments and removes the one on the called function.

I don’t have IR to hand, but it would be worth passing your IR through instcombine to see if that helps you.

The idea of improving the inliner is also great, but you may find that it’s needed for cases other than this one if i’m right about the instcombine.

Thanks
Pete

Hi lx, Philip

I've seen an instcombine which helps with this situation. It fires when
the function types on both sides of the bitcast have the same number of
operands and compatible types. It then adds bitcasts on the arguments and
removes the one on the called function.

It indeed does, InstCombiner::transformConstExprCastCall.

I don't have IR to hand, but it would be worth passing your IR through
instcombine to see if that helps you.

The following should be a sufficiently workable example of what we would
hope to transform.

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define i32 @g(i32* %a) {
entry:
  %call = tail call i32 bitcast (i32 (i64)* @f to i32 (i32*)*)(i32* %a)
  ret i32 %call
}

declare i32 @f(i64)

define i32 @h(i64 %a) {
entry:
  %call = tail call i32 bitcast (i32 (i32*)* @g to i32 (i64)*)(i64 %a)
  ret i32 %call
}

The idea of improving the inliner is also great, but you may find that
it's needed for cases other than this one if i'm right about the
instcombine.

Sadly, the combine fails because InstCombine
queries CastInst::isBitCastable to determine the castable-ness of the
parameter type and the argument type. It isn't bitcastable though, it's
ptrtoint/inttoptr castable.

The following patch opens up the optimization:
--- a/lib/Transforms/InstCombine/InstCombineCalls.cpp
+++ b/lib/Transforms/InstCombine/InstCombineCalls.cpp
@@ -1456,7 +1456,7 @@ bool
InstCombiner::transformConstExprCastCall(CallSite CS) {
     Type *ParamTy = FT->getParamType(i);
     Type *ActTy = (*AI)->getType();

- if (!CastInst::isBitCastable(ActTy, ParamTy))
+ if (!CastInst::isBitOrNoopPointerCastable(ActTy, ParamTy, DL))
       return false; // Cannot transform this parameter value.

     if (AttrBuilder(CallerPAL.getParamAttributes(i + 1), i + 1).
@@ -1551,7 +1551,7 @@ bool
InstCombiner::transformConstExprCastCall(CallSite CS) {
     if ((*AI)->getType() == ParamTy) {
       Args.push_back(*AI);
     } else {
- Args.push_back(Builder->CreateBitCast(*AI, ParamTy));
+ Args.push_back(Builder->CreateBitOrPointerCast(*AI, ParamTy));
     }

     // Add any parameter attributes.

Running opt -instcombine -inline -instcombine with this patch results in:
define i32 @g(i32* %a) {
entry:
  %0 = ptrtoint i32* %a to i64
  %call = tail call i32 @f(i64 %0)
  ret i32 %call
}

declare i32 @f(i64)

define i32 @h(i64 %a) {
entry:
  %call.i = tail call i32 @f(i64 %a)
  ret i32 %call.i
}

This is exactly why I added those casting utilities. With sufficient
testing, this patch LGTM as a way to effectively canonicalize these cases.
Need to make sure this handles all the cases of ptr->int and int->ptr with
too large, too small, and just right integer sizes. If all goes well, this
seems like a really nice improvement.

hi, Pete,

I understand InstCombine may simplify bitcast to nothing, but the order matters. AlwaysInliner happens in very early stage, and we never call it again. if InstCombine works, can we invoke it before Inliner?

thanks,
–lx

hi, Pete,

I understand InstCombine may simplify bitcast to nothing, but the order matters. AlwaysInliner happens in very early stage, and we never call it again. if InstCombine works, can we invoke it before Inliner?

I don’t know the order for sure, but I know instcombine runs a few times so probably one of those is before alwaysinline.

Arguably, if the pass order is an issue, I’d be fine with making the code David found be a utility that the inliner can call. Not sure if anyone else thinks that’s out of scope for the inliner, but I like the idea of it being able to make a call more inlinable.

Thanks
Pete

Nice! Can I request we add addrspacecast to the list too. It’s probably in the utilities you mention but just needs a test case for this new use of the utility.

Pete

Should be done in r225254.

Honestly, I'm not sold on this one.

always_inline is a really murky thing to begin with. I think the only
reasonable stance is to insist that if the frontend *really* needs the
inline to occur, it should arrange for the inline to be trivially do-able.
Relying on the always inliner (much less other passes) having any kind of
folding in order for the inlining to occur seems really risky.

I'd actually like it if we could spec sufficiently restrictive rules, and
make it a verify failure to violate them on a call marked always_inline so
that frontends were required to diagnose this nicely to users or satisfy
the constraints. Sadly, we are a very long way away from realizing a system
like that. I think we should probably move in that direction though, even
if it takes a long time to get there, as other patterns seem really hard to
maintain and support long term.

Fair enough. I wasn’t hugely attached to the idea, just wanted to put it out there :slight_smile:

This would be nice. For this to work, you basically need always_inline to be the first thing that ever runs in the pass pipeline. Only if its first can the analysis run by the front-end still hold. Otherwise the optimizer might transform a call site in to something no longer inalienable. Or perhaps that would be considered an opt bug. it could get especially tricky with LTO if you have say all of -O3 then try to always_inline across linked modules.

Pete