devirtualization with new-PM pipeline

Chandler et al,

I have been playing with the new PM pipeline, being particularly interested in how it can handle devirtualization.
Now, I discovered what I believe is a "regression" vs old PM on a rather simple one-translation-unit testcase.

clang is able to devirtualize it with -O3 and fails to do so with -fexperimental-new-pass-manager added.

It looks like a pipeline issue, though I did not dig deeper as I'm not sure if this kind of behavior
is expected at current stage of New PM life or not?

If it is not expected then I'ld be happy to file a bug and do a bit deeper look for the cause.

regards,
Fedor.

Chandler et al,

>
> I have been playing with the new PM pipeline, being particularly interested in how it can handle devirtualization.
> Now, I discovered what I believe is a "regression" vs old PM on a rather simple one-translation-unit testcase.
>
> clang is able to devirtualize it with -O3 and fails to do so with -fexperimental-new-pass-manager added.

I hate to correct myself :wink: yet it is able to devirtualize (that is GVN does its job) though it does fail to perform
a rather expected inline afterwards.

regards,
Fedor.

>
> It looks like a pipeline issue, though I did not dig deeper as I'm not sure if this kind of behavior
> is expected at current stage of New PM life or not?
>
> If it is not expected then I'ld be happy to file a bug and do a bit deeper look for the cause.
>
> regards,
> Fedor.
> -------------------------------
> ] cat devirt.cpp
> struct A {
> virtual int virt1();
> };
> struct B : A {
> int virt1() override {
> return 20;
> }
> };
> static int redirect(A* a) {
> return a->virt1();
> }
> int calc() {
> B b;
> return redirect(&b);
> }
> ] clang++ -O3 devirt.cpp -std=c++11 -S -emit-llvm -o -
> ...
> define i32 @_Z4calcv() local_unnamed_addr #0 {
> entry:
> ret i32 20 <--- nicely devirtualized (&b)->virt1() call
> }
> ...
> ] clang++ -O3 -fexperimental-new-pass-manager -std=c++11 -S -emit-llvm -o -
> ...
> define i32 @_Z4calcv() local_unnamed_addr #0 {
> entry:
> %b = alloca %struct.B, align 8
> %0 = bitcast %struct.B* %b to i8*
> call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %0) #3
> %1 = getelementptr inbounds %struct.B, %struct.B* %b, i64 0, i32 0, i32 0
> store i32 (...)** bitcast (i8** getelementptr inbounds ({ [3 x i8*] }, { [3 x i8*] }* @_ZTV1B, i64 0, inrange i32 0, i64 2) to i32 (...)**), i32 (...)*** %1, align 8, !tbaa !2

Yes, this looks broken in the new PM. The DevirtSCCRepeatedPass::run method first scans the functions in the SCC to collect value handles for indirect calls, runs the CGSCC pass pipeline and then checks if any of the call value handles now point to a direct call, in which case it runs the pipeline again (which should inline the devirtualized call) . The problem is scanning the initial SCC for indirect calls is not enough since inlining can produce indirect calls (which could then get devirtualized by later passes). One (ugly) fix is to pass the value handles to the inliner through CGSCCUpdateResult and let the inliner augment it as the caller gets indirect calls due to inlining.

Yes, this looks broken in the new PM. The DevirtSCCRepeatedPass::run method first scans the functions in the SCC to collect value handles for indirect calls, runs the CGSCC pass pipeline and then checks if any of the call value handles now point to a direct call, in which case it runs the pipeline again (which should inline the devirtualized call) . The problem is scanning the initial SCC for indirect calls is not enough since inlining can produce indirect calls (which could then get devirtualized by later passes). One (ugly) fix is to pass the value handles to the inliner through CGSCCUpdateResult and let the inliner augment it as the caller gets indirect calls due to inlining.

Yeah, this is a frustrating tradeoff…

The new PM’s approach here is dramatically simpler, but can’t handle certain edge cases. I’m not sure how much complexity it is worth adding here.

The other way to address this is to not try so hard to exactly match the old PM’s behavior. Instead of trying to detect when an indirect call is turned into a direct call, we could do something more like iterating to a fixed-point, or at least a fixed point of inlining. This is even simpler than the current approach and also more robust (it would handle cases like you describe).

    Yes, this looks broken in the new PM.
    The DevirtSCCRepeatedPass::run method first scans the functions in
    the SCC to collect value handles for indirect calls, runs the
    CGSCC pass pipeline and then checks if any of the call value
    handles now point to a direct call, in which case it runs the
    pipeline again (which should inline the devirtualized call) . The
    problem is scanning the initial SCC for indirect calls is not
    enough since inlining can produce indirect calls (which could then
    get devirtualized by later passes). One (ugly) fix is to pass the
    value handles to the inliner through CGSCCUpdateResult and let
    the inliner augment it as the caller gets indirect calls due to
    inlining.

Yeah, this is a frustrating tradeoff....

The new PM's approach here is dramatically simpler, but can't handle certain edge cases. I'm not sure how much complexity it is worth adding here.

The other way to address this is to not try so hard to exactly match the old PM's behavior. Instead of trying to detect when an indirect call is turned into a direct call, we could do something more like iterating to a fixed-point, or at least a fixed point of inlining. This is even simpler than the current approach and also more robust (it would handle cases like you describe).

Fixed-point approach sounds promising, though it would make sense to allow being flexible on how "fixed" the point is.
As you said, it might be a fixed point of inlining, or a fixed point of some selected set of optimizations critical for
a particular (perhaps, rather customized) pipeline. Allowing to control this fixed point seems to be very reasonable.

I wonder what should be an interface for this kind of control.

regards,
Fedor.