Devirtualization of calls

Hello,

Following a question on SO about how MSVC handled the devirtualization of calls, I tried the following in Clang to check its behavior:

struct A { virtual void foo(); };

struct C { A a; };

C& GetRef();

void devirtualizeDirect() {
    A a;
    a.foo();

    C c;
    c.a.foo();
}

void devirtualizeRef() {
    A a;
    A& ar = a;
    ar.foo();

    C c;
    C& cr = c;
    cr.a.foo();
}

void nodevirtualize() {
    C& cr = GetRef();
    cr.a.foo();
}

Here is the IR emitted by the Try Out LLVM page (for the 3 functions of interest):


define void @devirtualizeDirect()() {
  %a = alloca %struct.A, align 8
  %c = alloca %struct.C, align 8
  %1 = getelementptr inbounds %struct.A* %a, i64 0, i32 0
  store i32 (...)** bitcast (i8** getelementptr inbounds ([3 x i8*]* @vtable for A, i64 0, i64 2) to i32 (...)**), i32 (...)*** %1, align 8
  call void @A::foo()(%struct.A* %a)
  %2 = getelementptr inbounds %struct.C* %c, i64 0, i32 0, i32 0
  store i32 (...)** bitcast (i8** getelementptr inbounds ([3 x i8*]* @vtable for A, i64 0, i64 2) to i32 (...)**), i32 (...)*** %2, align 8
  %3 = getelementptr inbounds %struct.C* %c, i64 0, i32 0
  call void @A::foo()(%struct.A* %3)
  ret void
}

define void @devirtualizeRef()() {
  %a = alloca %struct.A, align 8
  %c = alloca %struct.C, align 8
  %1 = getelementptr inbounds %struct.A* %a, i64 0, i32 0
  store i32 (...)** bitcast (i8** getelementptr inbounds ([3 x i8*]* @vtable for A, i64 0, i64 2) to i32 (...)**), i32 (...)*** %1, align 8
  call void @A::foo()(%struct.A* %a)
  %2 = getelementptr inbounds %struct.C* %c, i64 0, i32 0, i32 0
  store i32 (...)** bitcast (i8** getelementptr inbounds ([3 x i8*]* @vtable for A, i64 0, i64 2) to i32 (...)**), i32 (...)*** %2, align 8
  %3 = getelementptr inbounds %struct.C* %c, i64 0, i32 0
  call void @A::foo()(%struct.A* %3)
  ret void
}

define void @nodevirtualize()() {
  %1 = tail call %struct.C* @GetRef()()
  %2 = getelementptr inbounds %struct.C* %1, i64 0, i32 0
  %3 = bitcast %struct.C* %1 to void (%struct.A*)***
  %4 = load void (%struct.A*)*** %3, align 8
  %5 = load void (%struct.A*)** %4, align 8
  tail call void %5(%struct.A* %2)
  ret void
}

As expected, Clang successfully devirtualizes the calls in the first example, and even succeeds in the second, which MSVC did not.

However, it fails to handle the third case (or so I assume from the IR, where the vtable pointer is loaded and the call is made indirectly), even though it seems very much like the second: cr.a is necessarily an A, whatever the dynamic type of cr.
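
The claim that cr.a is necessarily an A can be checked with a small self-contained test. This is only a sketch: B and D are hypothetical classes that are not part of the test case above, and foo returns an int here (instead of void) purely so that the selected override can be observed. It also assumes nobody reuses the storage of the member through tricks such as placement new.

```cpp
#include <cassert>

struct A { virtual int foo() { return 1; } };

// Hypothetical class derived from A, used to attempt to "smuggle" a B into C::a.
struct B : A { int foo() override { return 2; } };

struct C { A a; };

// Hypothetical class derived from C, so that a C& may refer to something
// whose most-derived type is not C.
struct D : C { B extra; };

// Same shape as the call in nodevirtualize(): the callee only sees a C&.
int callThroughRef(C& cr) { return cr.a.foo(); }

int sliceThenCall() {
    D d;
    B b;
    d.a = b;                   // copy-assignment slices: d.a remains an A
    return callThroughRef(d);  // therefore resolves to A::foo
}

void checkMemberDynamicType() {
    B b;
    assert(b.foo() == 2);           // a genuine B uses B::foo
    assert(sliceThenCall() == 1);   // the by-value member still uses A::foo
}
```

Calling checkMemberDynamicType() from main passes both assertions, which matches the reasoning that a member held by value cannot change its dynamic type, regardless of what the enclosing reference really refers to.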

I therefore have 2 questions:

1/ I tried unsuccessfully to obtain the LLVM IR on my own PC (using MSYS); however, clang refuses to emit it:

$ clang -emit-llvm devirtualize.cpp -o devirtualize.o
clang: error: 'i686-pc-mingw32': unable to pass LLVM bit-code files to linker

How can I get the LLVM IR? (Ideally in textual form, though I think I can use llvm-dis if I get bitcode.)

2/ Does anyone have any inkling as to where the devirtualization occurs in Clang? I’d like to have a look, but so far I have never explored past Sema, and this seems more like a CodeGen matter.

Thanks :)

– Matthieu

Hello

$ clang -emit-llvm devirtualize.cpp -o devirtualize.o
clang: error: 'i686-pc-mingw32': unable to pass LLVM bit-code files to
linker

How could I get the LLVM IR ? (under textual representation, but I can use
llvm-dis if I get bytecode I think)

You forgot to add the -c command-line option.

2011/9/3 Anton Korobeynikov <anton@korobeynikov.info>

With best regards, Anton Korobeynikov
Faculty of Mathematics and Mechanics, Saint Petersburg State University

D’oh!

Well, I can now confirm that the issue still exists with a recent snapshot (from today):

$ clang -emit-llvm -O2 -c devirtualize.cpp -o - | llvm-dis
define void @_Z14nodevirtualizev() {
entry:
  %call = tail call %struct.C* @_Z6GetRefv()
  %a = getelementptr inbounds %struct.C* %call, i32 0, i32 0
  %0 = bitcast %struct.C* %call to void (%struct.A*)***
  %vtable = load void (%struct.A*)*** %0, align 4
  %1 = load void (%struct.A*)** %vtable, align 4
  tail call void %1(%struct.A* %a)
  ret void
}

Thanks Anton!

– Matthieu

Hello,

I have been trying to pinpoint the place where the choice between a direct and a virtual call is made, but I must admit I have failed.

I have tracked down the two following functions:

canDevirtualizeMemberFunctionCalls in CodeGen/CGExprCXX.cpp:108
canDevirtualizeMemberFunctionCall in CodeGen/CGClass.cpp:1625

which are nigh identical (and I do wonder why there are two of them, with the exact same comments).

However, even though the first is used, it systematically returns “false” (bottom of the function, after all checks have failed) whether or not the call ends up devirtualized in my test cases.

I was therefore wondering whether this optimization is performed at the IR level instead (LLVM noticing that the vptr is known statically and resolving the call through it automatically), but it does not really look that way.
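
To make that hypothesis concrete, here is a hand-written model of dispatch through a vptr. It is only an illustration of the idea, not a description of what Clang or LLVM actually do: ManualA, ManualVTable, manualFoo, kVTableForManualA and getManualRef are all hypothetical names, and the layout is a simplification of a real vtable.

```cpp
#include <cassert>

struct ManualA;  // hypothetical stand-in for struct A

// Simplified "vtable": a single slot for foo.
struct ManualVTable {
    int (*foo)(ManualA*);
};

int manualFoo(ManualA*) { return 1; }

static const ManualVTable kVTableForManualA = { &manualFoo };

struct ManualA {
    const ManualVTable* vptr;
    // The constructor stores the vptr, like the "store ... @vtable for A" in the IR.
    ManualA() : vptr(&kVTableForManualA) {}
};

// A "virtual call": load the vptr, load the slot, call indirectly.
int callThroughVptr(ManualA& obj) {
    return obj.vptr->foo(&obj);
}

// Local object: the vptr store and the vptr load are in the same function
// once inlined, so an optimizer could in principle forward the stored value
// and turn the indirect call into a direct call to manualFoo.
int localObject() {
    ManualA a;
    return callThroughVptr(a);
}

// Object obtained elsewhere. In the original test GetRef() is only declared,
// so no vptr store is visible and the load cannot be folded this way.
ManualA& getManualRef() {
    static ManualA instance;
    return instance;
}

int opaqueObject() {
    return callThroughVptr(getManualRef());
}
```

Both functions return the same value; the difference is only in what is visible at the call site. If the devirtualization really were done this way, the third case would need information about the type of the member rather than a visible vptr store.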

I would appreciate it if anyone could direct me to the appropriate portion of the code.

– Matthieu