Question on the difference in bitcode between C and C++

Could anybody point me to links, pages, or other information on the differences between bitcode generated from C and bitcode generated from C++? We have implemented an optimization pass on bitcode generated from C, and we are trying to find out whether it will also work on bitcode from C++. Thanks!

Hi Fei,

There isn't a difference in the bitcode format of a C program as opposed to a C++ program. There are differences in how functions are linked (C++ mangles function names, for instance), but that's a property of the language and not of the bitcode. E.g., if you want to call a C function from C++, you need to do this:

extern "C" {
  void foo();
}

void bar() {
  foo();
}

-bw

Hi Bill,

Thanks for the reply! I am sorry I didn’t express my question clearly.

An example may explain it better. I am trying to analyze the data flow of programs. I first compile the C code to bitcode, and then apply our algorithm to the bitcode to find the dependencies between statements. C++ code, however, has classes, vectors, and references, so I may need to revise my algorithm to analyze bitcode generated from C++.

Those are the kinds of differences in bitcode between C and C++ that I am trying to find.

There isn't any difference in that sense... in IR, a constructor is
just a function call, a reference is just a pointer, etc.
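
To make that concrete, here is a minimal sketch (the names bump_by_ref, bump_by_ptr, Counter and demo are hypothetical, not from this thread). Compiling it with something like clang++ -S -emit-llvm should show both bump functions taking a plain pointer argument, and the constructor as an ordinary function whose first argument is the this pointer; the exact IR text depends on the Clang version and target.

```cpp
#include <cassert>

// Hypothetical helpers for illustration only.
void bump_by_ref(int &x) { x += 1; }   // reference parameter: a pointer in IR
void bump_by_ptr(int *x) { *x += 1; }  // pointer parameter: also a pointer in IR

struct Counter {
  int value;
  // In IR the constructor is an ordinary function that receives `this`.
  explicit Counter(int start) : value(start) {}
};

int demo() {
  Counter c(40);          // lowered to a call to the constructor function
  bump_by_ref(c.value);   // passes the address of c.value
  bump_by_ptr(&c.value);  // passes the same address explicitly
  assert(c.value == 42);
  return c.value;
}
```

A data-flow analysis that already tracks stores through pointer arguments should therefore see both calls the same way.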

-Eli

Hi Fei,

While Clang (like other compilers) lowers C++ into C-like semantics
and then lowers that into IR, there are some constructs that exist in
C++ and not in C. The IR has the same features, but some assumptions
about the semantics are different.

I can give you two examples:

1. Classes in C++ are like C structures in IR, but the C++ ABI makes
it difficult to express Base/Derived classes in pure structures.
(http://www.systemcall.org/blog/2011/01/cpp-class-sizes/ and
http://www.systemcall.org/blog/2011/03/cpp-class-sizes-2/).

So, if your pass depends on identifying the same types, you could end
up thinking that the types are different, but they're not. They're
just different struct representations (base vs. derived) of the same
type.
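
As a rough illustration of the base vs. derived layout issue, consider the sketch below (Base, Derived and the helper functions are made-up names). Under the Itanium C++ ABI, Derived::c may be placed in the tail padding of Base, so the front-end needs one struct layout for Base as a complete object and another for Base as a base subobject. I would expect recent Clang versions to show this as two IR struct types for Base, one of them carrying a ".base" suffix, but treat that naming as an assumption to verify against your own output.

```cpp
#include <cassert>
#include <cstddef>

struct Base {
  virtual ~Base() {}
  int  a;
  char b;   // on typical targets, tail padding follows this member
};

struct Derived : Base {
  char c;   // under the Itanium ABI this may reuse Base's tail padding
};

// True when the ABI packed Derived::c into Base's tail padding.
bool tail_padding_reused() { return sizeof(Derived) == sizeof(Base); }

void check_layout() {
  // Holds on common ABIs whether or not the padding is reused:
  // Derived adds one char, so it grows by at most one alignment unit.
  assert(sizeof(Derived) >= sizeof(Base));
  assert(sizeof(Derived) <= sizeof(Base) + alignof(Base));
}
```

A pass that compares IR struct types by name or by exact layout would treat the two representations of Base as unrelated, which is the pitfall described above.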

2. Virtual tables encode two different kinds of entries, addresses
and offsets, and the two representations normally live in the same
global static array in IR. So, while the element type of the array is
int (or pointer), it contains both addresses and offsets from
addresses, each cast to the element type of the global.
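
The offsets are observable from plain C++. In the sketch below (A, B, C and offset_of_B_in_C are hypothetical names), dynamic_cast<void *> has to recover the start of the complete object from a pointer to the B subobject. Under the Itanium C++ ABI it does so with an "offset to top" entry that sits in the virtual table next to the function addresses; other ABIs may store this differently.

```cpp
#include <cassert>
#include <cstddef>

struct A { virtual ~A() {} int a = 1; };
struct B { virtual ~B() {} int b = 2; };
struct C : A, B { int c = 3; };

// Distance in bytes from the start of a C object to its B subobject.
std::ptrdiff_t offset_of_B_in_C() {
  C obj;
  B *pb = &obj;                          // adjusted to point at the B subobject
  void *top = dynamic_cast<void *>(pb);  // walks back to the most-derived object
  assert(top == static_cast<void *>(&obj));
  return reinterpret_cast<char *>(pb) - static_cast<char *>(top);
}
```

On common ABIs the result is positive, because A is laid out first and B follows it; the matching entry in the virtual table is that distance negated, stored alongside pointer-sized addresses.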

These are not the only C-lowerings that C++ front-ends do, but they
give you a taste of what to expect. As Eli said, most of C++ can be
lowered into C-like structures and the IR is very similar, but some
semantic interpretations are done differently, and the IR that deals
with those objects is slightly different.

Another difference is the presence of exceptions in C++, which would require you to handle more IR instructions. This might not matter, depending on the type of analysis you do.
See: http://llvm.org/docs/LangRef.html#i_invoke
(Note that there is a substantial rewrite of exception handling going into 3.0)
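
A small sketch of where this shows up (may_throw, guarded and check_guarded are hypothetical names): a call made inside a try block is typically emitted as an invoke with a normal and an unwind destination rather than as a plain call, at least in unoptimized output. An analysis that only walks call instructions would miss it.

```cpp
#include <cassert>
#include <stdexcept>

int may_throw(int x) {
  if (x < 0) throw std::invalid_argument("negative input");
  return x * 2;
}

int guarded(int x) {
  try {
    return may_throw(x);  // typically an `invoke` in IR, not a `call`
  } catch (const std::invalid_argument &) {
    return -1;            // reached through the unwind destination
  }
}

void check_guarded() {
  assert(guarded(21) == 42);  // normal path
  assert(guarded(-1) == -1);  // exceptional path
}
```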

Anna.

Thanks! Your suggestions are really helpful to me. I am verifying the differences and will provide my findings soon.