Structure Padding and GetElementPtr

Hi all,

I’m writing a pass to understand memory accesses to C++ class members. For each GetElementPtr instruction, I check the second index into the class pointer to figure out which member it is intended to access.

However, because of structure padding, some fake members are inserted into the structure. For example, when a GEP operates on the 5th element of the padded structure, it may in fact operate on the original 3rd member, if two padding elements precede that member.

Is there any way to map this “5th” access back to the original “3rd” one? For example, is there an API that tells whether a member is a real member or padding?

I would suggest converting the index of the struct GEP into an offset in bytes; see StructLayout::getElementOffset. You can then compare that to the layout of the original C++ class. -Eli
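To make the suggestion concrete, here is a minimal standalone sketch of the index-to-offset computation. It does not use LLVM; `Elem` and `elementOffsets` are hypothetical names that only model the idea. In a real pass the byte offsets would come from `DataLayout::getStructLayout` and `StructLayout::getElementOffset` rather than being recomputed by hand.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one struct element: size and ABI alignment in bytes.
struct Elem {
    std::uint64_t size;
    std::uint64_t align;
};

// Byte offset of every element. A packed struct places elements back to back;
// otherwise each element is first rounded up to its own alignment.
std::vector<std::uint64_t> elementOffsets(const std::vector<Elem>& elems,
                                          bool packed) {
    std::vector<std::uint64_t> offsets;
    std::uint64_t cur = 0;
    for (const Elem& e : elems) {
        if (!packed && e.align > 1)
            cur = (cur + e.align - 1) / e.align * e.align;  // round up
        offsets.push_back(cur);
        cur += e.size;
    }
    return offsets;
}
```

The GEP struct index is then just a position in the returned vector, so the comparison against the C++ layout can be done purely in bytes.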

Thanks, Eli.

My next question: how do I get the layout of the original C++ class from LLVM IR?

Only Clang really knows the original structure layout. You can pass ‘-Xclang -fdump-record-layouts’, though, to see the layout during compilation. The DICompositeType metadata produced when compiling with debug info might contain enough information to describe the original layout.

Here is an example:

I can define two classes: A and Apad:

class A {
  bool b1, b2;
  double d1;
  int i1;
};

class Apad {
  bool b1, b2;
  bool pad1[6];
  double d1;
  int i1;
  bool pad2[4];
};

A and Apad will have the same layout at the LLVM IR level:

%class.A = type <{ i8, i8, [6 x i8], double, i32, [4 x i8] }>

%class.Apad = type { i8, i8, [6 x i8], double, i32, [4 x i8] }
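The same equivalence can be checked on the C++ side. The sketch below is an illustration rather than part of the original example: it redeclares the two types as structs so that `offsetof` can name the members, and `sameLayout` is a hypothetical helper. It assumes a typical 64-bit target where `double` is 8-byte aligned; on an ABI with 4-byte alignment for `double` the two types would differ.

```cpp
#include <cstddef>

// Same members as A and Apad above, declared as structs so the members
// are accessible to offsetof (an assumption made for this sketch).
struct A {
    bool b1, b2;
    double d1;
    int i1;
};

struct Apad {
    bool b1, b2;
    bool pad1[6];
    double d1;
    int i1;
    bool pad2[4];
};

// True when every real member sits at the same byte offset in both types
// and the total sizes agree.
constexpr bool sameLayout() {
    return sizeof(A) == sizeof(Apad) &&
           offsetof(A, b1) == offsetof(Apad, b1) &&
           offsetof(A, b2) == offsetof(Apad, b2) &&
           offsetof(A, d1) == offsetof(Apad, d1) &&
           offsetof(A, i1) == offsetof(Apad, i1);
}
```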

Yes, Reid. I have used these methods to figure out the layout.

Now my question is how to build a map between the original layout and the new layout. I show an example below. When the LLVM IR accesses the 4th member (counting from 0; the i32) of class A, the map should tell me that it is in fact accessing the original 3rd member (i1). Any suggestions?

Here is the example:

class A {
  bool b1, b2;
  double d1;
  int i1;
};

%class.A = type <{ i8, i8, [6 x i8], double, i32, [4 x i8] }>

Yes.
LLVM types != C++ types.
There is no mapping except that produced if you add debug info.

Thanks, all. I finally found a way to do the mapping.

In the debug-info metadata, each element (a member in C++) has an offset field, which is the member's offset in the new layout. With this field, I can match each index used by GetElementPtr to the original member.
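For anyone who lands here later, the matching step can be sketched roughly as follows. This is a standalone model, not the code of the actual pass: `MemberInfo` and `originalMemberIndex` are hypothetical names, and the vectors are assumed to have been filled from the struct layout (byte offsets) and from the member entries in the debug info (which record offsets in bits).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record for one source-level member read from debug info.
// Member offsets in the debug-info metadata are expressed in bits.
struct MemberInfo {
    std::string name;
    std::uint64_t offsetInBits;
};

// Returns the position of the source member that starts at the same offset
// as IR element irIndex, or -1 if no member starts there (padding).
int originalMemberIndex(const std::vector<std::uint64_t>& irOffsetsInBytes,
                        const std::vector<MemberInfo>& members,
                        unsigned irIndex) {
    if (irIndex >= irOffsetsInBytes.size())
        return -1;
    const std::uint64_t bits = irOffsetsInBytes[irIndex] * 8;
    for (std::size_t i = 0; i < members.size(); ++i)
        if (members[i].offsetInBits == bits)
            return static_cast<int>(i);
    return -1;
}
```

With the class A example (IR byte offsets 0, 1, 2, 8, 16, 20 and members b1, b2, d1, i1 at bit offsets 0, 8, 64, 128), IR index 4 maps to member 3 (i1), while indices 2 and 5 are reported as padding. Bitfields and nested aggregates would need extra handling that this sketch leaves out.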