Offset to C++ structure members

Given the C++ struct:

  struct S {
    string a;
    string b;
  };

I also have C "thunk" functions that I call from LLVM code:

  // calls S::S()
  void* T_S_M_new( void *heap );

  // calls string::assign(char const*)
  void T_string_M_assign_Pv( void *that, void *value );

I want to do the LLVM equivalent of the following C++ (where 's' is a pointer to an instance of 'S'):

  s->b.assign( "Hello, world!" ); // assign to S::b

If there were an S member function:

  void S::assign_to_b( char const* );

it would be easy to write a "thunk" wrapper to call it. However, assume that there is no such S member function. I therefore need a way to get the offset of 'b' and add it to 's' so that I can call T_string_M_assign_Pv() on it.

Given this helper function:

  template<class ClassType,class MbrType> inline
  ptrdiff_t mbr_offset_of( MbrType ClassType::*p ) {
    ClassType const *const c = static_cast<ClassType*>( nullptr );
    return reinterpret_cast<ptrdiff_t>( &(c->*p) );
  }

I could take a pointer to an S, convert it with ptrtoint, add the offset, convert back with inttoptr, and pass the result as the 'this' argument to T_string_M_assign_Pv(). The LLVM code generated via the IRBuilder is:

  %0 = call i8* @T_S_M_new(i8* %heap)
  %1 = ptrtoint i8* %0 to i64
  %2 = add i64 %1, 8 ; 8 is what's returned by mbr_offset_of()
  %3 = inttoptr i64 %2 to i8*
  call void @T_string_M_assign_A_Pv(i8* %3, i8* getelementptr inbounds ([15 x i8]* @0, i64 0, i64 0))

The code does in fact work. My questions are:

* Is this an "OK" thing to do?
* Is there a better way?

- Paul

P.S.: I don't explicitly put the getelementptr instruction in there. That's something the IRBuilder does all by itself.

* Is this an "OK" thing to do?

Doing math on pointers if you know the offsets is perfectly
legitimate. clang will generate code like this for certain casts
which can't be represented in the type system.

Using GEP on an i8* is a bit nicer to the optimizer, though, because
using ptrtoint/inttoptr has effects on alias analysis.

* Is there a better way?

I'm not entirely sure how you're using mbr_offset_of, but it's broken
if there are any classes with virtual bases involved. Getting this
case right probably involves using clang somehow (either to synthesize
the relevant thunks, or query for the right offsets and generate the
code yourself).

-Eli

Using GEP on an i8* is a bit nicer to the optimizer, though, because
using ptrtoint/inttoptr has effects on alias analysis.

My understanding is that, in order to use GEP, you have to provide the LLVM code with the struct layout, i.e., build a StructType object. In my case, the struct is already declared in C++ code, so to use GEP I'd have to replicate its layout in LLVM exactly as the C++ compiler would. I'd rather not do that: it's fairly "brittle" even if I could manage to get it right. (Simple structs would probably be easy, but structs that have virtual functions or multiple base classes would be much harder.)

I'm not entirely sure how you're using mbr_offset_of

Given 't', an instance of some class T, and some member T::m, find the integer offset in bytes from &t to &t.m. This offset, when added to &t, should be &t.m.

I'm using mbr_offset_of to get the C++ compiler to do the work of telling me what the correct offset is for the already existing struct.

but it's broken if there are any classes with virtual bases involved.

Really? This simple code works just fine:

        struct A { int ai; };
        struct X : virtual A { int xi; };
        struct Y : virtual A { int yi; };

        struct S : X, Y {
          string a;
          string b;
        };

        template<class ClassType,class MbrType> inline
        ptrdiff_t mbr_offset_of( MbrType ClassType::*p ) {
          ClassType const *const c = static_cast<ClassType*>( nullptr );
          return reinterpret_cast<ptrdiff_t>( &(c->*p) );
        }

        int main() {
          ptrdiff_t offset = mbr_offset_of( &S::b );
          S s;
          string *p = (string*)((char*)&s + offset);
          p->assign( "Hello, world!" );
          cout << *p << endl;
          return 0;
        }

Despite that, the equivalent code in LLVM crashes once I introduce a base class for S, even with just ordinary inheritance. I don't understand why: I print out the offset, and it's the correct value that's getting added to the pointer.

- Paul

Using GEP on an i8* is a bit nicer to the optimizer, though, because
using ptrtoint/inttoptr has effects on alias analysis.

My understanding is that, in order to use GEP, you have to provide the LLVM code with the struct layout, i.e., build a StructType object. In my case, the struct is already declared in C++ code, so to use GEP I'd have to replicate its layout in LLVM exactly as the C++ compiler would. I'd rather not do that: it's fairly "brittle" even if I could manage to get it right. (Simple structs would probably be easy, but structs that have virtual functions or multiple base classes would be much harder.)

No, you don't have to... you can just use GEP on i8*'s. The LLVM type
system doesn't have any semantic significance.

I'm not entirely sure how you're using mbr_offset_of

Given 't', an instance of some class T, and some member T::m, find the integer offset in bytes from &t to &t.m. This offset, when added to &t, should be &t.m.

I'm using mbr_offset_of to get the C++ compiler to do the work of telling me what the correct offset is for the already existing struct.

If you can do that, why not just generate a thunk to perform the addressing?

but it's broken if there are any classes with virtual bases involved.

It starts to become an issue when you try to compute the offset to
e.g. A::ai in your example.

Despite that, however, the equivalent code in LLVM (once I introduce a base class for S, even just ordinary inheritance), crashes. I don't understand why, however. I print out the offset, and it's the correct value that's getting added to the Pointer.

No idea what's happening here.

-Eli

My understanding is that, in order to use GEP, you have to provide the LLVM code with the struct layout, i.e., build a StructType object. In my case, the struct is already declared in C++ code, so to use GEP I'd have to replicate its layout in LLVM exactly as the C++ compiler would. I'd rather not do that: it's fairly "brittle" even if I could manage to get it right. (Simple structs would probably be easy, but structs that have virtual functions or multiple base classes would be much harder.)

No, you don't have to... you can just use GEP on i8*'s. The LLVM type
system doesn't have any semantic significance.

Oh! I just tried it and it works. :-)

I'm not entirely sure how you're using mbr_offset_of

Given 't', an instance of some class T, and some member T::m, find the integer offset in bytes from &t to &t.m. This offset, when added to &t, should be &t.m.

I'm using mbr_offset_of to get the C++ compiler to do the work of telling me what the correct offset is for the already existing struct.

If you can do that, why not just generate a thunk to perform the addressing?

Because if I can create a thunk to do that, I can just as easily create a thunk to provide a "setter" for the struct member (something I'd prefer not to do).

I'm trying to compute the offset "inline" in the LLVM code rather than (a) have to create yet another C thunk and (b) call it.

but it's broken if there are any classes with virtual bases involved.

It starts to become an issue when you try to compute the offset to
e.g. A::ai in your example.

Hmmmm.... I just changed:

  s/int ai/string as/
  s/&S::b/&S::as/

and the code still works. The offset is 0, which is what you'd expect with class 'A' being a public virtual (shared) base class.

Despite that, however, the equivalent code in LLVM (once I introduce a base class for S, even just ordinary inheritance), crashes. I don't understand why, however. I print out the offset, and it's the correct value that's getting added to the Pointer.

No idea what's happening here.

The IR code is now:

  @0 = private unnamed_addr constant [14 x i8] c"Hello, world!\00"
  ...
  %0 = call i8* @T_S_M_new(i8* %heap)
  %1 = getelementptr i8* %0, i64 16
  call void @T_string_M_assign_A_Pv(i8* %1, i8* getelementptr inbounds ([14 x i8]* @0, i64 0, i64 0))

where the "16" is the correct offset (it agrees with my pure C++ version of the code), yet it still crashes. It's not obvious why.

- Paul

The offset of 'as' is 0 given an A*, but not given an S*. If you handle
that already, it's okay, I guess.

-Eli