RFC: Flesh out the IRGen C++ ABI class

Hi,

Well, it's been awhile, but I finally got around to finishing up
libCodeGen's C++ ABI class. I've added a few methods in this patch to:

- Emit important C++ data structures (RTTI descriptors, virtual tables)
- Retrieve the CGRecordLayoutBuilder, which can be overridden by the ABI.
- Emit a call to a virtual method.
- Emit loads and stores to fields in C++ classes/structs--particularly
important for fields in virtual bases.

I haven't put in any EH-related stuff--I leave that in John's capable
hands, since he knows way more about it than I do. (Also, he told me
he'd do it, so that helps :).

I'd appreciate comments on it--in particular, if I'm missing anything,
or if you feel the methods I've added are inappropriate. I don't think
this is worthy to be committed yet, so I've sent it here instead of
cfe-commits.

BTW, I've left out the inevitable stubs I'll have to add to make the
resulting clang binary link properly.

Chip

flesh-cg-cxx-abi.patch (2.85 KB)

Well, it's been awhile, but I finally got around to finishing up
libCodeGen's C++ ABI class. I've added a few methods in this patch to:

I wouldn't say "finished". :slight_smile:

It's great to hear that you're planning on working on implementing these
features for different ABIs! That said, I don't really see much value in
fleshing out the API so far in advance of any implementations; it's not
like the eventual API is likely to match the current idea. Please introduce
these methods only as you generalize the features behind them.

Also, if you'd like to implement the MS ABI methods that we've already
extracted out, that would be awesome.

I'll go ahead and comment on the interfaces you've proposed; feel free
to ignore these comments until you get around to those features.

- Emit important C++ data structures (RTTI descriptors, virtual tables)

When are these methods called and what do they return? There are two
important use cases for these structures — emitting a reference and emitting
the actual tables. Also, v-tables can involve multiple GlobalValues even in the
Itanium ABI, and I don't know whether the MSVC ABI emits vbtables in the
same global as vftables, or whether multiple vftables are emitted in the same
global as each other.

- Retrieve the CGRecordLayoutBuilder, which can be overridden by the ABI.

How are you expecting this to work? CGRecordLayoutBuilder is not currently
a singleton, and I'm pretty sure multiple layouts can be in-flight at once.

- Emit a call to a virtual method.

Hmm. I'd prefer a method more like EmitLoadOfMemberFunctionPointer,
which returns the function to call and adjusts the 'this' pointer but doesn't
actually call the function.

We'll also need some hook to override the default calling convention for
instance methods.

- Emit loads and stores to fields in C++ classes/structs--particularly
important for fields in virtual bases.

Is this actually necessary? The sub-expression on a MemberExpr should
always be an l-value of the type which declared the field. You should only
need a hook to do a virtual derived-to-base conversion.

I haven't put in any EH-related stuff--I leave that in John's capable
hands, since he knows way more about it than I do. (Also, he told me
he'd do it, so that helps :).

Yeah, there's a lot of backend work still to be done here.

John.

Well, it's been awhile, but I finally got around to finishing up
libCodeGen's C++ ABI class. I've added a few methods in this patch to:

I wouldn't say "finished". :slight_smile:

'Course not. I have a long way to go. I'm just saying that I'm going to
finish this up now.

It's great to hear that you're planning on working on implementing these
features for different ABIs! That said, I don't really see much value in
fleshing out the API so far in advance of any implementations; it's not
like the eventual API is likely to match the current idea. Please introduce
these methods only as you generalize the features behind them.

OK. I just wanted to know your opinion on what I had in mind.

Also, if you'd like to implement the MS ABI methods that we've already
extracted out, that would be awesome.

All right then.

I'll go ahead and comment on the interfaces you've proposed; feel free
to ignore these comments until you get around to those features.

- Emit important C++ data structures (RTTI descriptors, virtual tables)

When are these methods called and what do they return?

LayoutVTables() and EmitRTTIDescriptors(). They both return an
llvm::Value*. In retrospect, returning only one value seems limited. The
methods' names imply that there could be more than one.

There are two
important use cases for these structures — emitting a reference and emitting
the actual tables.

Right--which means I'm missing some methods.

Also, v-tables can involve multiple GlobalValues even in the
Itanium ABI, and I don't know whether the MSVC ABI emits vbtables in the
same global as vftables,

Probably not. There are two different manglings for them.

or whether multiple vftables are emitted in the same
global as each other.

Nope, unless the class was declared __declspec(selectany), in which case
the vftable gets emitted as a SELECT_ANY COMDAT, and the linker will
then coalesce identical vftables into one. But the coalescing happens
after the compiler has finished running, so that's no help.

- Retrieve the CGRecordLayoutBuilder, which can be overridden by the ABI.

How are you expecting this to work? CGRecordLayoutBuilder is not currently
a singleton, and I'm pretty sure multiple layouts can be in-flight at once.

Good point. I should probably have it return *a* CGRecordLayoutBuilder
instead. I could have it construct an instance of the object, and the
caller could supply any necessary parameters.

- Emit a call to a virtual method.

Hmm. I'd prefer a method more like EmitLoadOfMemberFunctionPointer,
which returns the function to call and adjusts the 'this' pointer but doesn't
actually call the function.

I considered that, but then I started thinking more generically. I can
easily envision an Objective-C-like C++ ABI that uses message sends
instead of vtables to implement virtual calls. But, since I already have
the LayOutVTables() method, that seems like a moot point. I'll change it
back for now.

We'll also need some hook to override the default calling convention for
instance methods.

True. The Sema CXXABI object also needs one, by the way.

- Emit loads and stores to fields in C++ classes/structs--particularly
important for fields in virtual bases.

Is this actually necessary? The sub-expression on a MemberExpr should
always be an l-value of the type which declared the field. You should only
need a hook to do a virtual derived-to-base conversion.

Yeah, you can tell I'm still pretty new to this. But your idea makes sense.

I was thinking of a C++ ABI that stored members--all members--in exotic
ways, like say, putting them in a separate struct and having a pointer
to that struct in the object proper, but I guess we could model that in
the AST.

I haven't put in any EH-related stuff--I leave that in John's capable
hands, since he knows way more about it than I do. (Also, he told me
he'd do it, so that helps :).

Yeah, there's a lot of backend work still to be done here.

Thanks for the feedback!

Chip

And I just realized. The getCanonicalCallConv() function turns CC_C into
CC_Default. The problem is that for methods on x86 in the Microsoft ABI,
CC_Default is the same as CC_X86ThisCall. I could just add a parameter
to getCanonicalCallConv() to have it call the CXXABI hook, but then I
need to know when to pass 'true' to get the right behavior.

The two places where getCanonicalCallConv() are called where this
matters are ASTContext::getFunctionType() in lib/AST/ASTContext.cpp and
ProcessFnAttr() in lib/Sema/SemaType.cpp. This is important so we get
the right semantic analysis. Any ideas?

Chip

My inclination is to say that "canonicalizing" the function type is the wrong thing to do, and that the default CC for a function in any particular context should be determined by that context. That means that a default-CC function type is different from a specifically-cdecl function type, but I can live with that.

John.

s/function type/calling convention/

John.

So you're saying we should get rid of getCanonicalCallConv()? Then I can
dispense with the AST hook, too.

I don't know about this. The whole point of canonicalizing the CC was
that cdecl function types were compatible with default-CC function
types. This matters when, e.g., trying to merge two compatible
declarations, or assigning function pointers. Are you sure you can live
with it now?

Chip

There's already explicit function-merging logic that deals with CCs.

The problem is that a function type with no explicit CC really is different in important ways from a function type with an explicit CC. In particular, if I have
  typedef void callback_t(A*);
then this:
  callback_t B::*cb = ...; (b->*cb)(a);
uses the default calling convention for instance methods, whereas this:
  callback_t *f; f(a);
uses the default calling convention for normal functions.

Similar, if callback_t is __attribute__((cdecl)), then instance methods declared with that typedef (which is legal!) should be called with cdecl.

The point is that function types exist independent of any particular declaring context, and the right rule is to fill in a sensible calling convention based on context when none was mandated. If that means that we have to produce different template instantiations for __attribute__((cdecl)) functions than unadorned ones, I can live with that.

But I'm open to argument.

John.

I think we have to live with the fact that we produce different template instantiations for __attribute__((cdecl)) functions than unadorned ones. I haven't looked into what GCC does, but different name-mangling for cdecl, fastcall, etc. function types makes perfect sense.

  - Doug

My inclination is to say that "canonicalizing" the function type is the wrong thing to do, and that the default CC for a function in any particular context should be determined by that context. That means that a default-CC function type is different from a specifically-cdecl function type, but I can live with that.

So you're saying we should get rid of getCanonicalCallConv()? Then I can
dispense with the AST hook, too.

I don't know about this. The whole point of canonicalizing the CC was
that cdecl function types were compatible with default-CC function
types. This matters when, e.g., trying to merge two compatible
declarations, or assigning function pointers. Are you sure you can live
with it now?

There's already explicit function-merging logic that deals with CCs.

Yeah, and it uses getCanonicalCallConv().

The problem is that a function type with no explicit CC really is different in important ways from a function type with an explicit CC. In particular, if I have
typedef void callback_t(A*);
then this:
callback_t B::*cb = ...; (b->*cb)(a);
uses the default calling convention for instance methods, whereas this:
callback_t *f; f(a);
uses the default calling convention for normal functions.

That's why, when getCanonicalCallConv() is called, it needs to know if
it's dealing with a method instead of a normal function.

Similar, if callback_t is __attribute__((cdecl)), then instance methods declared with that typedef (which is legal!) should be called with cdecl.

Hmm... This complicates things.

The point is that function types exist independent of any particular declaring context, and the right rule is to fill in a sensible calling convention based on context when none was mandated. If that means that we have to produce different template instantiations for __attribute__((cdecl)) functions than unadorned ones, I can live with that.

I think we have to live with the fact that we produce different template instantiations for __attribute__((cdecl)) functions than unadorned ones. I haven't looked into what GCC does, but different name-mangling for cdecl, fastcall, etc. function types makes perfect sense.

Last I checked, the Itanium ABI didn't mangle calling conventions at all ;).

I tried just now to gut getCanonicalCallConv(), and a bunch of tests
started failing horribly. (By that, I mean *crash* horrible.) I don't
think it's worth removing this function. There must be another way.

Chip

My inclination is to say that "canonicalizing" the function type is the wrong thing to do, and that the default CC for a function in any particular context should be determined by that context. That means that a default-CC function type is different from a specifically-cdecl function type, but I can live with that.

So you're saying we should get rid of getCanonicalCallConv()? Then I can
dispense with the AST hook, too.

I don't know about this. The whole point of canonicalizing the CC was
that cdecl function types were compatible with default-CC function
types. This matters when, e.g., trying to merge two compatible
declarations, or assigning function pointers. Are you sure you can live
with it now?

There's already explicit function-merging logic that deals with CCs.

Yeah, and it uses getCanonicalCallConv().

There will obviously still be an idea of "the default calling convention
means X in this specialized context"; that's not in question.

The problem is that a function type with no explicit CC really is different in important ways from a function type with an explicit CC. In particular, if I have
typedef void callback_t(A*);
then this:
callback_t B::*cb = ...; (b->*cb)(a);
uses the default calling convention for instance methods, whereas this:
callback_t *f; f(a);
uses the default calling convention for normal functions.

That's why, when getCanonicalCallConv() is called, it needs to know if
it's dealing with a method instead of a normal function.

What we're saying is that that's obviously not possible in general. The type
system needs to be able to deal with function types whose calling convention
will be determined only in context, i.e. when we're not dealing with a method
*or* a normal function. That means that treating calling conventions as if they
can canonicalized on the type cannot be the right approach.

The point is that function types exist independent of any particular declaring context, and the right rule is to fill in a sensible calling convention based on context when none was mandated. If that means that we have to produce different template instantiations for __attribute__((cdecl)) functions than unadorned ones, I can live with that.

I think we have to live with the fact that we produce different template instantiations for __attribute__((cdecl)) functions than unadorned ones. I haven't looked into what GCC does, but different name-mangling for cdecl, fastcall, etc. function types makes perfect sense.

Last I checked, the Itanium ABI didn't mangle calling conventions at all ;).

They're non-standard, which means we can completely ignore them, right? :slight_smile:

I tried just now to gut getCanonicalCallConv(), and a bunch of tests
started failing horribly. (By that, I mean *crash* horrible.) I don't
think it's worth removing this function. There must be another way.

If you've never revised the type system before, I would not start with this task.

John.

There's already explicit function-merging logic that deals with CCs.

Yeah, and it uses getCanonicalCallConv().

There will obviously still be an idea of "the default calling convention
means X in this specialized context"; that's not in question.

That's true.

The problem is that a function type with no explicit CC really is different in important ways from a function type with an explicit CC. In particular, if I have
typedef void callback_t(A*);
then this:
callback_t B::*cb = ...; (b->*cb)(a);
uses the default calling convention for instance methods, whereas this:
callback_t *f; f(a);
uses the default calling convention for normal functions.

That's why, when getCanonicalCallConv() is called, it needs to know if
it's dealing with a method instead of a normal function.

What we're saying is that that's obviously not possible in general. The type
system needs to be able to deal with function types whose calling convention
will be determined only in context, i.e. when we're not dealing with a method
*or* a normal function. That means that treating calling conventions as if they
can canonicalized on the type cannot be the right approach.

Good point.

The point is that function types exist independent of any particular declaring context, and the right rule is to fill in a sensible calling convention based on context when none was mandated. If that means that we have to produce different template instantiations for __attribute__((cdecl)) functions than unadorned ones, I can live with that.

I think we have to live with the fact that we produce different template instantiations for __attribute__((cdecl)) functions than unadorned ones. I haven't looked into what GCC does, but different name-mangling for cdecl, fastcall, etc. function types makes perfect sense.

Last I checked, the Itanium ABI didn't mangle calling conventions at all ;).

They're non-standard, which means we can completely ignore them, right? :slight_smile:

Right.

Now the Microsoft ABI is a different story. Calling conventions are an
integral part of the mangling scheme.

I tried just now to gut getCanonicalCallConv(), and a bunch of tests
started failing horribly. (By that, I mean *crash* horrible.) I don't
think it's worth removing this function. There must be another way.

If you've never revised the type system before, I would not start with this task.

I haven't. (svn log in include/AST shows only two commits by me, the one
adding the CXXABI support for member pointers, and the one adding the
force_align_arg_pointer attribute.) Thus I leave this to more
experienced people.

Chip