[PATCH]: Preparing AST for C++ declarations

Hi,

The attached patch contains these changes:

-New Decl subclasses:
    CXXField (derived from Field) - instance fields
    CXXRecord (derived from Record) - Records for C++
    CXXMethod (derived from Function) - static and instance methods
    CXXClassVar (derived from Var) - static data members

-Because there are two "types" of tags (Struct/CXXStruct etc.), to avoid constantly checking for both (i.e. when you need to know whether the Record is a 'struct' or 'union'),
I've added a TagKind enum in TagDecl and a TagDecl::getTagKind method.
Plus, RecordDecl/CXXRecordDecl::Create methods receive a TagDecl::TagKind enum instead of the general Decl::Kind enum.

-DeclContext gets a "ScopedDecl *DeclChain" member, which is used to chain decls that belong to the same DeclContext.

-Argiris

ast-cxxdecl-2.patch (67.3 KB)

except that it's too big, which makes reviewing it a lot more
difficult. Please put the DeclBase code movement into a separate
patch (moving the DeclBase code is fine without review), put the
DeclChain changes into a separate patch, put the Decl::Struct ->
TagDecl::TK_struct changes into a separate patch, and split out
anything else that's obviously independent.

Minor issue I spotted: in CXXMethodDecl::getThisType, there's no point
to making "this" const; it isn't an lvalue, so the const modifier
doesn't do anything.

-Eli

Hi Eli,
thanks for reviewing.

Eli Friedman wrote:

>From my reading, there's nothing obviously wrong with this patch,
except that it's too big, which makes reviewing it a lot more
difficult. Please put the DeclBase code movement into a separate
patch (moving the DeclBase code is fine without review), put the
DeclChain changes into a separate patch, put the Decl::Struct ->
TagDecl::TK_struct changes into a separate patch, and split out
anything else that's obviously independent.
  
I've added DeclBase.cpp here (Chris was fine with this change at the previous related patch):
http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20080602/006009.html

In the currently attached patch there are only the AST-related changes (introducing the C++ decl subclasses); these are the important ones.

Minor issue I spotted: in CXXMethodDecl::getThisType, there's no point
to making "this" const; it isn't an lvalue, so the const modifier
doesn't do anything.
  
Making 'this' const is so that the error diagnostics can report its type as "[class name]* const". This convention is followed by both gcc and msvc.

-Argiris

ast-cxx.patch (23.7 KB)

Hi Eli,
thanks for reviewing.

Eli Friedman wrote:

>From my reading, there's nothing obviously wrong with this patch,
except that it's too big, which makes reviewing it a lot more
difficult. Please put the DeclBase code movement into a separate
patch (moving the DeclBase code is fine without review), put the
DeclChain changes into a separate patch, put the Decl::Struct ->
TagDecl::TK_struct changes into a separate patch, and split out
anything else that's obviously independent.

I've added DeclBase.cpp here (Chris was fine with this change at the
previous related patch):
http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20080602/006009.html

In the currently attached patch there are only the AST-related changes
(introducing the C++ decl subclasses); these are the important ones.

Okay, thanks; this is a lot easier to look at.

Minor issue I spotted: in CXXMethodDecl::getThisType, there's no point
to making "this" const; it isn't an lvalue, so the const modifier
doesn't do anything.

Making 'this' const is so that the error diagnostics can report its type as
"[class name]* const". This convention is followed by both gcc and msvc.

All right, I guess.

Most important issue: you're adding an Access member to TypeDecl, a
class which isn't C++-specific. This will increase the size of all
TypeDecls for C code by 4 bytes. Adding new members to data
structures requires additional scrutiny, because it affects memory
usage. In this case, it doesn't appear to be worthwhile to bloat the
memory usage for C code. There are a couple of possible ways to fix
this: the simplest is to add a CXXEnumDecl and CXXTypedefDecl, and
shuffle the Access member down to the C++-specific classes. This
isn't exactly pretty, but it's straightforward to implement.

Have you considered what will happen to RecordDecls once inheritance
is implemented? I'm not sure that a RecordDecl will continue to
properly make sense, at least in terms of the internal data structure.
The Nth member of a RecordDecl might not really have any
correspondence to anything useful, although I guess most code
iterating over members needs to be aware of C++ anyway. Member name
lookup will also need to be completely different. I guess it isn't
really an issue at this point, but something to think about.

A CXXInstanceMethodType and CXXStaticMethodType are going to be
necessary at some point; also not something that needs to be done
here, but something to consider.

I notice you didn't include the serialization implementation; it's
okay to leave it out for now, but it will need to be done.

Overall, the patch seems to be well thought out. I'm not completely
sure everything is going to remain the same once some of this stuff
actually has parser/sema support, but it's hard to tell without
implementing it.

-Eli

Eli Friedman wrote:

Most important issue: you're adding an Access member to TypeDecl, a
class which isn't C++-specific. This will increase the size of all
TypeDecls for C code by 4 bytes. Adding new members to data
structures requires additional scrutiny, because it affects memory
usage. In this case, it doesn't appear to be worthwhile to bloat the
memory usage for C code. There are a couple of possible ways to fix
this: the simplest is to add a CXXEnumDecl and CXXTypedefDecl, and
shuffle the Access member down to the C++-specific classes. This
isn't exactly pretty, but it's straightforward to implement.
  
I've suggested this in a previous patch, but it seemed ugly wasn't well received:
http://lists.cs.uiuc.edu/pipermail/cfe-dev/2008-May/001780.html

This will increase the size of all TypeDecls for C code by 4 bytes

Two suggestions about this:
1) "Swizzle" the access specifier bits into the "TypeForDecl" member pointer.
2) Or move the access specifier bits to NamedDecl, thus they get packed with the bitfields from Decl

I'm more fond of (2), since it also simplifies things by letting the access specifier be shared by all C++ decls.

Have you considered what will happen to RecordDecls once inheritance
is implemented? I'm not sure that a RecordDecl will continue to
properly make sense, at least in terms of the internal data structure.
The Nth member of a RecordDecl might not really have any
correspondence to anything useful, although I guess most code
iterating over members needs to be aware of C++ anyway. Member name
lookup will also need to be completely different. I guess it isn't
really an issue at this point, but something to think about.
  
This is from the previous patch I posted:
The instance fields of CXXRecord are stored in the members array of RecordDecl, thus the data layout of CXXRecord is calculated through the Record.
All the other members (including the static fields), are ScopedDecls with the CXXRecord as declaration context, so they can be iterated through a general DeclContext member iterator (not implemented yet).
Name lookup for class members will be efficient through the use of the IdentifierResolver.

The Nth member (as "getMember(N)") will be the Nth instance field of a specific Record. It seems to me that it makes sense even when inheritance comes into the picture, do you disagree ?

A CXXInstanceMethodType and CXXStaticMethodType are going to be
necessary at some point; also not something that needs to be done
here, but something to consider.
  

Ok.

I notice you didn't include the serialization implementation; it's
okay to leave it out for now, but it will need to be done.
  

Yes, of course.

Overall, the patch seems to be well thought out. I'm not completely
sure everything is going to remain the same once some of this stuff
actually has parser/sema support, but it's hard to tell without
implementing it.
  
Well, I've come to the point where simple stuff like this gets passed through -fsyntax-only:

class S {
public:
  int f1(int p) {
    A z = 6;
    return p + x + this->y + z;
  }

  typedef int A;

private:
  int x,y;
};

-Argiris

Eli Friedman wrote:

Most important issue: you're adding an Access member to TypeDecl, a
class which isn't C++-specific. This will increase the size of all
TypeDecls for C code by 4 bytes. Adding new members to data
structures requires additional scrutiny, because it affects memory
usage. In this case, it doesn't appear to be worthwhile to bloat the
memory usage for C code. There are a couple of possible ways to fix
this: the simplest is to add a CXXEnumDecl and CXXTypedefDecl, and
shuffle the Access member down to the C++-specific classes. This
isn't exactly pretty, but it's straightforward to implement.

I've suggested this in a previous patch, but it seemed ugly wasn't well
received:
http://lists.cs.uiuc.edu/pipermail/cfe-dev/2008-May/001780.html

Hmm, maybe... I think that the original comment was more about
disliking multiple inheritance, though.

This will increase the size of all TypeDecls for C code by 4 bytes

Two suggestions about this:
1) "Swizzle" the access specifier bits into the "TypeForDecl" member
pointer.
2) Or move the access specifier bits to NamedDecl, thus they get packed with
the bitfields from Decl

I'm more fond of (2), since it also simplifies things by letting the access
specifier be shared by all C++ decls.

That's a possibility; you also have to consider how this interacts
with serialization, though.

The instance fields of CXXRecord are stored in the members array of
RecordDecl, thus the data layout of CXXRecord is calculated through the
Record.
All the other members (including the static fields), are ScopedDecls with
the CXXRecord as declaration context, so they can be iterated through a
general DeclContext member iterator (not implemented yet).
Name lookup for class members will be efficient through the use of the
IdentifierResolver.

The Nth member (as "getMember(N)") will be the Nth instance field of a
specific Record. It seems to me that it makes sense even when inheritance
comes into the picture, do you disagree ?

The issue is the following: class B inherits from class A. class A
has a member x. Is x accessible through getMember(N)? If so, how do
you figure out the ownership of a FieldDecl once it's in the
RecordDecl?

Well, I've come to the point where simple stuff like this gets passed
through -fsyntax-only:

class S {
public:
int f1(int p) {
  A z = 6;
  return p + x + this->y + z;
}

typedef int A;

private:
int x,y;
};

Oh, cool; that's a lot of progress.

-Eli

Eli Friedman wrote:

[...]

The instance fields of CXXRecord are stored in the members array of
RecordDecl, thus the data layout of CXXRecord is calculated through the
Record.
All the other members (including the static fields), are ScopedDecls with
the CXXRecord as declaration context, so they can be iterated through a
general DeclContext member iterator (not implemented yet).
Name lookup for class members will be efficient through the use of the
IdentifierResolver.
      

The Nth member (as "getMember(N)") will be the Nth instance field of a
specific Record. It seems to me that it makes sense even when inheritance
comes into the picture, do you disagree ?
    
The issue is the following: class B inherits from class A. class A
has a member x. Is x accessible through getMember(N)? If so, how do
you figure out the ownership of a FieldDecl once it's in the
RecordDecl?
  
The way I see it, getMember() will only return the instance fields of class B. When in C++, name lookup for members will be done through IdentifierResolver, not RecordDecl::getMember(IdentifierInfo *)

-Argiris

That works, I guess. And I suppose CodeGen and other similar users
will have to look up the inheritance tree to come up with the right
layout anyway.

-Eli

Eli Friedman wrote:

Most important issue: you're adding an Access member to TypeDecl, a
class which isn't C++-specific. This will increase the size of all
TypeDecls for C code by 4 bytes.

In the attached patch the access specifier is packed with the bitfields of Decl. This reduces the size of C++ decls as well (i.e CXXField).

-Argiris

ast-cxx-2.patch (22.3 KB)