ParseTag only called from ParseEnumSpecifier

I noticed today that ParseTag is only called by ParseEnumSpecifier. In the past ParseTag was used do much of the heavy lifting of parsing structs, unions, and enums, and the comments on the source still say this. Now it looks like ParseTag is only called by ParseEnumSpecifier, which means most of the comments are not true and that the logic could probably just be rolled into ParseEnumSpecifier directly.

ParseClassSpecifier (in ParseDeclCXX.cpp) now does all the parsing for structs, unions, and C++ classes. This seems a little weird to me, and it isn't clear from the comments in the Parse library that this is the case. Specifically, I have two concerns:

(1) ParseTag now appears to be somewhat redundant and unnecessary, and in some ways ill-named. It doesn't appear to serve its original purpose anymore, and thus it's logic should probably just be folded into ParseEnumSpecifier.

(2) It seems strange to me that parsing structs and unions, core pieces of the C language, is done in ParseClassSpecifier (ParseDeclCXX.cpp). There seems to be a lot of commonality here that could possibly be refactored and moved back into ParseDecl.cpp. This is my first look at this code in a while, so I'm don't claim to understand all of the design decisions here, but conceptually it just seems a little confusing.

As a side note, my original interest in this code was to allow the parser+actions to be able to distinguish up front between enum/struct/union/class forward declarations and definitions. This would allow us to hopefully clean up some elements of the ASTs with regards to forward declarations of tag types, and it seems it would also relay more information to the parser Actions.

I agree this should be cleaned up. Comments below...

I noticed today that ParseTag is only called by ParseEnumSpecifier.
In the past ParseTag was used do much of the heavy lifting of parsing
structs, unions, and enums, and the comments on the source still say
this. Now it looks like ParseTag is only called by
ParseEnumSpecifier, which means most of the comments are not true and
that the logic could probably just be rolled into ParseEnumSpecifier
directly.

From my understanding of the code, it appears that
ParseClassSpecifier (in ParseDeclCXX.cpp) now does all the parsing for
structs, unions, and C++ classes. This seems a little weird to me,
and it isn't clear from the comments in the Parse library that this is
the case. Specifically, I have two concerns:

(1) ParseTag now appears to be somewhat redundant and unnecessary, and
in some ways ill-named. It doesn't appear to serve its original
purpose anymore, and thus it's logic should probably just be folded
into ParseEnumSpecifier.

Another solution is to have ParseClassSpecifier() call ParseTag(). At the moment, both of these routines call "ActOnTag" (which is a little odd). If ParseClassSpecifier() called ParseTag(), the action could be Decl, Ref) in one place. From my perspective, centralizing both of these would be simplifying.

(2) It seems strange to me that parsing structs and unions, core
pieces of the C language, is done in ParseClassSpecifier
(ParseDeclCXX.cpp). There seems to be a lot of commonality here that
could possibly be refactored and moved back into ParseDecl.cpp. This
is my first look at this code in a while, so I'm don't claim to
understand all of the design decisions here, but conceptually it just
seems a little confusing.

Right...if ParseClassSpecifier() called ParseTag() it would make more sense. I haven't looked that closely, however I imagine it could be refactored to do this.

Based on my previous comment, I guess I'd prefer we make ParseTag() more reusable (rather than get rid of it).

Thoughts?

snaroff

These comments make a lot of sense to me.

Another possibility is that ParseTag calls ParseClassSpecifier. That way the C++ specific parsing is delegated to by ParseTag, instead of having the commonality delegated to ParseTag by ParseClassSpecifier since the notion of a "class", at least as a "tag decl", is C++ specific.

While we're doing these changes, it would also be nice if ActOnTag would have the information to distinguish between a forward declaration and a definition. The current design *forces* the implementation of Actions to first construct a Decl object and then fill in its implementation (e.g., set the fields of a struct). While in some implementations of the parser Actions this might be the best thing to do, forcing clients of Parse to do it this way just doesn't feel right to me.

I noticed today that ParseTag is only called by ParseEnumSpecifier.
In the past ParseTag was used do much of the heavy lifting of parsing
structs, unions, and enums, and the comments on the source still say
this. Now it looks like ParseTag is only called by
ParseEnumSpecifier, which means most of the comments are not true and
that the logic could probably just be rolled into ParseEnumSpecifier
directly.

I think I made this change, so I'll chime in.

From my understanding of the code, it appears that
ParseClassSpecifier (in ParseDeclCXX.cpp) now does all the parsing for
structs, unions, and C++ classes. This seems a little weird to me,
and it isn't clear from the comments in the Parse library that this is
the case.

That comment in ParseClassSpecifier is very C++-standards-wonk; I'll
be happy to make that clearer.

Specifically, I have two concerns:

(1) ParseTag now appears to be somewhat redundant and unnecessary, and
in some ways ill-named. It doesn't appear to serve its original
purpose anymore, and thus it's logic should probably just be folded
into ParseEnumSpecifier.

I agree.

(2) It seems strange to me that parsing structs and unions, core
pieces of the C language, is done in ParseClassSpecifier
(ParseDeclCXX.cpp). There seems to be a lot of commonality here that
could possibly be refactored and moved back into ParseDecl.cpp. This
is my first look at this code in a while, so I'm don't claim to
understand all of the design decisions here, but conceptually it just
seems a little confusing.

The C++ class-specifier (with elaborated-type-specifier) parses a
superset of the C struct-or-union-specifier, and in C mode
ParseClassSpecifier falls back to parsing struct-or-union-specifier.
It didn't make sense to me to put ParseClassSpecifier into the C
ParseDecl.cpp, because class-specifiers aren't a C concept.

While it's possible to make ParseClassSpecifier delegate to ParseTag,
I don't think it's the right approach for the future. ParseTag (which
parses the C concept of tags) would have to deal with some nontrivial
C++-specific parsing details for, e.g.,

  class X<int>::Y

Clang isn't parsing nested-name-specifiers or template-ids, so the
ParseClassSpecifier/ParseTag code looks a lot more similar now than it
would look if Clang had support for nested classes or class template
specializations. ParseTag could be made to deal with these C++-isms,
but I don't think it should be: let the non-trivial C++ parsing stay
in the C++ part of the parser.

Also, while the C99 struct-or-union-specifier and enum-specifier
nonterminals are similar enough to permit implementation with a single
ParseTag function, the C++ equivalents (class-specifier,
enum-specifier and elaborated-type-specifier) have more divergent
grammars, so it makes sense to keep ParseClassSpecifier and
ParseEnumSpecifier separate for C++. As more of C++ (and C++0x) gets
implemented, the common code between these two will be a much smaller
proportion of the actual parsing functions.

As a side note, my original interest in this code was to allow the
parser+actions to be able to distinguish up front between enum/struct/
union/class forward declarations and definitions.

FWIW, this requires arbitrary lookahead in C++ (but not in C), since
we can't tell the difference between a class-specifier and an
elaborated-type-specifier until we've parsed through the
nested-name-specifier and hit the actual class identifier (or
template-id). Of course, one could apply Argiris' pre-parser idea to
do the disambiguation between these two nonterminals in advance.

  - Doug

Doug,

Thanks a lot for chiming in...your comments make a lot of sense.

I guess if we want clang to have a unified parser for C/ObjC/C++, it's inevitable that some aspects of the parser will start feeling less C-like (particularly for structs/classes...). As you say, refining the comments may help. We could also consider coming up with a better name for the action hook (ActOnTag), however that's purely aesthetic.

snaroff

Hi Doug,

Thanks for the very lucid explanation. It seems to me that because of the commonality with the parsing logic for C that ParseClassSpecifier should be moved to ParseDecl.cpp. Completely C++-specific parsing logic that it calls (e.g., ParseCXXMemberSpecification) should be left in ParseDeclCXX.cpp.

My other thought is that "ActOnTag" should be renamed (as Steve suggested), and only be called for struct/unions/classes. I propose then that we also have a separate action for enums, since they really do represent a different part of the grammar once "ActOnTag" gets renamed. As Steve pointed out, it seems strange that ActOnTag is called from two different code paths in the parser.

Ted

Doug Gregor wrote:

  

As a side note, my original interest in this code was to allow the
parser+actions to be able to distinguish up front between enum/struct/
union/class forward declarations and definitions.
    
FWIW, this requires arbitrary lookahead in C++ (but not in C), since
we can't tell the difference between a class-specifier and an
elaborated-type-specifier until we've parsed through the
nested-name-specifier and hit the actual class identifier (or
template-id). Of course, one could apply Argiris' pre-parser idea to
do the disambiguation between these two nonterminals in advance.
  
In an attempt at parsing nested-name-specifiers, inside ParseClassSpecifier, the parser would parse the nested-name-specifier, get it resolved to a CXXScopeTy* by Sema, and pass this pointer to ActOnTag along with the other information (identifier etc.). That way there's no need for lookahead.

Doug Gregor wrote:

As a side note, my original interest in this code was to allow the
parser+actions to be able to distinguish up front between enum/struct/
union/class forward declarations and definitions.

FWIW, this requires arbitrary lookahead in C++ (but not in C), since
we can't tell the difference between a class-specifier and an
elaborated-type-specifier until we've parsed through the
nested-name-specifier and hit the actual class identifier (or
template-id). Of course, one could apply Argiris' pre-parser idea to
do the disambiguation between these two nonterminals in advance.

In an attempt at parsing nested-name-specifiers, inside ParseClassSpecifier,
the parser would parse the nested-name-specifier, get it resolved to a
CXXScopeTy* by Sema, and pass this pointer to ActOnTag along with the other
information (identifier etc.). That way there's no need for lookahead.

If I understood Ted correctly, what he's thinking of would require
lookahead... I'll elaborate below.

As a side note, my original interest in this code was to allow the
parser+actions to be able to distinguish up front between enum/struct/
union/class forward declarations and definitions.

Doesn't already the parser+actions distinguish them ? ActOnTag gets passed a
TagKind which is one of TK_Definition/TK_Declaration/TK_Reference.

I *think* Ted meant that he wants to be able to know before actually
parsing the struct-or-union-specifier (in C) whether it's going to
declare, define, or reference a tag. This can be done with just a
little bit of lookahead. In the C++ world, this would mean knowing
whether we're parsing a class-specifier or an
elaborated-type-specifier, and allow us to have different Parse
functions for these two nonterminals (which are actually quite
different, despite the ambiguity), rather than combining them into a
single ParseClassSpecifier (which is really
ParseClassSpecifierOrNonEnumElaboratedTypeSpecifier!).

  - Doug

Argiris is correct. My mind short-circuited. I believe that the TagKind should be enough for ActOnTag to take the appropriate action based on a definition/declaration/reference.

Thanks everyone!

Doug Gregor wrote:

As a side note, my original interest in this code was to allow the
parser+actions to be able to distinguish up front between enum/struct/
union/class forward declarations and definitions.

Doesn't already the parser+actions distinguish them ? ActOnTag gets passed a
TagKind which is one of TK_Definition/TK_Declaration/TK_Reference.
    
I *think* Ted meant that he wants to be able to know before actually
parsing the struct-or-union-specifier (in C) whether it's going to
declare, define, or reference a tag.

Ah, I misunderstood Ted's intentions. Ted, could you explain why is this necessary ?

-Argiris