Hi,
The attached patches implement support for nested-name-specifiers (foo::bar::x) in the Parser, using 'annotation tokens' (many thanks to Doug for the idea here: http://lists.cs.uiuc.edu/pipermail/cfe-dev/2008-August/002664.html)
About annotation tokens:
These are a special kind of token that the parser (not the lexer) may use to replace a run of lexed tokens with a single token that encapsulates the relevant semantic information.
There are two kinds:
-typename annotation (represents a typedef name in C, and a possibly qualified typename in C++, like "foo::bar::myclass")
-C++ scope annotation (represents a nested-name-specifier, like "foo::bar::")
Annotation tokens contain a void* value that represents semantic information specific to the annotation kind (a TypeTy* for typename and CXXScopeTy* for scope) and the SourceRange of the tokens that they replaced.
As you can see in the attached "annot-token.patch" there were some changes to the Token class to support annotations but its size did not change.
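To make the description above concrete, here is a minimal sketch of a token that can carry an annotation. It is not the code from annot-token.patch: the names TokKind, SourceLoc, LengthOrEnd, PtrData, setAnnotation, getAnnotationValue and getAnnotationRange are hypothetical, and reusing existing fields is only one assumed way the size could stay unchanged.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins; not the declarations from annot-token.patch.
enum class TokKind { Identifier, ColonColon, AnnotTypename, AnnotCXXScope };

struct SourceLoc { unsigned Raw; };
struct SourceRange { SourceLoc Begin; SourceLoc End; };

class Token {
  TokKind Kind;
  SourceLoc Loc;         // start of the token, or of the annotated range
  unsigned LengthOrEnd;  // length for ordinary tokens; raw end location for annotations
  void *PtrData;         // identifier data for ordinary tokens; annotation value otherwise

public:
  Token() : Kind(TokKind::Identifier), Loc{0}, LengthOrEnd(0), PtrData(nullptr) {}

  TokKind getKind() const { return Kind; }

  bool isAnnotation() const {
    return Kind == TokKind::AnnotTypename || Kind == TokKind::AnnotCXXScope;
  }

  // Turn this token into an annotation covering the source range R.
  // Value would be a TypeTy* for typename or a CXXScopeTy* for scope annotations.
  void setAnnotation(TokKind K, void *Value, SourceRange R) {
    assert(K == TokKind::AnnotTypename || K == TokKind::AnnotCXXScope);
    Kind = K;
    PtrData = Value;
    Loc = R.Begin;
    LengthOrEnd = R.End.Raw;
  }

  void *getAnnotationValue() const {
    assert(isAnnotation());
    return PtrData;
  }

  SourceRange getAnnotationRange() const {
    assert(isAnnotation());
    return SourceRange{Loc, SourceLoc{LengthOrEnd}};
  }
};
```

In this sketch the annotation adds no members: the fields an ordinary token already has are reinterpreted once the kind says the token is an annotation.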
The benefits of the annotation tokens are:
----- 1) Vastly simplified handling of nested-names.
In my previous attempts at nested-names, the main issue was how to keep track of the "C++ scope specifier state" so that introducing nested-names into "parsing contexts" that don't particularly care about them wouldn't over-complicate things and cause a lot of duplicated parsing code. Here's an example of how annotation tokens handle that:
Assume that we have:
sizeof( foo::bar::x )
sizeof doesn't particularly care about nested names; it only wants to find out whether it has a type or an expression, and then defer parsing to the appropriate parsing functions.
Here's how it works if "foo::bar::x" is a type.
-sizeof calls Parser::isDeclarationSpecifier
-Parser::isDeclarationSpecifier, at the beginning, calls Parser::AnnotateToken
-Parser::AnnotateToken parses and resolves both the scope-spec and the typename and sets as current token an annotation type token that indicates the type
-Parser::isDeclarationSpecifier sees that the current token is an annotation type token and returns true since this is a declaration specifier
-sizeof calls Parser::ParseTypeName
-When execution reaches Parser::ParseDeclarationSpecifiers, it sees the annotation type token, takes the information from it and "consumes" it from the token stream.
OK, what if "foo::bar::x" is an expression?
-sizeof calls Parser::isDeclarationSpecifier
-Parser::isDeclarationSpecifier, at the beginning, calls Parser::AnnotateToken
-Parser::AnnotateToken parses the scope-spec, sees that 'x' is not a typename and sets as current token an annotation scope token for "foo::bar::" (which is followed by the 'x' identifier token)
-Parser::isDeclarationSpecifier sees that the current token is not a declaration specifier and returns false
-sizeof calls Parser::ParseExpression
-When execution reaches Parser::ParseCastExpression, the annotation scope token indicates a qualified-id expression which is handled by Parser::ParseCXXIdExpression
-Parser::ParseCXXIdExpression takes the information from the annotation scope token and calls Actions.ActOnIdentifierExpr by passing the 'x' identifier and the specific C++ scope that it should be a member of
The important thing to notice about the above is that nested-names didn't affect the parsing logic of contexts that don't directly deal with nested-names.
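The two walk-throughs above can be sketched as a toy parser. This is only an illustration under stated assumptions, not the code in nns-parser.patch: ToyParser, Tok, Kind and the lower-case method names are hypothetical, tokens carry plain strings instead of TypeTy*/CXXScopeTy* values, and a std::set of qualified names stands in for the real name lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

enum class Kind { Identifier, ColonColon, RParen, AnnotTypename, AnnotCXXScope };

struct Tok {
  Kind K;
  std::string Text;  // spelling, or the text an annotation token replaced
};

struct ToyParser {
  std::vector<Tok> Toks;
  std::size_t Pos;
  std::set<std::string> Types;  // qualified names that are types (stand-in for lookup)

  ToyParser(std::vector<Tok> T, std::set<std::string> Ty)
      : Toks(std::move(T)), Pos(0), Types(std::move(Ty)) {}

  const Tok &cur() const {
    assert(Pos < Toks.size());
    return Toks[Pos];
  }

  // Replace "a::b::" (plus the final identifier, if the whole name is a type)
  // with a single annotation token at the current position.
  void annotateToken() {
    std::size_t I = Pos;
    std::string Scope;
    while (I + 1 < Toks.size() && Toks[I].K == Kind::Identifier &&
           Toks[I + 1].K == Kind::ColonColon) {
      Scope += Toks[I].Text + "::";
      I += 2;
    }
    if (I < Toks.size() && Toks[I].K == Kind::Identifier &&
        Types.count(Scope + Toks[I].Text)) {
      Tok Annot{Kind::AnnotTypename, Scope + Toks[I].Text};
      Toks.erase(Toks.begin() + Pos, Toks.begin() + I + 1);
      Toks.insert(Toks.begin() + Pos, Annot);
    } else if (!Scope.empty()) {
      Tok Annot{Kind::AnnotCXXScope, Scope};
      Toks.erase(Toks.begin() + Pos, Toks.begin() + I);
      Toks.insert(Toks.begin() + Pos, Annot);
    }
  }

  bool isDeclarationSpecifier() {
    annotateToken();
    return cur().K == Kind::AnnotTypename;
  }

  // Only this function knows about scope annotations.
  std::string parseIdExpression() {
    std::string Scope;
    if (Pos < Toks.size() && cur().K == Kind::AnnotCXXScope)
      Scope = Toks[Pos++].Text;
    if (Pos >= Toks.size() || cur().K != Kind::Identifier)
      return "error: expected identifier";
    return "expr " + Scope + Toks[Pos++].Text;
  }

  // The sizeof operand parser never inspects "::" itself.
  std::string parseSizeofOperand() {
    if (isDeclarationSpecifier())
      return "type " + Toks[Pos++].Text;
    return parseIdExpression();
  }
};
```

Note that parseSizeofOperand contains no nested-name logic at all; the incomplete "foo::bar::" case is diagnosed inside parseIdExpression.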
sizeof didn't need any special check for nested names. If the expression was this:
sizeof( foo::bar:: )
The error would be reported by Parser::ParseCXXIdExpression; sizeof doesn't have to check for this either.
At this point you may think that the side effects of Parser::isDeclarationSpecifier (it changes the token stream) could lead to problems, but in practice, given how tokens are used, this is highly unlikely.
The parser mostly cares only about what the current token is and how it affects the current parsing logic. It has no "long-term token memory" that could fall out of sync when the token stream changes.
----- 2) Efficient backtracking.
The ambiguity resolution parser can use annotation tokens to spare the Parser from having to re-parse nested-names.
The nested-names (and typenames) will be resolved by the tentative parser once and the normal parser will use the annotation tokens.
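A rough sketch of the idea, with everything in it assumed for illustration: BacktrackingParser, looksLikeDeclaration, parseDeclaration and the Lookups counter are hypothetical names, and a single identifier stands in for a full nested-name to keep the example short. The tentative pass annotates the token and then rewinds; the normal pass finds the annotation and performs no second lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct Tok {
  bool IsTypeAnnot;  // true once the token has been resolved to a type
  std::string Text;
};

struct BacktrackingParser {
  std::vector<Tok> Toks;
  std::size_t Pos;
  std::set<std::string> Types;  // stand-in for the real name lookup
  unsigned Lookups;             // counts how often the lookup actually runs

  BacktrackingParser(std::vector<Tok> T, std::set<std::string> Ty)
      : Toks(std::move(T)), Pos(0), Types(std::move(Ty)), Lookups(0) {}

  // Resolve the current token, annotating it in place on the first lookup.
  bool isTypeName() {
    assert(Pos < Toks.size());
    Tok &T = Toks[Pos];
    if (T.IsTypeAnnot)
      return true;  // already annotated: no lookup needed
    ++Lookups;
    if (Types.count(T.Text) == 0)
      return false;
    T.IsTypeAnnot = true;
    return true;
  }

  // Tentative parse: is there a "Type name" here? Always rewinds afterwards.
  bool looksLikeDeclaration() {
    std::size_t Saved = Pos;
    bool Result = false;
    if (isTypeName()) {
      ++Pos;
      Result = Pos < Toks.size();
    }
    Pos = Saved;  // backtrack; the annotation stays in the token stream
    return Result;
  }

  // Normal parse: reuses the annotation left by the tentative pass.
  std::string parseDeclaration() {
    if (!isTypeName() || Pos + 1 >= Toks.size())
      return "";
    std::string Ty = Toks[Pos++].Text;
    std::string Name = Toks[Pos++].Text;
    return Ty + " " + Name;
  }
};
```

The point of the counter is only to show that rewinding the position does not undo the work done on the token itself.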
----- 3) While annotation tokens bring the most benefits for C++, they are useful for C too.
Currently, a typename gets looked up twice, once in Parser::isDeclarationSpecifier and then in Parser::ParseDeclarationSpecifiers. By replacing the typename with an annotation token, a typename gets looked up and resolved only once.
Any comments are welcome!
-Argiris
nns-parser.patch (33.3 KB)
annot-token.patch (7.81 KB)