case tok::identifier:
// eats the current identifier and related type stuff as a whole,
D = ConsumeAndResolveIdentifier();
if (Actions.isTypeName(D))
... it's a decl spec ...
else
.. it's an 'identifier expression' ..
Is that the problem?
Yes, exactly! I want to avoid the "leading stuff.." route.
I've been going back and forth on this issue and did a bit of research.
GCC's C++ front-end handles this in a couple of different ways. To handle the ambiguity between declarations and statements at the top level of compound statements, it basically boils down to:
cp_parser_parse_tentatively (parser);
/* Try to parse the declaration-statement. */
cp_parser_declaration_statement (parser);
/* If that worked, we're done. */
if (cp_parser_parse_definitely (parser))
return;
/* Look for an expression-statement instead. */
statement = cp_parser_expression_statement (parser, in_statement_expr);
I don't think there is a good way around this (in its full generality), other than some "peephole" look-ahead to try to disambiguate without fully tentative parsing. This is worthwhile because the code above would otherwise run even for simple things like { x = y; }, where looking at 'x=' obviously tells you whether it starts a declaration or not.
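To make the "peephole" idea concrete, here is a minimal sketch, not taken from Clang or GCC. The Tok and Guess enums and the peepholeClassify function are hypothetical names invented for illustration. It looks only at the token after a leading identifier and answers yes/no/maybe, so the expensive tentative parse is needed only for the "maybe" case:

```cpp
#include <vector>

// Hypothetical token kinds for this sketch; not Clang's real token enumeration.
enum class Tok { Identifier, Equal, Period, Arrow, Star, LParen, ColonColon, Semi };

// Tri-state answer: the cheap check is allowed to say "I don't know".
enum class Guess { Expression, Declaration, Unknown };

// Looks only at the token that follows a leading identifier.
Guess peepholeClassify(const std::vector<Tok>& toks) {
  if (toks.size() < 2 || toks[0] != Tok::Identifier)
    return Guess::Unknown;
  switch (toks[1]) {
  case Tok::Equal:   // "x = y;"  a declaration cannot continue with '=' here
  case Tok::Period:  // "x.f();"
  case Tok::Arrow:   // "x->f();"
    return Guess::Expression;
  case Tok::Identifier:  // "T x;"  assumed: two adjacent identifiers only fit a declaration
    return Guess::Declaration;
  default:  // "x * y", "x(y)", "x::y" still need name lookup or a tentative parse
    return Guess::Unknown;
  }
}
```

A caller would fall back to the tentative-parsing path above only when the result is Guess::Unknown.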
However, I think that the primary expression case can be better and is more important. The GCC parser (afaict) does a ton of tentative parsing and other weird stuff when trying to resolve identifiers in primary expressions, and I think this is extremely bad news.
Consider some very common C++ cases:
std::cout << ....
std::string X;
... std::string::npos ...
std::string("")
std::vector<int>::iterator
etc
The big problem that I have with backtracking is that using it to handle primary expressions means that you do a huge amount of the same work, over and over again. In this case, analyzing "std::string::npos" will probably start by tentatively parsing it as a type (because it could be the start of a 'constructor style cast'). This will require lookup of 'std', then finding the 'string' typedef in it, then finding the definition of the string type (a record_decl for basic_string<char...>), then finding the npos member, then seeing the '(', then realizing that npos is not a type and aborting/backtracking. In the 'retry' it will start parsing it again, which has to resolve 'std', 'string' and 'npos' all over again, just to continue on as a non-type. It is far worse for things like vector<int> because it requires looking up the template and handling partial specialization *multiple times*, which is slow slow slow.
In the 'Old Clang C parser' we had a special case for identifier, and we had "parse expression with leading identifier". You convinced me that you can make the preprocessor fast enough at single token lookahead to allow us to depend on it and simplify the implementation. For C, this is sufficient, but for C++ it clearly is not.
Given the complexities of these C++ issues, I'm strongly inclined to go back to the old design for expressions (however, we should still rely on 1-token lookahead for labels). This means that we resolve the qualified identifier streams exactly once in this case, and don't have to re-resolve them and don't have backtracking in this (very, very common) case. This does bring back some annoyingness in the clang implementation of the grammar, but it comes into a very simple place that doesn't require constant hacking (the primary/postfix/cast part of the grammar is pretty simple).
Places in the code that use the "is decl spec" predicate will also have to be changed (e.g. the compound statement case). I think it would be interesting to consider making the 'is decl spec' predicate either be tri-state (yes/no/maybe) or see if the callers of this can be made to inline the two cases directly, like we can do for primary expressions. I haven't looked at the clients of 'is decl spec' to see how insane this idea is.
It also means that isDeclarationSpecifier would need backtracking and significant other stuff to work as a predicate. In practice, this means we'd want to refactor all callers so they no longer use it the way they do now.
Here's another way to make it work, without any backtracking:
Either Parser or Sema keeps a scope-spec state (like Parser keeps a 'current scope' state and Sema keeps 'current DeclContext' state), which indicates the Decl that we need to do lookup into.
As you suggested, the Parser calls actions during parsing of scope specifiers.
For "A::T<int>::":
-Parses 'A::' and calls ActOnNestedNameSpecifier for 'A' (updates scope-spec)
-Parses, instantiates, etc. 'T<int>' through action calls (updates scope-spec)
Ok. I like that this means the resolution only has to happen once, which is my goal.
Sema actions like isTypeName and ActOnIdentifier check the scope-spec to see whether they need to look up a name inside the scope-spec decl, or do a normal lookup.
This is interesting: doesn't this mean that everything else will have to check to make sure that this hasn't happened, in order to reject invalid code?
Now, the question is whether Sema should keep the scope-spec state, or whether the Parser should keep the scope-spec state and pass it along to the actions (isTypeName, etc.).
When I say passing the scope-spec to actions, I don't mean like the CXXScopeSpec in my patch which contained only parsed tokens, I mean passing the decl that represents the scope where names should be looked up into.
Ok.
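As a rough illustration of the "Parser keeps the state and passes the resolved decl to the actions" variant, here is a minimal sketch. Scope, ScopeSpec and the two function signatures are hypothetical simplifications, not the real Sema interface or the CXXScopeSpec from the patch, and the unqualified lookup does not walk enclosing scopes. Each "Name::" component is resolved exactly once, and later queries look directly into the recorded scope:

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a declaration context (namespace or class).
struct Scope {
  std::map<std::string, bool> isType;          // member name -> does it name a type?
  std::map<std::string, const Scope*> nested;  // member name -> scope it introduces
};

// Hypothetical scope-spec: the already-resolved scope to look into,
// or nullptr when an ordinary unqualified lookup should be done.
struct ScopeSpec {
  const Scope* scope = nullptr;
};

// Called by the parser once per "Name::" component; advances the scope-spec.
// Returns false (leaving the scope-spec unchanged) if Name introduces no scope.
bool actOnNestedNameSpecifier(ScopeSpec& ss, const Scope& current,
                              const std::string& name) {
  const Scope& from = ss.scope ? *ss.scope : current;
  auto it = from.nested.find(name);
  if (it == from.nested.end())
    return false;
  ss.scope = it->second;
  return true;
}

// Looks inside the scope-spec's scope if one is set, otherwise in 'current'.
bool isTypeName(const ScopeSpec& ss, const Scope& current,
                const std::string& name) {
  const Scope& from = ss.scope ? *ss.scope : current;
  auto it = from.isType.find(name);
  return it != from.isType.end() && it->second;
}
```

Because the ScopeSpec is a local object owned by the parser in this sketch, an error such as "A:: ;" just drops it; no Sema-side state needs to be cleared.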
If the Sema keeps the scope-spec state, it is cleaner, but how will the scope-state be cleared in case of an error like "A:: ;"?
Should the parser call an ActOnErrorAfterNestedName or something?
That is possible. I'm more concerned with the fact that Sema for ';' will have to check to make sure there is no current scope-spec that is active. This approach seems to mean that *all* sema actions would have to check for an empty scope-spec.
This also means that we can drop the rewinding part if the Parser or Sema keep and look after a 'C++ scoping' state.
If the Parser keeps the state, it will build and pass 'CXXScopeSpec' objects to action methods, (like in my patch).
If the Sema keeps the state, it will make error recovery awkward.
I actually think that things are more awkward with having Sema work this out. Specifically, if you have: "A :: B :: C" you might have one reference or it might be "A::B ::C" which can occur in a few places in the grammar:
http://www.cs.berkeley.edu/~smcpeak/elkhound/sources/elsa/doc/coloncolon.txt
If the parser just passed all of A/B/C to Sema, I'm not sure how it would handle this. It seems really easy if you invoke actions for "A::" then "B" (which returns a type). When the parser got a type, it would naturally ratchet on and parse ::C as a qualified name.
But "B" type could contain "C" as nested type, in which case you may want "A::B::C".
Anyway, I don't think we need to worry about whether it should be "A::B ::C" or "A::B::C" or whatever.
GCC treats it as "A::B::C" always:
struct S {};
namespace A { S f(); }
S ::A::f(); // error: 'S::A' has not been declared
I think this is correct since the spec says nothing about "'::' associativity".
Sure, but this can apparently happen in the 'friend' case and some others, according to the Elsa doc. I confess to not being an expert on these matters though.
-Chris