Chris Lattner wrote:
Okay, here's another crazy idea. If you boil it down, my objections to preparsing are basically:
1. the perf cost of having to do the preparse in *every* decl case.
2. [minor] the perf cost for qualified expr cases (std::cout << ...)
3. [minor] the maintenance cost of the second parser.
1) This is not true, as I explain in this post:
http://lists.cs.uiuc.edu/pipermail/cfe-dev/2008-August/002625.html
Either I'm very confused (likely!) or your current patch doesn't do this. As I mentioned in the other post, it looks like it runs the preparser for any identifier at top level. It even runs it for the "x = 4" case.
In the second patch posted here:
http://lists.cs.uiuc.edu/pipermail/cfe-dev/2008-August/002617.html
No backtracking is enabled for the common cases:
Parser::TentativeParsingResult Parser::isCXXDeclarationStatement() {
  .......
  TentativeParsingResult TPR = isCXXDeclarationSpecifier();
  if (TPR != TPR_ambiguous)
    return TPR;

  TentativeParsingAction PA(*this);
  TPR = TryParseSimpleDeclaration();
  PA.Revert();
  ....
}
isCXXDeclarationSpecifier() does at most a one-token lookahead. (Actually, for 'typeof' it tentatively skips through it to see whether "typeof(..)" is followed by '(', but this is too uncommon to be worth discussing.)
If the current token does not indicate a type, isCXXDeclarationSpecifier() does no token consumption.
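To make the fast path concrete, here is a minimal toy sketch of the same control flow. It is not the code from the patch: ToyDeclParser, the typeNames set (a stand-in for asking Sema whether a name is a type) and the "T ( identifier )" rule are all assumptions made up for illustration. It only shows that the backtracking step is reached when the cheap check answers "ambiguous".

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

enum class TPR { True, False, Ambiguous };

// Toy model only: tokens are plain strings, and typeNames is a
// hypothetical stand-in for Sema's "is this identifier a type?" lookup.
struct ToyDeclParser {
    std::vector<std::string> toks;
    std::set<std::string> typeNames;
    std::size_t pos = 0;
    int tentativeParses = 0;  // how often backtracking was actually needed

    const std::string& peek(std::size_t n = 0) const {
        static const std::string eof = "<eof>";
        return pos + n < toks.size() ? toks[pos + n] : eof;
    }

    // Fast path: inspects the current token plus one token of lookahead
    // and consumes nothing.
    TPR isDeclarationSpecifier() const {
        if (typeNames.count(peek()) == 0) return TPR::False;
        return peek(1) == "(" ? TPR::Ambiguous : TPR::True;
    }

    // Slow path (toy rule, not the real C++ rule): "T ( identifier )"
    // counts as a declaration, anything else as an expression.
    TPR tryParseSimpleDeclaration() {
        pos += 2;  // consume "T" and "("
        const std::string& name = peek();
        bool isIdent = std::isalpha(static_cast<unsigned char>(name[0])) != 0;
        ++pos;     // consume the token after "("
        return (isIdent && peek() == ")") ? TPR::True : TPR::False;
    }

    TPR isDeclarationStatement() {
        TPR r = isDeclarationSpecifier();
        if (r != TPR::Ambiguous) return r;  // common case: no backtracking

        std::size_t saved = pos;            // plays the role of TentativeParsingAction
        ++tentativeParses;
        r = tryParseSimpleDeclaration();
        pos = saved;                        // plays the role of PA.Revert()
        return r;
    }
};
```

With typeNames = {"T"}, the inputs "x = 4 ;" and "T x ;" are both answered without touching tentativeParses; only "T ( x ) ;" and "T ( 1 ) ;" take the tentative path, and in every case pos is back at 0 afterwards.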
2) This is not inherent to the preparser. Even with a "tentatively parse decl, then parse as expr" approach, we would still prefer to do such resolutions only once. This perf cost needs to be solved in either case.
By using a "parse expr with leading qualified name" approach, or with something else?
Here's what I have in mind.
-something like "A::" indicates a scope qualifier.
-there's a parser method whose purpose is to resolve scope qualifiers, say "ParseCXXScopeQualifier"; it parses them, calls sema actions to resolve them, and returns a CXXScopeTy* from Sema (to be passed on to the sema actions that need it).
-ParseCXXScopeQualifier can cache that CXXScopeTy* result, so that when it is called again for the same token source location, it will return the cached result without doing any sema resolution at all. It will also skip the necessary number of tokens (if it was previously called with "A::", it will skip 4 tokens).
Now say that a statement starts with this:
-The preparser sees that 'A::' is a scope qualifier and calls ParseCXXScopeQualifier to do its thing. It then calls Parser::IsTypeName. Both methods cache their results.
-The preparser sees that "A::T" is not followed by a '(', backtracks, and returns "it's a declaration"
-The normal parser sees that 'A::' is a scope qualifier and calls ParseCXXScopeQualifier which just returns the previously cached result and skips 4 tokens.
-The normal parser calls Parser::IsTypeName which also just returns the previously cached result.
-sema resolutions are only done once
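The caching scheme above could be sketched roughly like this. It is a toy model under stated assumptions, not a proposed implementation: CachingParser, resolveScope and the string result are hypothetical stand-ins for the real parser, the Sema action and the CXXScopeTy* respectively, and the cache is keyed by token index rather than by a real source location.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// What gets remembered for a qualifier that starts at a given position.
struct ScopeCacheEntry {
    std::string scope;      // stand-in for the CXXScopeTy* returned by Sema
    std::size_t numTokens;  // how many tokens the qualifier spanned
};

// Toy model only: tokens are plain strings, positions are token indices.
struct CachingParser {
    std::vector<std::string> toks;
    std::size_t pos = 0;
    int semaLookups = 0;  // counts how often "Sema" was actually asked
    std::map<std::size_t, ScopeCacheEntry> scopeCache;  // keyed by start position

    // Hypothetical stand-in for the sema action that resolves one scope name.
    std::string resolveScope(const std::string& outer, const std::string& name) {
        ++semaLookups;
        return outer.empty() ? name : outer + "::" + name;
    }

    // Parses zero or more "identifier ::" pairs starting at pos.
    std::string parseScopeQualifier() {
        auto hit = scopeCache.find(pos);
        if (hit != scopeCache.end()) {
            pos += hit->second.numTokens;  // skip what the first call consumed
            return hit->second.scope;      // no sema resolution this time
        }

        std::size_t start = pos;
        std::string scope;
        while (pos + 1 < toks.size() && toks[pos + 1] == "::") {
            scope = resolveScope(scope, toks[pos]);
            pos += 2;  // consume the identifier and the "::"
        }
        scopeCache[start] = {scope, pos - start};
        return scope;
    }
};
```

In this sketch the skip count is simply whatever the first call consumed. The preparser would call parseScopeQualifier(), backtrack by resetting pos, and the normal parser's second call at the same position would return the cached result with semaLookups unchanged. The same idea would apply to the Parser::IsTypeName result.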
What do you think?
PS: Here's a weird test I did. I used the preparser to see how many "ambiguous" declarations (of the T(...) style) there are in actual C code. I used a few GCC files:
gcc.c:      declarations in functions: 432   ambiguous: 0
expr.c:     declarations in functions: 730   ambiguous: 0
combine.c:  declarations in functions: 564   ambiguous: 0
I think this suggests that it's a really uncommon case.
If anyone wants to give me a preprocessed file for a test, please do!
-Argiris