Preprocessor::LookNext

Hi,

The "nested-name-specifier '::' " syntax occurs at enough places that it would really simplify the code if an efficient Parser::GetLookAheadToken(1) could be used.

I attached a Preprocessor::LookNext implementation for this (pp-looknext.patch).
In the "parse-labeled" patch I use LookNext to determine a label and use a separate Parser::ParseLabeledStatement.

Let me know what you think.

-Argiris

pp-looknext.patch (2.11 KB)

parse-labeled.patch (4.23 KB)

Hi Argiris,

Is there a specific problem with calling LookAhead(0)? To me, this is a performance optimization tweak... I'd rather do this sort of thing when more of the C++ front-end is in place. This would give us better grounds for profiling and determine whether it is actually a win in the end.

Adding a "LookNext()" method to the preprocessor as a helper for LookAhead(0) would be ok if you think it would be significantly useful though. Also, LookAhead could obviously be improved :).

Finally, are you sure it would be completely unreasonable to refactor the C++ grammar to avoid the look-ahead in the common case? While always having peektok available is nice, it adds significant cost to the hotest part of the C preprocessor. I would expect a significant slowdown from the patch on large .i files with -Eonly for example (which spends almost all time in this code).

-Chris

Chris Lattner wrote:

Is there a specific problem with calling LookAhead(0)? To me, this is a performance optimization tweak... I'd rather do this sort of thing when more of the C++ front-end is in place. This would give us better grounds for profiling and determine whether it is actually a win in the end.

Yes, it is only a performance optimization tweak; I wanted to find out if it is acceptable to optimize just LookAhead(0) (which, by the way, is currently the only use for LookAhead).

Adding a "LookNext()" method to the preprocessor as a helper for LookAhead(0) would be ok if you think it would be significantly useful though. Also, LookAhead could obviously be improved :).

[....] While always having peektok available is nice, it adds significant cost to the hotest part of the C preprocessor. I would expect a significant slowdown from the patch on large .i files with -Eonly for example (which spends almost all time in this code).

With a slight modification, the PeekedToken check can be avoided for the common case:

  void Lex(Token &Result) {
    if (CurLexer)
      CurLexer->Lex(Result);
    else
      CurTokenLexer->Lex(Result);
    else {
      // We have a peeked token that hasn't been consumed yet.
      Result = PeekedToken;
      ConsumedPeekedToken();
      return;
    }
  }

LookNext would cache CurLexer,CurTokenLexer before setting them to null, and ConsumedPeekedToken would restore them.

are you sure it would be completely unreasonable to refactor the C++ grammar to avoid the look-ahead in the common case?

I'll go ahead with GetLookAhead; after I post a patch and the code is there for reviewing, it will be a lot easier to discuss about this.

-Argiris

Chris Lattner wrote:

Is there a specific problem with calling LookAhead(0)? To me, this is a performance optimization tweak... I'd rather do this sort of thing when more of the C++ front-end is in place. This would give us better grounds for profiling and determine whether it is actually a win in the end.

Yes, it is only a performance optimization tweak; I wanted to find out if it is acceptable to optimize just LookAhead(0) (which, by the way, is currently the only use for LookAhead).

Ok.

[....] While always having peektok available is nice, it adds significant cost to the hotest part of the C preprocessor. I would expect a significant slowdown from the patch on large .i files with -Eonly for example (which spends almost all time in this code).

With a slight modification, the PeekedToken check can be avoided for the common case:

void Lex(Token &Result) {
   if (CurLexer)
     CurLexer->Lex(Result);
   else if (CurTokenLexer)
     CurTokenLexer->Lex(Result);
   else {
     // We have a peeked token that hasn't been consumed yet.
     Result = PeekedToken;
     ConsumedPeekedToken();
     return;
   }
}

LookNext would cache CurLexer,CurTokenLexer before setting them to null, and ConsumedPeekedToken would restore them.

I like this approach about 10 times better than the other one :slight_smile: it is very clever and simple! The good thing about this is that it is an easy thing to measure. Can you whip up a patch that implements this approach? I don't think it will have a measurable performance impact on the preprocessor (unlike the other approach). If so, I would be very happy to commit it and simplify the (existing and future) parser to use it.

-Chris

Chris Lattner wrote:

I like this approach about 10 times better than the other one :slight_smile: it is very clever and simple! The good thing about this is that it is an easy thing to measure. Can you whip up a patch that implements this approach? I don't think it will have a measurable performance impact on the preprocessor (unlike the other approach). If so, I would be very happy to commit it and simplify the (existing and future) parser to use it.

The attached "pp-looknext-2.patch" implements Preprocessor::LookNext(), and "parser-nexttoken.patch" modifies the parser to use it.

Let me know if they look ok, with no performance impact.

-Argiris

parser-nexttoken.patch (10.8 KB)

pp-looknext-2.patch (2.72 KB)

The difference is *just barely* measurable in my -Eonly worst case. It is close enough and good enough to go in, thanks!

Some comments about the patch (pp first):

I incorporated the suggestions and committed the patches here:

http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20080707/006444.html
http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20080707/006445.html

Chris Lattner wrote:

+++ lib/Lex/PPLexerChange.cpp (working copy)
@@ -85,6 +85,10 @@
   // size we new to a multiple of 16 tokens. If the previous buffer has space
   // left, we can just grow it. This means we only have to do the new 1/16th as
   // often.
+
+ // Optimized LookAhead(0) case.
+ if (N == 0)
+ return LookNext();

   Token *LookaheadTokens = new Token[N+1];

Is this correct for the LookAhead(1) case?

I don't quite understand what you mean, LookAhead(1) isn't affected by this change, can you elaborate ?

-Argiris

I incorporated the suggestions and committed the patches here:

Thanks!

+++ lib/Lex/PPLexerChange.cpp (working copy)
@@ -85,6 +85,10 @@
  // size we new to a multiple of 16 tokens. If the previous buffer has space
  // left, we can just grow it. This means we only have to do the new 1/16th as
  // often.
+
+ // Optimized LookAhead(0) case.
+ if (N == 0)
+ return LookNext();

  Token *LookaheadTokens = new Token[N+1];

Is this correct for the LookAhead(1) case?

I don't quite understand what you mean, LookAhead(1) isn't affected by this change, can you elaborate ?

I haven't looked at the code, I just want to make sure that LookAhead(1) or LookAhead(2) doesn't get off-by-one due to the peeked token.

-Chris

Chris Lattner wrote:

+++ lib/Lex/PPLexerChange.cpp (working copy)
@@ -85,6 +85,10 @@
  // size we new to a multiple of 16 tokens. If the previous buffer has space
  // left, we can just grow it. This means we only have to do the new 1/16th as
  // often.
+
+ // Optimized LookAhead(0) case.
+ if (N == 0)
+ return LookNext();

  Token *LookaheadTokens = new Token[N+1];

Is this correct for the LookAhead(1) case?

I don't quite understand what you mean, LookAhead(1) isn't affected by this change, can you elaborate ?

I haven't looked at the code, I just want to make sure that LookAhead(1) or LookAhead(2) doesn't get off-by-one due to the peeked token.

LookAhead isn't called recursively, it collects all tokens through Lex(), into the LookaheadTokens array.
If a peeked token exists, like when calling "LookAhead(0); LookAhead(1);", it goes like this:

LookAhead(0);

There's a peeked token now.

LookAhead(1);

-LookAhead consumes through Lex() the peeked token
-Peeked token is invalidated, Lex() will not use it again.
-LookAhead consumes one more token by 'normal' Lex().
-Previously peeked token plus one more enter the token stream.
-Previously peeked token will be the next available token.

So, everything works as expected :slight_smile:

-Argiris

ok, great! :slight_smile:

-Chris