Using the Lexer...

I'm trying to use the Lexer, but am running into an issue. My code
essentially looks like this:

clang::Token Tok;
clang::Lexer Lexer(...);

Lexer.Lex(Tok);
while (Tok.isNot(clang::tok::eof)) {
   // logic based on Tok.is(...) checks, without touching Lexer or any
   // other clang objects
   Lexer.Lex(Tok);
}
}

This should work, right?

When I run this, it gets each token fine until the point where it's
supposed to get the EOF token. Instead, it just crashes with the
following info:

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000078
0x001fb455 in clang::Token::is (this=0x6c, K=clang::tok::eof) at Token.h:84
84 bool is(tok::TokenKind K) const { return Kind == (unsigned) K; }

(gdb) bt
#0 0x001fb455 in clang::Token::is (this=0x6c, K=clang::tok::eof) at Token.h:84
#1 0x000b0c29 in clang::PTHLexer::getEOF (this=0x0, Tok=@0xbfffeec8) at PTHLexer.cpp:177
#2 0x000adfdd in clang::Preprocessor::HandleEndOfFile (this=0x1d04490, Result=@0xbfffeec8, isEndOfMacro=false) at PPLexerChange.cpp:232
#3 0x00098d25 in clang::Lexer::LexEndOfFile (this=0xbfffece0, Result=@0xbfffeec8, CurPtr=0x1d04267 "") at Lexer.cpp:1221
#4 0x0009a727 in clang::Lexer::LexTokenInternal (this=0xbfffece0, Result=@0xbfffeec8) at Lexer.cpp:1302
#5 0x001d5758 in clang::Lexer::Lex (this=0xbfffece0, Result=@0xbfffeec8) at Lexer.h:128

Is this a bug or am I doing something wrong?

-Alexei

It depends on the "..." in the lexer setup. If you just want to use the lexer without a preprocessor, you need to set it up to lex in "raw" mode. Otherwise it will crash at EOF and when trying to expand macros. clang/Driver/DiagChecker.cpp has code that uses the raw lexer to pull comments out of a file; I'd use it as an example.
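
To illustrate the distinction, here is a small self-contained model of a lexer that, like a raw-mode lexer, produces its own end-of-file token and never calls out to a preprocessor. This is not clang code: ToyKind, ToyToken, ToyRawLexer, and countTokens are hypothetical names invented for this sketch, and the tokenizing rules are deliberately simplistic. The driving loop has the same shape as the one in the question.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Toy stand-ins for illustration only; these are not clang's classes.
enum class ToyKind { Word, Punct, Eof };

struct ToyToken {
  ToyKind Kind = ToyKind::Eof;
  std::string Text;
  bool is(ToyKind K) const { return Kind == K; }
  bool isNot(ToyKind K) const { return Kind != K; }
};

// A "raw" lexer: it owns its buffer and handles end of input by itself,
// so it does not depend on any preprocessor state.
class ToyRawLexer {
  std::string Buffer;
  std::size_t Pos = 0;

  static bool isWordChar(char C) {
    return std::isalnum(static_cast<unsigned char>(C)) || C == '_';
  }

public:
  explicit ToyRawLexer(std::string Text) : Buffer(std::move(Text)) {}

  void Lex(ToyToken &Tok) {
    // Skip whitespace between tokens.
    while (Pos < Buffer.size() &&
           std::isspace(static_cast<unsigned char>(Buffer[Pos])))
      ++Pos;

    // End of buffer: produce the EOF token locally, with no callback.
    if (Pos == Buffer.size()) {
      Tok.Kind = ToyKind::Eof;
      Tok.Text.clear();
      return;
    }

    const std::size_t Start = Pos;
    if (isWordChar(Buffer[Pos])) {
      while (Pos < Buffer.size() && isWordChar(Buffer[Pos]))
        ++Pos;
      Tok.Kind = ToyKind::Word;
    } else {
      ++Pos; // Any other character is a one-character punctuation token.
      Tok.Kind = ToyKind::Punct;
    }
    Tok.Text = Buffer.substr(Start, Pos - Start);
  }
};

// Same loop shape as in the question: lex until EOF, inspecting each token.
std::size_t countTokens(const std::string &Source) {
  ToyRawLexer Lexer(Source);
  ToyToken Tok;
  std::size_t Count = 0;
  Lexer.Lex(Tok);
  while (Tok.isNot(ToyKind::Eof)) {
    ++Count;
    Lexer.Lex(Tok);
  }
  return Count;
}
```

The point of the model is only that end of input is handled inside Lex() itself. A lexer that instead hands EOF off to a preprocessor needs that preprocessor to be in a consistent state when it gets there.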

-Chris

I'm using a preprocessor and passing in the correct FileID from the
SourceManager.

The preprocessor is created the same way as when I compile and
codegen the code (which works correctly).

-Alexei

Ok, if you're using a preprocessor and you want it to expand macros etc., then the preprocessor has to own the lexer. You should not create a lexer on the stack; you should do what the clang driver does.

-Chris

Ah. Shouldn't there be an assert for this or something?

-Alexei

You haven't shown me enough code to know what you're doing.

-Chris

Well obviously I wasn't using it right - I'm just saying it would be
helpful (for people just starting with the API) if this was detected
rather than waiting to crash.

Basically what was happening was I was creating the Lexer on the stack
and passing in the preprocessor as a parameter, rather than using the
Preprocessor to lex like the driver does.

Since EnterMainSourceFile() was never called, CurLexer was never set
in PPLexerChange.cpp. When Preprocessor::HandleEndOfFile() was
called, it assumed that because CurLexer was NULL there must be a
CurPTHLexer, and tried to use that, resulting in a crash. This only
happened at EOF; before that, everything was running fine.
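
The state described above can be modeled in a few lines. This is a simplified sketch, not clang's implementation: the member names echo the ones mentioned in this thread, while ToyLexer, ToyPTHLexer, ToyPreprocessor, and eofPath are hypothetical and exist only for this illustration.

```cpp
#include <memory>
#include <string>

// Toy stand-ins; this is not clang's implementation.
struct ToyLexer {};
struct ToyPTHLexer {};

struct ToyPreprocessor {
  std::unique_ptr<ToyLexer> CurLexer;
  std::unique_ptr<ToyPTHLexer> CurPTHLexer;

  // Models the step that installs a current lexer owned by the preprocessor.
  void EnterMainSourceFile() { CurLexer = std::make_unique<ToyLexer>(); }

  // Reports which lexer an end-of-file handler would end up using.
  // "none" is the case hit in this thread: a handler that treats
  // "no CurLexer" as "there is a CurPTHLexer" dereferences a null pointer.
  std::string eofPath() const {
    if (CurLexer)
      return "lexer";
    if (CurPTHLexer)
      return "pth";
    return "none";
  }
};
```

A freshly constructed ToyPreprocessor reports "none" until EnterMainSourceFile() is called, which is the situation a stack-allocated lexer runs into when it reaches EOF.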

I'm just suggesting that since such a use is clearly wrong, this
should be detected early on. Perhaps in the constructor of Lexer that
takes the Preprocessor parameter. Or when trying to get a token
without first calling EnterMainSourceFile(). Or some other even better
place.
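
As a rough sketch of what such a check might look like, the model below guards the end-of-file path so that the misuse is reported with a message instead of a null dereference. Everything here is hypothetical (GuardedToyPreprocessor, its boolean members, and guardFires are invented for the illustration), and it throws an exception purely so the behavior can be observed in a test; in clang itself an assert would be the more likely form.

```cpp
#include <stdexcept>

// Toy model only; not clang's Preprocessor.
struct GuardedToyPreprocessor {
  bool HasCurLexer = false;
  bool HasCurPTHLexer = false;

  // Models the step that installs a current lexer.
  void EnterMainSourceFile() { HasCurLexer = true; }

  // Hypothetical guard: refuse to handle EOF when no lexer is installed.
  void HandleEndOfFile() const {
    if (!HasCurLexer && !HasCurPTHLexer)
      throw std::logic_error(
          "end of file reached with no current lexer; "
          "was EnterMainSourceFile() called?");
    // Normal end-of-file handling would follow here.
  }
};

// Returns true when the guard detects the misuse described in this thread.
bool guardFires(bool CallEnterMainSourceFile) {
  GuardedToyPreprocessor PP;
  if (CallEnterMainSourceFile)
    PP.EnterMainSourceFile();
  try {
    PP.HandleEndOfFile();
  } catch (const std::logic_error &) {
    return true;
  }
  return false;
}
```

Checking at the EOF path is only one of the places suggested above; a check in the Lexer constructor would fire earlier, but it needs to know that no lexer has been entered yet, which this toy model does not attempt to capture.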

-Alexei

I guess I meant to send this reply to the list...

Hi Alexei,

I completely agree with you. Unfortunately, clang is moving really fast right now, and things are changing rapidly as we work to finish off C/ObjC and bring up C++ in parallel. Much documentation is still needed in many areas; hopefully this will improve as clang matures.

-Chris

I'd be happy to add an assert. Can you please suggest one that would have caught this?

-Chris