segfault in lexing

I am using the lexer to get a stream of tokens for a file. However, if the file contains what looks like a nested block comment, for example

/* some comment /*This is a nested comment*/ some more comment */

then the lexer crashes with a segmentation fault. Is there any way to avoid this?

The code that I am using is:

clang::DiagnosticOptions diagnosticOptions;
clang::TextDiagnosticPrinter *pTextDiagnosticPrinter = new clang::TextDiagnosticPrinter( llvm::outs(), diagnosticOptions);
llvm::IntrusiveRefCntPtr<clang::DiagnosticIDs> pDiagIDs;
clang::Diagnostic diagnostic(pDiagIDs,pTextDiagnosticPrinter);

clang::LangOptions languageOptions;
clang::FileSystemOptions fileSystemOptions;
clang::FileManager fileManager(fileSystemOptions);

clang::SourceManager sourceManager( diagnostic, fileManager);
clang::HeaderSearch headerSearch(fileManager);

clang::TargetOptions targetOptions;
targetOptions.Triple = llvm::sys::getHostTriple();

clang::TargetInfo *pTargetInfo =
clang::TargetInfo::CreateTargetInfo( diagnostic, targetOptions);

clang::Preprocessor preprocessor( diagnostic, languageOptions, *pTargetInfo, sourceManager, headerSearch);

const clang::FileEntry *pFile = fileManager.getFile( _outFilename);
sourceManager.createMainFileID(pFile);
preprocessor.EnterMainSourceFile();

std::vector cmdvect;
GeneralCommand cmd;
clang::Token token;
do {
    preprocessor.Lex(token);
    if (diagnostic.hasErrorOccurred()) {
        break;
    }
} while (token.isNot(clang::tok::eof));

And the backtrace is:

Program received signal SIGSEGV, Segmentation fault.
0x00000000004708fc in clang::LangOptions::LangOptions (this=0x7fffffffbc38) at /home/harsh/Desktop/llvm/tools/clang/lib/Sema/…/…/include/clang/Basic/LangOptions.h:24
24 class LangOptions {
(gdb) bt
#0 0x00000000004708fc in clang::LangOptions::LangOptions (this=0x7fffffffbc38) at /home/harsh/Desktop/llvm/tools/clang/lib/Sema/…/…/include/clang/Basic/LangOptions.h:24
#1 0x00000000007502c2 in clang::Lexer::Lexer (this=0x7fffffffbbb0, fileloc=…, features=…,
BufStart=0x12a32cc8 “\n//#include <sys/wait.h>\n\n\n//#include <stdio.h>\n//#include <stdlib.h>\n//#include <errno.h>\n//#include <string.h>\n//#include <fcntl.h>\n//#include <sys/types.h>\n//#include <sys/stat.h>\n//#include <unist”…,
BufPtr=0x12a338c4 "/printf(“God mode wasted \n”)/;\n/\n\tswitch (gbuf[0]) \n\t\t{\n\t\t\tcase(‘h’) :\n\t\t\t\tread(0,gbuf,6);\n\t\t\t\twrite(harpipe[1],gbuf,6);\n\t\t\tcase(‘t’) :\n\t\t\t\tread(0,gbuf,6);\n\t\t\t\twrite(turpipe[1],gbuf,6);\n\t\t\tdefault"…, BufEnd=0x12a3411a “”) at Lexer.cpp:117
#2 0x0000000000750ae8 in clang::Lexer::MeasureTokenLength (Loc=…, SM=…, LangOpts=…) at Lexer.cpp:349
#3 0x0000000000744116 in clang::TextDiagnosticPrinter::EmitCaretDiagnostic (this=0x12a0ab30, Loc=…, Ranges=0x7fffffffc010, NumRanges=0, SM=…, Hints=0x0, NumHints=0, Columns=0, OnMacroInst=0,
MacroSkipStart=0, MacroSkipEnd=0) at TextDiagnosticPrinter.cpp:382
#4 0x0000000000746a4d in clang::TextDiagnosticPrinter::HandleDiagnostic (this=0x12a0ab30, Level=clang::Diagnostic::Warning, Info=…) at TextDiagnosticPrinter.cpp:1041
#5 0x0000000000921c04 in clang::DiagnosticIDs::ProcessDiag (this=0x0, Diag=…) at DiagnosticIDs.cpp:582
#6 0x000000000091cc03 in clang::Diagnostic::ProcessDiag (this=0x7fffffffd410) at /home/harsh/Desktop/llvm/tools/clang/lib/Basic/…/…/include/clang/Basic/Diagnostic.h:608
#7 0x000000000091b4a2 in clang::DiagnosticBuilder::Emit (this=0x7fffffffc460) at Diagnostic.cpp:225
#8 0x0000000000421ee2 in clang::DiagnosticBuilder::~DiagnosticBuilder (this=0x7fffffffc460, __in_chrg=)
at /home/harsh/Desktop/llvm/tools/clang/lib/Sema/…/…/include/clang/Basic/Diagnostic.h:696
#9 0x000000000075369d in clang::Lexer::SkipBlockComment (this=0x129db190, Result=…,
CurPtr=0x12a338c5 "printf(“God mode wasted \n”)/;\n/\n\tswitch (gbuf[0]) \n\t\t{\n\t\t\tcase(‘h’) :\n\t\t\t\tread(0,gbuf,6);\n\t\t\t\twrite(harpipe[1],gbuf,6);\n\t\t\tcase(‘t’) :\n\t\t\t\tread(0,gbuf,6);\n\t\t\t\twrite(turpipe[1],gbuf,6);\n\t\t\tdefault "…) at Lexer.cpp:1629
#10 0x0000000000754fd8 in clang::Lexer::LexTokenInternal (this=0x129db190, Result=…) at Lexer.cpp:2248
#11 0x00000000007101e8 in clang::Lexer::Lex (this=0x129db190, Result=…) at /home/harsh/Desktop/llvm/tools/clang/lib/Frontend/…/…/include/clang/Lex/Lexer.h:131
#12 0x00000000007102d5 in clang::Preprocessor::Lex (this=0x7fffffffc650, Result=…) at /home/harsh/Desktop/llvm/tools/clang/lib/Frontend/…/…/include/clang/Lex/Preprocessor.h:507
#13 0x00000000007c2c84 in MyRewriter::getLexerTokens (this=0x7fffffffd750) at MyRewriter.cpp:304
#14 0x00000000007c2847 in MyRewriter::getParseTrees (this=0x7fffffffd750) at MyRewriter.cpp:263
#15 0x00000000007bdf26 in GeneralCandidate::parse (this=0xff09160) at GeneralCandidate.cpp:15
#16 0x00000000007e3a10 in main () at CPPMain.cpp:1111

Harsh Gupta

> I am using the lexer to get a stream of tokens for a file. However if there is a nested block comment,
>
> for eg /* some comment /*This is a nested comment*/ some more comment */
>
> Then the lexer gives a segmentation fault. Is there any way to avoid this segmentation fault?

It’s hard to tell without more information, but you probably don’t have diagnostics set up correctly. Note that this is not a nested block comment; C doesn’t have any such thing. If you run this through ‘clang -E’, you’ll see this warning:

t.c:1:24: warning: '/*' within block comment [-Wcomment]

which is probably being mishandled.

-Chris
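
To illustrate the point that C block comments do not nest, here is a minimal, simplified sketch of how a C-style scanner skips one block comment. It is not clang's implementation: `CommentScan` and `scanBlockComment` are hypothetical names made up for this example. The first "*/" always closes the comment, and any "/*" seen before it is only recorded (a real lexer would warn about it at that point).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper type, not part of clang: what a C-style scanner
// observes while skipping a single block comment.
struct CommentScan {
    std::size_t end;                        // offset just past the closing "*/", or npos if unterminated
    std::vector<std::size_t> innerOpeners;  // offsets of every "/*" seen inside the comment
};

// Scans the block comment whose opening "/*" begins at src[start].
// Simplified on purpose: no trigraphs, no escaped newlines.
CommentScan scanBlockComment(const std::string &src, std::size_t start) {
    CommentScan result{std::string::npos, {}};
    // Begin after the opening "/*" so it is not reported as an inner opener.
    for (std::size_t i = start + 2; i + 1 < src.size(); ++i) {
        if (src[i] == '*' && src[i + 1] == '/') {
            result.end = i + 2;  // the first "*/" ends the comment, no nesting
            return result;
        }
        if (src[i] == '/' && src[i + 1] == '*') {
            result.innerOpeners.push_back(i);  // worth a warning, nothing more
        }
    }
    return result;
}
```

Applied to the comment from the question, the scan records one inner "/*" and stops at the first "*/", so the trailing " some more comment */" is lexed as ordinary tokens rather than as comment text. The warning emitted for that inner "/*" is what shows up in the backtrace as SkipBlockComment calling into the diagnostic printer.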