How to check the flow of the clang compiler with an example

Hi,
I would like to step through the clang compiler (the lexer and the parser) while it compiles a small program like helloworld.c. I am using the Visual Studio 2008 solution file for clang/llvm. Can an expert tell me how to proceed and which steps I need to follow?
I am not able to figure out where the lexer and parser stages of the compiler start, so I do not know where to put breakpoints that will be hit when compiling a program like helloworld.c.
I would also like to know what "cc1" is. I found it in several places in the code.

Thanks

'cc1' is the actual compiler. It's actually a part of clang.exe, but you
activate it with the '-cc1' switch. What you want to do is debug
clang.exe, but pass it the -cc1 so you can debug the compiler itself. If
you don't pass -cc1, you'll start the driver instead, which spawns the
compiler. You can get around that by debugging child processes, too, but
I think it's easier to just run clang -cc1.
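
To make the driver/compiler split concrete, here is a minimal toy sketch of one executable playing both roles. It is not clang's source: the names Mode, selectMode and buildCompilerJob are hypothetical, and the sketch assumes the switch is recognized only as the first argument. A real driver also adds many more options to the child command line than shown here.

```cpp
#include <string>
#include <vector>

// Toy model only; these names are hypothetical and do not come from clang.
enum class Mode { Driver, Compiler };

// Pick the role from the arguments (argv[0] excluded).
// Assumption: "-cc1" only counts when it is the very first argument.
Mode selectMode(const std::vector<std::string> &args) {
  if (!args.empty() && args[0] == "-cc1")
    return Mode::Compiler;
  return Mode::Driver;
}

// In driver mode, build the command line for the child compiler process.
// A real driver would insert target and header-search options here too.
std::vector<std::string> buildCompilerJob(const std::vector<std::string> &args) {
  std::vector<std::string> job = {"clang", "-cc1"};
  job.insert(job.end(), args.begin(), args.end());
  return job;
}
```

With this model, selectMode({"helloworld.c"}) selects the driver, which would spawn a second process, so breakpoints in the compiler proper are only hit in that child. Passing -cc1 directly keeps everything in the process being debugged.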

There is a "Lexer" class which is used to read tokens from a source
file, and a "Parser" class which does the parsing. There's also a
"Preprocessor" object which handles preprocessor directives.

The important method in the Lexer class is the Lex() method. It
retrieves the next token. The parser calls this method every time it
wants a token.

The Parser object starts by parsing top-level declarations (i.e. ones at
global scope), which is initiated with the ParseTopLevelDecl() method.

If you set breakpoints on these methods, you can step through the Lexer
and Parser at your leisure.
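
The pull relationship described above (the parser asks the lexer for one token at a time) can be illustrated with a toy sketch. This is not clang's code: ToyLexer, ToyParser, Token, TokKind and countTopLevelDecls are hypothetical names, and a "declaration" here is simply a run of tokens ending in ';'. Only the method names Lex and ParseTopLevelDecl are borrowed from the real classes, with ParseTopLevelDecl assumed to return true at end of input.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum class TokKind { Identifier, Number, Punct, Eof };

struct Token {
  TokKind kind;
  std::string text;
};

// Hypothetical lexer: hands out one token per Lex() call.
class ToyLexer {
  std::string buf;
  std::size_t pos = 0;

  static bool isIdentChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
  }

public:
  explicit ToyLexer(std::string source) : buf(std::move(source)) {}

  // Return the next token; at end of input every call returns Eof.
  Token Lex() {
    while (pos < buf.size() &&
           std::isspace(static_cast<unsigned char>(buf[pos])))
      ++pos;
    if (pos >= buf.size())
      return {TokKind::Eof, ""};
    std::size_t start = pos;
    char c = buf[pos];
    if (std::isdigit(static_cast<unsigned char>(c))) {
      while (pos < buf.size() &&
             std::isdigit(static_cast<unsigned char>(buf[pos])))
        ++pos;
      return {TokKind::Number, buf.substr(start, pos - start)};
    }
    if (isIdentChar(c)) {
      while (pos < buf.size() && isIdentChar(buf[pos]))
        ++pos;
      return {TokKind::Identifier, buf.substr(start, pos - start)};
    }
    ++pos;  // any other character is a one-character punctuator
    return {TokKind::Punct, buf.substr(start, 1)};
  }
};

// Hypothetical parser: keeps one token of lookahead and pulls more on demand.
class ToyParser {
  ToyLexer &lexer;
  Token tok;

  void consume() { tok = lexer.Lex(); }

public:
  explicit ToyParser(ToyLexer &l) : lexer(l), tok(l.Lex()) {}

  // Collect the tokens of one top-level "declaration" (up to and including ';').
  // Returns true when the input is exhausted, false if a declaration was read.
  bool ParseTopLevelDecl(std::vector<Token> &result) {
    result.clear();
    if (tok.kind == TokKind::Eof)
      return true;
    while (tok.kind != TokKind::Eof) {
      bool isSemi = tok.kind == TokKind::Punct && tok.text == ";";
      result.push_back(tok);
      consume();
      if (isSemi)
        break;
    }
    return false;
  }
};

// Drive the parser until it reports end of input, counting declarations.
std::size_t countTopLevelDecls(const std::string &source) {
  ToyLexer lexer(source);
  ToyParser parser(lexer);
  std::vector<Token> decl;
  std::size_t n = 0;
  while (!parser.ParseTopLevelDecl(decl))
    ++n;
  return n;
}
```

Breaking on ToyLexer::Lex and ToyParser::ParseTopLevelDecl in this toy shows the same alternation you should see in the real thing: one stop in the parser per top-level declaration, with many stops in the lexer in between.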

Chip

Hi,
Thanks for the reply. I tried putting a breakpoint in CIndex.cpp, in the CIndex target, near

  Lexer Lex(SourceMgr.getLocForStartOfFile(BeginLocInfo.first),
            CXXUnit->getASTContext().getLangOptions(),
            Buffer.first, Buffer.first + BeginLocInfo.second, Buffer.second);

and another breakpoint in Parser.cpp, inside the clangparser target:

  bool Parser::ParseTopLevelDecl(DeclGroupPtrTy &Result) {
    Result = DeclGroupPtrTy();
    if (Tok.is(tok::eof)) {
      Actions.ActOnEndOfTranslationUnit();
      return true;
    }

When I ran the command clang -cc1 helloworld.c on the command line, it generated 5 diagnostic messages saying error: unknown type __int64 /*64-bit time value*/, pointing to different places inside stdio.h and crtdefs.h.
The program that I created in Visual Studio was helloworld.c:

#include <stdio.h>
int main()
{
    printf("\n helloworld");
    return 0;
}

Could you tell me whether the procedure I used was wrong, or whether the problem is with the breakpoints or something else?

Thanks

CIndex is not part of the clang executable. It is a dynamic library that gives clients access to some (but not all) of Clang’s functionality.

As for your command line, you need to pass the right target options to clang -cc1. First try:

clang -fsyntax-only helloworld.c -###

which will print out the ‘clang -cc1’ line for doing ‘-fsyntax-only’ on ‘helloworld.c’. Then invoke ‘clang -cc1’ in the debugger with that complete set of arguments.

kalyan ponnala wrote:

Hi,
Thanks for the reply. I tried putting breakpoints at CIndex.cpp file in
CIndex target near

Lexer Lex(SourceMgr.getLocForStartOfFile(BeginLocInfo.first),
          CXXUnit->getASTContext().getLangOptions(),
          Buffer.first, Buffer.first + BeginLocInfo.second, Buffer.second);

No, no, you want to put the breakpoint on Lexer::Lex(). It's in the
clangLex target, in Lexer.h.

and another breakpoint at Parser.cpp inside clangparser target.

bool Parser::ParseTopLevelDecl(DeclGroupPtrTy &Result) {
  Result = DeclGroupPtrTy();
  if (Tok.is(tok::eof)) {
    Actions.ActOnEndOfTranslationUnit();
    return true;
  }

Good.

When I tried to run the command clang -cc1 helloworld.c on the command
line , it generates 5 diagnostic messages saying error: unknown type
__int64 /*64-bit time value*/ and shows me different places inside
stdio.h and crtdefs.h.

That is because you need to also pass -fms-extensions to use clang with
MS headers.

Chip

Thanks for the replies, guys. I am able to run those commands correctly (I guess), but I am not able to see what is going on inside the lexer and parser as the debugger runs through helloworld.c. I am not sure what is happening, because it does not show any output.
I tried these commands, among others:

clang -cc1 -fms-extensions helloworld.c      (it ran, but produced no output)
clang -cc1 -fsyntax-only helloworld.c -fms-extensions -x c helloworld.c
clang -fsyntax-only helloworld.c -###

I set the breakpoints as you said, Charles. What more should I do? The .c file is inside a Visual Studio project, but I am accessing it through the command line. Is that fine? Do I have to add some commands or paths to the Visual Studio project file for helloworld.c?

Thanks.

Thanks Charles, Ted, and Yun. It works. I can step through the clang parser and lexer now.

I would like to know if we can add the -fms-extensions option to the “clang-test” target. So far I have not been able to build the regression tests. It gives me fatal errors such as ‘stdio.h’ file not found, ‘stdlib.h’ not found, etc.

Thanks.