preprocessor hacking

Hi All,

I am new to clang (llvm). As a hacking exercise, I’d like to start by experimenting in the preprocessor. I have an active project with my own ‘middle-processor’, and so far I work with various compilers using the following pipeline.

cpp | my-middle-processor | cc

More realistically, the pipeline looks like

cc -P hi.c | my-middle-processor > hi.i ; cc hi.i

Basically my-middle-processor does simple things that can’t be done by cpp itself, yet doesn’t need any knowledge of what it scans (no C parser needed).

I can still work this way with clang, but as an exercise I’d like to see if I can glue my text processing inside the clang cpp.

Maybe later I could turn this recognition into a real ‘C extension’, restore cpp to its normal behavior, and recognise the input construct in the C grammar.

So now I have successfully built clang+llvm with -g and I am able to run GDB on it. The question is: what function name should I put a breakpoint on to catch the clang cpp getting its first char?

I tried fread, read, and mmap, but those breakpoints are never hit.

One thing to know: I am totally C++ illiterate, I do all my work in C. So maybe I should do something like b class::func, but I’ve got no idea what the class name could be.

Any help appreciated.

Discovered -fno-color-diagnostics.

Still have to find a good function name to set a breakpoint on to get control in the lexer.


If you invoke clang as a driver (as in clang test.c) it will spawn a new process for the frontend (that lexes, preprocesses, parses etc.) and a new process for linker and assembler. To debug the frontend itself (and not the driver) pass -cc1 (as in clang -cc1 test.c).

Not sure where you’d like to break, but have a look at Lexer.cpp and Preprocessor.cpp and pick a method. Constructor might be a good place to start…
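
To make that concrete, here is a rough sketch of a gdb command file for breaking in the frontend. The two method names are my guess at reasonable entry points (clang::Preprocessor::EnterMainSourceFile and clang::Lexer::Lex); they may be named differently or inlined in your checkout, so check Lexer.cpp and Preprocessor.cpp first. The file name lexer-breaks.gdb and the input test.c are just placeholders.

```shell
# Write a small gdb command file (the file name is arbitrary).
# The symbol names are assumptions; adjust them to what your clang
# sources actually define.
cat > lexer-breaks.gdb <<'EOF'
set breakpoint pending on
break clang::Preprocessor::EnterMainSourceFile
break clang::Lexer::Lex
run
EOF

# Then, with your debug build of clang on PATH, debug the frontend
# directly (-cc1) in preprocess-only mode (-E):
#   gdb -x lexer-breaks.gdb --args clang -cc1 -E test.c
#
# To see the full -cc1 command line the driver would have spawned:
#   clang -### -E test.c
```

Breakpoints on fread/read/mmap set while debugging the driver are probably never hit because the file is read in the spawned frontend process, not in the driver that gdb is attached to.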