Strategy to modify the raw buffer in the first phase (beginner)


I want to make my first modification to clang, starting with a simple case (at least from my point of view). What do I expect from this thread? At least one of these points:

1. A general overview of the process I should follow (e.g. you have to alter the Lexer and the Parser: modify methods X and Y of class Z, then extend class J to support X's new output).
2. Pointers to the classes/methods/functions related to what I want to achieve (e.g. your problem is similar to implementing '<<< >>>' for CUDA; read files X, Y and Z, which deal with CUDA).
3. Alternative strategies (e.g. you will suffer doing it this way; it is much better to extend the Lexer and avoid touching CodeGen).

As you can imagine, the "e.g." examples above are just made-up placeholders.

My objective is to be able to transform any code like this:

"A: #{A} B: #{B}\n"

into this:

"A: " << A << " B: " << B << "\n"
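To pin down what the transformation itself should do, here is a minimal, standalone C++ sketch of the rewrite applied to the body of a single string literal (the text between the quotes). expandInterpolation is a hypothetical name, not anything in clang; the sketch assumes placeholders never nest and that the expression inside #{...} contains no '}'.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of clang): expand the *body* of one string
// literal (the text between the quotes) into a stream-insertion chain, e.g.
//   A: #{A} B: #{B}\n   becomes   "A: " << A << " B: " << B << "\n"
// Assumptions: placeholders do not nest and the expression contains no '}'.
// Escape sequences such as \n are copied through untouched.
std::string expandInterpolation(const std::string& body) {
    std::string out;      // the resulting C++ expression text
    std::string literal;  // literal text collected since the last placeholder
    bool first = true;

    // Append one operand, separating operands with " << ".
    auto emit = [&](const std::string& operand) {
        if (!first) out += " << ";
        out += operand;
        first = false;
    };
    // Emit any pending literal text as a quoted string.
    auto flushLiteral = [&]() {
        if (!literal.empty()) {
            emit("\"" + literal + "\"");
            literal.clear();
        }
    };

    std::size_t i = 0;
    while (i < body.size()) {
        if (body[i] == '\\' && i + 1 < body.size()) {
            literal += body.substr(i, 2);      // keep an escape pair such as \n
            i += 2;
        } else if (body.compare(i, 2, "#{") == 0) {
            std::size_t close = body.find('}', i + 2);
            if (close == std::string::npos) {  // unterminated: treat as text
                literal += body.substr(i);
                break;
            }
            flushLiteral();
            emit(body.substr(i + 2, close - (i + 2)));  // the expression
            i = close + 1;
        } else {
            literal += body[i++];
        }
    }
    flushLiteral();
    return first ? "\"\"" : out;  // an empty body still yields a valid ""
}
```

Called on the body of "#{var}\n" it returns var << "\n", i.e. the right-hand side of the std::cout example. The chain only makes sense when the original literal was itself an operand of <<.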

So, in a case like this:

std::cout << "#{var}\n";

the result makes sense:

std::cout << var << "\n";

But in others, like:

printf("#{var}\n");
auto x = "my string#{x}";

it turns into nonsense:

printf(var << "\n");
auto x = "my string" << x;

I don't mind, though: it is just a "simple example" to start learning how to modify clang.

So my initial strategy would be to take a string (the source code) and transform it as raw text, with or without regular expressions (that part is not important).
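The raw-text strategy can be sketched with std::regex from the C++ standard library. This is only an illustration under simplifying assumptions: rewriteBuffer is a hypothetical function, and its regular expression for string literals ignores comments, raw strings, character literals and encoding prefixes, all of which a real pass (or clang's own Lexer) would have to handle.

```cpp
#include <regex>
#include <string>

// Hypothetical raw-text pass over one whole source buffer (not a clang API).
// It finds ordinary double-quoted string literals and, when a literal
// contains #{...}, replaces it with a stream-insertion chain.
// Assumptions/limitations of this sketch: it does not understand comments,
// raw strings R"(...)", character literals such as '"', or prefixes like u8.
std::string rewriteBuffer(const std::string& source) {
    // An ordinary string literal: quote, then plain characters or escapes, quote.
    static const std::regex literalRe(R"("(?:[^"\\\n]|\\.)*")");
    // A placeholder inside a literal: #{expr}, where expr contains no '}'.
    static const std::regex placeholderRe(R"(#\{([^}]*)\})");

    std::string out;
    auto copiedUpTo = source.cbegin();           // end of text already copied
    for (std::sregex_iterator it(source.cbegin(), source.cend(), literalRe), end;
         it != end; ++it) {
        const std::smatch& lit = *it;
        out.append(copiedUpTo, lit[0].first);    // code before this literal
        copiedUpTo = lit[0].second;

        const std::string text = lit.str();
        const std::string body = text.substr(1, text.size() - 2);  // no quotes
        if (!std::regex_search(body, placeholderRe)) {
            out += text;                         // no placeholders: keep as-is
            continue;
        }

        std::string chain;                       // the resulting << expression
        bool first = true;
        auto append = [&](const std::string& operand) {
            if (!first) chain += " << ";
            chain += operand;
            first = false;
        };
        auto pos = body.cbegin();
        for (std::sregex_iterator p(body.cbegin(), body.cend(), placeholderRe), pend;
             p != pend; ++p) {
            const std::smatch& ph = *p;
            if (pos != ph[0].first)              // literal text before #{...}
                append("\"" + std::string(pos, ph[0].first) + "\"");
            append(ph[1].str());                 // the expression inside #{...}
            pos = ph[0].second;
        }
        if (pos != body.cend())                  // trailing literal text
            append("\"" + std::string(pos, body.cend()) + "\"");
        out += chain;
    }
    out.append(copiedUpTo, source.cend());       // code after the last literal
    return out;
}
```

Fed the printf line above, this sketch produces exactly printf(var << "\n");, because a pure text pass cannot tell whether a literal is an operand of <<.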

I think I should intercept the source code, perhaps just before the preprocessor runs, apply the transformation to each source file (e.g. receiving the source one file at a time), and then hand the result over to the next phase of compilation.
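The simplest way to try the "transform each file before the preprocessor" idea without touching clang at all is probably an external per-file pre-pass: read a file, rewrite it, write the result somewhere else, and compile that instead. rewriteFile is a hypothetical driver written in plain standard C++; the transformation is passed in as a function, so any text-to-text rewrite can be plugged in. An in-tree variant would, I believe, hand clang an in-memory replacement for the file (for instance through PreprocessorOptions::addRemappedFile) rather than writing a new file, but that is an assumption I have not verified.

```cpp
#include <fstream>
#include <functional>
#include <sstream>
#include <string>

// Hypothetical per-file pre-pass (not part of clang): read one source file,
// apply a text transformation, and write the result to a new file that is
// then handed to the compiler unchanged. The transformation is a parameter,
// so this driver does not care how #{...} is actually expanded.
// Returns false if the input cannot be read or the output cannot be written.
bool rewriteFile(const std::string& inPath, const std::string& outPath,
                 const std::function<std::string(const std::string&)>& transform) {
    std::ifstream in(inPath);
    if (!in) return false;
    std::ostringstream buffer;
    buffer << in.rdbuf();                 // the whole file as one string

    std::ofstream out(outPath);
    if (!out) return false;
    out << transform(buffer.str());
    return static_cast<bool>(out.flush());
}
```

The rewritten file would then be compiled as usual; one downside of this design is that diagnostics refer to the rewritten file rather than the original one.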

Any recommendations? (See points 1, 2 and 3 above.)

Thanks in advance,


I opened this thread after solving my compilation-speed problem; building clang is now affordable :slight_smile: Thanks to everyone in "Too much time to compile clang. Suggestions for a starter?".