Transforming the AST

Hello list,

I need some help with AST transformations. I searched through the API
but didn't find anything straightforward except some build*type functions.
How can I create a new stmt and insert it in the tree? I don't want to
use the Rewriter for the transformations.

Thank you,
Zakkak Foivos

If the statements you are building are relatively simple, you can just create the Stmt/Expr nodes and hook them into the AST. However, you need to build semantically well-formed statements (including all necessary implicit conversions) if you're going to do it this way, and that isn't trivial.

The other way to approach this problem is to use the ActOn* methods of Sema, so that Sema will check the semantics of the statements you create, and then hook the resulting statements/expressions into the AST.

  - Doug

If you're interested in producing transformed source code, you should really
use the rewriter (and tell us if that's insufficient for some reason). If you're
interested in producing transformed machine code, you can either follow one
of Doug's suggestions or, if the code you want to emit is pretty straightforward,
you can alter the LLVM IR after CodeGen is done with it.

John.

Thank you both for your replies.

I am trying to make a source-to-source compiler. The reason I would
prefer to skip the rewriter is that it supports only text. I guess I
will have to use the pretty printer to produce the final text.
Also, is it possible for the rewriter to create a new file and write into
it? Because this is going to be the next step after the transformations.

Foivos

I am trying to make a source-to-source compiler. The reason I would
prefer to skip the rewriter is that it supports only text.

If you're writing a source-to-source compiler, use the rewriter. It will preserve comments and code structure so that the resulting source is readable.

I guess I
will have to use the pretty printer to produce the final text.

Oh, don't do that. The pretty printer will not always produce well-formed code and, even if it did, it would produce very ugly code that bears little resemblance to the input source. The pretty printer can be useful for small things, e.g., build a new Stmt or Expr to represent a transformed expression, pretty-print it into a buffer, and then replace the original text with the pretty-printed text using the rewriter.
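To make the "replace only the changed text" idea concrete, here is a minimal sketch that does not use Clang at all. `EditBuffer` is a hypothetical stand-in for a rewriter, not Clang's API: it records replacements against offsets in the original text and copies everything else (comments, layout) through untouched. The replacement strings play the role of small pretty-printed fragments.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a rewriter (not a Clang class). Edits are
// recorded against offsets in the ORIGINAL text, so applying one edit
// never shifts the coordinates of another.
class EditBuffer {
public:
    explicit EditBuffer(std::string original) : original_(std::move(original)) {}

    // Replace original[offset, offset + length) with `text`.
    // Assumption of this sketch: recorded ranges do not overlap.
    void replace(std::size_t offset, std::size_t length, std::string text) {
        assert(offset + length <= original_.size());
        edits_[offset] = Edit{length, std::move(text)};
    }

    // Build the rewritten text; bytes outside the edits are copied verbatim.
    std::string str() const {
        std::string out;
        std::size_t pos = 0;
        for (const auto& kv : edits_) {  // std::map iterates in offset order
            assert(kv.first >= pos && "overlapping edits");
            out.append(original_, pos, kv.first - pos);
            out += kv.second.text;
            pos = kv.first + kv.second.length;
        }
        out.append(original_, pos, std::string::npos);
        return out;
    }

private:
    struct Edit {
        std::size_t length;  // number of original bytes replaced
        std::string text;    // replacement text
    };
    std::string original_;
    std::map<std::size_t, Edit> edits_;
};
```

For example, replacing `a + b` (offset 8, length 5) and the later `x` (offset 29, length 1) in `"int x = a + b; // sum\nreturn x;"` keeps the comment and line break intact, and the order in which the two edits are recorded does not matter. The string returned by `str()` can then be written to any output stream, including a newly created file.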

Also, is it possible for the rewriter to create a new file and write into
it?

Yes, of course. You can write the results any way you want.

  - Doug

Apparently a lot of people create source-to-source compilers on top of clang.

John McCall wrote:

If you're interested in producing transformed source code, you should really
use the rewriter (and tell us if that's insufficient for some reason).

One thing I've failed to get right so far is a minimal transparent rewrite:
Suppose the AST of a function is given. Then you manipulate that AST (insert, remove, clone statements, and the like).
After all is done you want to write it back but, of course, preserve the unchanged parts of the source code.
While this sounds easy at first (just track the changed parts of the AST), the devil is in the details. Thus I currently rewrite complete functions only (and wouldn't promote that AST processing approach here, as it is incomplete). IIRC a random-access RopePieceBTreeIterator could help a lot (and at least forward random access is easy to implement).

Best regards
Olaf Krzikalla

Whenever you manipulate the AST, you could make the same (textual) transformation via the Rewriter. Then the Rewriter becomes responsible for storing all of the changes.

If you find a better solution, we'd love to hear about it!

  - Doug

Hi @clang,

Douglas Gregor wrote:

Whenever you manipulate the AST, you could make the same (textual) transformation via the Rewriter. Then the Rewriter becomes responsible for storing all of the changes.
  

This doesn't work if you change a part of an AST that was already changed before (and thus rewritten), because you no longer have correct source location info for it. Suppose you inline a function via an AST transformation and later perform some loop unrolling on the statements of that inlined function. To rewrite that properly you need the source location info of the original AST.

If you find a better solution, we'd love to hear about it!
  

Well, as I said in my previous mail, it's not a solution yet:
Before I start the AST processing, I store the tree structure of the original AST.
The first idea was then to compare the completely transformed AST with that stored structure and rewrite the changes at the tree level where they appear (IMHO this is still the correct approach).
But the function dealing with the level comparison grew bigger and bigger as more special cases popped up, and it eventually became unmanageable. Hence I decided to simply propagate changes one level up to the appropriate parent and thus rewrite the parent statement. However, this means that just one local variable introduced at function scope leads to a completely rewritten function body. Thus, for the moment, I've canceled all efforts in this direction and just rewrite the function body (I don't even use parent propagation). It works for my purposes, but it is a flawed solution.
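The parent-propagation idea described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not the code discussed in this thread: `Node`, `make`, `collectRewrites` and `rewriteTargets` are hypothetical names, and a node is reduced to a label plus two flags. A node that still has an original source range is rewritten in place; a node without one (created by a transformation) pushes the rewrite up to its parent.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified statement node (not a Clang class).
struct Node {
    std::string label;
    bool hasOriginalRange;  // false for nodes synthesized by a transformation
    bool changed;           // true if an existing node was modified
    std::vector<std::unique_ptr<Node>> children;
};

std::unique_ptr<Node> make(std::string label, bool hasOriginalRange, bool changed) {
    return std::unique_ptr<Node>(
        new Node{std::move(label), hasOriginalRange, changed, {}});
}

// Appends the labels of the nodes that must be re-emitted to `out`.
// Returns true if `n` cannot be rewritten in place (it has no original
// range), so the rewrite has to be handled by its parent.
bool collectRewrites(const Node& n, std::vector<std::string>& out) {
    std::vector<std::string> fromChildren;
    bool mustRewrite = n.changed || !n.hasOriginalRange;
    for (const auto& child : n.children)
        if (collectRewrites(*child, fromChildren)) mustRewrite = true;

    if (!mustRewrite) {  // pass finer-grained rewrites through unchanged
        out.insert(out.end(), fromChildren.begin(), fromChildren.end());
        return false;
    }
    if (!n.hasOriginalRange) return true;  // propagate one level up
    out.push_back(n.label);  // rewriting the whole node subsumes its children
    return false;
}

std::vector<std::string> rewriteTargets(const Node& root) {
    assert(root.hasOriginalRange && "root must exist in the original source");
    std::vector<std::string> out;
    collectRewrites(root, out);
    return out;
}
```

With a body containing a loop whose condition was edited, only the condition is reported. As soon as a new declaration without an original range is added directly to the body, the body itself becomes the rewrite target, which is the "completely rewritten function body" effect mentioned above.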

Best regards
Olaf Krzikalla