How to get the rewritten code string in Clang AST?

zao · October 7, 2023, 10:38pm

Hi, I’m using the Clang AST to parse C source code and rewrite AST nodes. In the EndSourceFileAction() method of my ASTFrontendAction subclass, I want to get the rewritten code (not just the AST nodes that were rewritten, but the whole C source file after rewriting) and store it in a string variable. Is there any way to do so?

I tried to use getRewrittenText() from Rewriter.h. It requires a SourceRange as an argument, but I can’t find a way to get the source range that the rewriter covers.

AaronBallman · October 9, 2023, 7:44pm

I would recommend looking at https://github.com/llvm/llvm-project/blob/1684c65bc997a8ce0ecf96a493784fe39def75de/clang/lib/Frontend/ASTConsumers.cpp#L31 – this may not work directly for you, but this is the interface that implements the -ast-print frontend option. It’s worth noting that the output is best-effort, so it may produce incorrect code or somewhat strange output.

steakhal · October 10, 2023, 7:43pm

@AaronBallman Why is it considered best effort?
Do you know of current limitations or examples?

AaronBallman · October 11, 2023, 12:37pm

The -ast-print option was originally added as a way to see if we kept source fidelity with our AST, basically as a debugging aid. However, it also predated clang-format and so there were some thoughts of “should this be our pretty printer?”. Eventually we landed on clang-format instead.

It’s best-effort because we don’t want to force people to update the printers (or the dumpers) beyond the bare minimum when adding or modifying AST nodes. It’s hard to argue that someone’s review should be blocked on -ast-print output, but if we got -ast-print to the point of production-quality output, then we could start to do so.

In terms of examples of what best-effort looks like, this Compiler Explorer example shows a few problems: the attribute switched to being a type attribute, which would break code, and the ctor and dtor have explicit markings for the template type. We do pretty well with the output, but it’s not 1:1 with what the user wrote in all cases.

steakhal · October 11, 2023, 12:51pm

Thanks for this info.
If I’m interested in the limitations in depth, do you know an effective way of uncovering discrepancies? In other words, if I don’t really care about attributes like in your example, are there other things broken/missing?

As an example of why I’m asking: users might want to “hash” the string produced by the pretty printer to get a whitespace- and preprocessor-insensitive hash for decls and stmts. This has already uncovered a crash like this. However, when it doesn’t crash, it would likely just silently increase the chance of hash collisions, which are hard to discover.

AaronBallman · October 11, 2023, 1:12pm

Oh, I’m sure there’s plenty more that’s broken, missing, or just not quite the same as what the user wrote. To uncover issues, you could probably write a script that takes an input source file, runs it through -ast-print, and then diffs the output against the input while ignoring whitespace differences (we intend to output the same code as the source, but it doesn’t have to have the same formatting) to see what changes exist. This won’t work for code that uses the preprocessor, though: Compiler Explorer

Another approach would be to pretty print to a file and then see if that new file still compiles.
