Need Help Printing Preprocessed Code and Modifying CFG

Hello all,

I’m working with a Clang plugin (it must be a plugin for kernel compilation reasons) that analyzes drivers in the Linux kernel for virtualization security purposes.

I start by modifying the body of a function so that certain statements are deleted. Then I need to print that modified source code in a form that compiles and is preferably human-readable. However, I want all macros expanded so that the generated code has minimal external symbols. I tried the Rewriter class for this, but the Rewriter apparently prints only the original source and cannot expand macros. Is there an alternative source printer that can print the preprocessed code?
For reference, when I call printPretty() on the function’s body Stmt*, the macros are expanded just as I want.

My second question is about actually deleting those statement nodes from a CFG. I remember reading somewhere in the Clang docs that the CFG is a constant data structure that can’t be modified. Is there a way to delete statements from the underlying AST so that they appear neither in the CFG nor in the re-printed source code? To be clear, I’m using the CFG for other analysis purposes, but I don’t really care whether the Stmt nodes are deleted from the AST directly or from inside a CFG basic block.

Thanks in advance,
Kevin Boos
Rice University Ph.D. Candidate

Attempting source-to-source transformations this way (via the AST/CFG) is generally not recommended. These data structures are not intended for this purpose and thus don’t work very well for it.

What’s the goal of transforming the source in this way? If you’re trying to instrument or test the resulting program’s behavior, consider an approach like the sanitizers (address/memory/thread sanitizer), which instrument code within Clang’s code generation phase. If you’re attempting to rewrite code, consider something like the ASTMatchers and the tooling library.

David,

Thanks, I’ll take a look at ASTMatchers.

The purpose of the analysis is to generate a subset of the source, where this subset is effectively a slice of the statements that affect a given CallExpr (with certain extra statements added to that subset). I realize that slicing is better achieved with LLVM; I have already done this. This needs to be at the source level. My problem is merely how to delete a statement or other type of node from an AST or CFG, and then write out the preprocessed source as output.

Thanks,
Kevin

I’ve looked into what the ASTMatchers can do; they are very nice but not terribly useful for me.

I think I’ve narrowed down what I really need from Clang: the ability to delete a node from the AST/CFG and then output that modified AST as source code.

I know that I can printPretty() the source code of an AST or use the Rewriter to generate the code. So basically I just need to know how to remove a Stmt or other node from the AST/CFG.

Sorry for the multiple followups.

Thanks again,
Kevin

Hi Kevin,

Although this is not usually recommended, it is possible to remove nodes from the AST, for instance using TreeTransform. If your node is part of a sequence of statements, you basically build a new vector of statements, without the nodes you don't need, in your own version of TransformCompoundStmt, and then rebuild the underlying tree using RebuildCompoundStmt. There was a thread about that earlier this year.
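
To make the "build a new vector of statements and rebuild" idea concrete, here is a minimal sketch on a toy tree. It deliberately does not use Clang's TreeTransform API: ToyStmt, makeLeaf, makeCompound, rebuildWithout, and print are hypothetical names invented for illustration, and the real TransformCompoundStmt/RebuildCompoundStmt have different signatures.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an AST node; this is NOT Clang's Stmt/CompoundStmt.
struct ToyStmt {
    std::string text;                                  // printed form of a leaf statement
    std::vector<std::shared_ptr<const ToyStmt>> body;  // children, if this is a compound
    bool isCompound = false;
};
using ToyStmtPtr = std::shared_ptr<const ToyStmt>;

ToyStmtPtr makeLeaf(std::string text) {
    auto s = std::make_shared<ToyStmt>();
    s->text = std::move(text);
    return s;
}

ToyStmtPtr makeCompound(std::vector<ToyStmtPtr> body) {
    auto s = std::make_shared<ToyStmt>();
    s->isCompound = true;
    s->body = std::move(body);
    return s;
}

// Rebuild the tree, dropping every leaf for which shouldDrop returns true.
// The input tree is never modified; kept leaves are shared, not copied.
ToyStmtPtr rebuildWithout(const ToyStmtPtr &node,
                          const std::function<bool(const ToyStmt &)> &shouldDrop) {
    assert(node && "null statement");
    if (!node->isCompound)
        return node;
    std::vector<ToyStmtPtr> kept;  // the "new vector of statements"
    for (const ToyStmtPtr &child : node->body) {
        if (!child->isCompound && shouldDrop(*child))
            continue;              // omit the unwanted statement
        kept.push_back(rebuildWithout(child, shouldDrop));
    }
    return makeCompound(std::move(kept));  // the "rebuild" step
}

// Print the tree as brace-delimited text.
std::string print(const ToyStmtPtr &node) {
    if (!node->isCompound)
        return node->text + ";";
    std::string out = "{ ";
    for (const ToyStmtPtr &child : node->body)
        out += print(child) + " ";
    return out + "}";
}
```

Calling rebuildWithout with a predicate that matches the statements to drop returns a new tree, and printing the original afterwards shows it unchanged. That mirrors the rebuild-rather-than-mutate style described above.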

Regards,

Hi Kevin,

Responding only to the comment about the CFG: it should be viewed as a data structure that is lazily constructed from the AST. It is a “view” on the AST, not something you can mutate. Also, the CFG may contain AST elements that are not in the original AST (particularly DeclStmts) to make analysis easier. We may wish to change this latter part in the future by adding new CFGElements, but that is how it is done today.
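
As a rough illustration of the "view" idea, the sketch below derives basic blocks from a flat statement list and hands out only const pointers into it. It is a toy model under loud assumptions, not Clang's CFG builder: FlatStmt, BlockView, and buildBlockView are hypothetical names, and "a branch ends the block" is a simplification of how real CFGs are split.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy statement: text plus a flag saying whether it ends a basic block.
struct FlatStmt {
    std::string text;
    bool isBranch;
};

// A block only points at statements it does not own, and only as const.
struct BlockView {
    std::vector<const FlatStmt *> stmts;
};

// Derive the block view from the statement list. The view is derived data:
// to change it, change the statements and call this function again.
std::vector<BlockView> buildBlockView(const std::vector<FlatStmt> &body) {
    std::vector<BlockView> blocks(1);
    for (const FlatStmt &s : body) {
        blocks.back().stmts.push_back(&s);
        if (s.isBranch)
            blocks.emplace_back();  // a branch terminates the current block
    }
    // Drop a trailing empty block left behind by a final branch.
    if (blocks.size() > 1 && blocks.back().stmts.empty())
        blocks.pop_back();
    return blocks;
}
```

In this model, removing a statement means erasing it from the underlying list and rebuilding the view. Any previously built view then holds stale pointers and must be discarded, which is one reason to treat such a view as read-only.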

Ted

Hi Ted,

Thanks for the reply. Now I think I understand the relationship between the AST and CFG. I won’t attempt to directly modify the CFG, which I had previously assumed to be immutable due to all the const usage throughout its methods.

So, if I was examining a CFGBlock, for example, I could access the underlying AST Stmt nodes and operate on them.

Now, once I have access to the AST node, what is a good way to remove it from the AST? (Ideally in a clean manner)… I assume this is possible because I have seen several prior works that use Clang for refactoring, and many refactoring operations involve deleting or at least replacing/reordering nodes.

Best,
Kevin

Refactoring tools (such as those built on the ASTMatchers) are generally
encouraged to make such changes by using the AST's source locations to
modify the original source code text, not by mutating the AST and
generating new source.

(As someone has already mentioned, some kinds of AST mutation can be
done with things such as TreeTransform, but it's not usually
recommended as a tool for refactoring; TreeTransform exists mostly for
handling template instantiation.)

Hi Kevin,

I think David addressed this point pretty well. If your goal is source-to-source translation, there is no need to modify the original AST. Instead, generate “edits” to the original source by deleting the old source range and inserting new source code. You can then emit the edits plus the original source. There are several examples of rewriters like this in the Clang tree, such as the Objective-C rewriter, that take this approach. The nice thing about this approach is that it preserves comments and essentially all elements of the original source that you didn’t need to touch.
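
The edit-based approach can be sketched without any Clang machinery. In the minimal model below, byte offsets into a string stand in for source locations; TextEdit and applyEdits are hypothetical names for illustration and are not part of Clang's Rewriter interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One edit against the ORIGINAL buffer: replace [offset, offset + length)
// with `replacement`. A pure deletion uses an empty replacement.
struct TextEdit {
    std::size_t offset;
    std::size_t length;
    std::string replacement;
};

// Apply all edits to a copy of the original text. Every offset refers to the
// original buffer, so edits are applied from the highest offset downward;
// that way an applied edit never shifts the position of one still pending.
std::string applyEdits(const std::string &original, std::vector<TextEdit> edits) {
    std::sort(edits.begin(), edits.end(),
              [](const TextEdit &a, const TextEdit &b) { return a.offset > b.offset; });
    std::string result = original;
    std::size_t lowestTouched = original.size();
    for (const TextEdit &e : edits) {
        // Each edit must end at or before the start of the previously applied one.
        assert(e.offset + e.length <= lowestTouched && "edits must not overlap");
        result.replace(e.offset, e.length, e.replacement);
        lowestTouched = e.offset;
    }
    return result;
}
```

Everything outside the edited ranges, including comments and formatting, is copied through untouched, which is the property highlighted above. In a real tool the offsets would presumably come from the source ranges of the AST nodes being removed or replaced.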

Ted

Oh, I see. So that’s the best way to perform changes – sort of lexical edits to the source code itself, not edits to the AST data structures. Something along the lines of the Rewriter’s “RemoveText” method.

Thanks to you, David, Beatrice, and everyone for your help. I really appreciate your patience and advice.

Regards,
Kevin