Hey everyone
Clang's rewriting API operates on source code in textual form. Users can use AST nodes to figure out what to replace, but in the end they have to remove and insert snippets in a linear piece of text.
This is very inconvenient when replacements need to be restructured or nested, and macros make the manual process even more difficult. See some recent threads expressing difficulty with the API [1][2].
What do I mean by "nested replacements"? For example, in the following:
int i = x + s->a;
I want to replace the BinaryOperator with a function call and the MemberExpr with a constant:
int i = Addition(x, 7);
If the two replacement rules are to stay independent of each other, this is extremely difficult to achieve with the current API: the source ranges of the two nodes overlap, so the text inserted for the BinaryOperator has to contain the already-rewritten text of its RHS. Macros make it even harder.
I am proposing a helper that addresses these issues by providing an interface for replacing AST nodes directly, plus a mechanism for nesting AST node replacements, without having to worry about macros.
Potential usage:
- Develop a class that derives from StmtToRewrite to define how replacements should happen:
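class RewriteAdds : public cu::StmtToRewrite
{
public:
  // Replaces "LHS + RHS" with "Addition(LHS, RHS)", reusing whatever the
  // manager produced for the operands, so nested replacements are kept.
  std::string makeReplaceStr() const override
  {
    // Only registered for BinaryOperators, so cast<> (which asserts) is safe;
    // an unchecked dyn_cast<> would hide a null pointer instead.
    auto binOp = cast<BinaryOperator>(replaceS);
    return "Addition(" + getMgr()->getReplaced(binOp->getLHS()).strToInsert + ", " +
           getMgr()->getReplaced(binOp->getRHS()).strToInsert + ")";
  }
};
class RewriteMembs : public cu::StmtToRewrite
{
public:
  // Replaces every registered MemberExpr with the constant 7.
  std::string makeReplaceStr() const override
  {
    return "7";
  }
};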
- Construct a RewriteManager:
cu::RewriteManager mgr(ACtx, PP);
- Add rewriting operations to the manager:
// std::vector<const Stmt *> AddStmts = /* matched from binaryOperator() with plus */
// std::vector<const Stmt *> MembStmts = /* matched from memberExpr() */
for (const auto &S : AddStmts) mgr.registerStmt<RewriteAdds>(S);
for (const auto &S : MembStmts) mgr.registerStmt<RewriteMembs>(S);
- Retrieve and apply the results:
clang::Rewriter rewriter(SM, LangOpts);
for (const auto &r : mgr.getReplacements()) {
  rewriter.RemoveText(r.rangeToRemove);
  rewriter.InsertText(r.rangeToRemove.getBegin(), r.strToInsert);
}
At the end of this mail is my low-quality code that kind of implements this. TL;DR:
- Build a hierarchy of Stmts to replace, keeping track of which replacements must be combined
- Move further up the AST if these replacements are inside a macro
- Recursively lex the file and look for replacements outside-in by spelling location, expanding any macros encountered along the way. The re-lexing idea is based on the hint in [3].
The code has the following shortcomings:
- I do not know how to distinguish macro argument expansions within macros. For example, in "#define FOO(a) a + a" the two "a"s expand to different AST nodes that could be replaced by different rules. This is an important issue, because with nesting it can lead to completely broken code.
- Limited to Stmts, although Decls should be supported too.
- Very unoptimized: the entire source file is lexed many times. This is easy to solve, but I didn't want to raise the complexity further for now.
- The written code could be kept cleaner by expanding macros only when required. For example, expansion is not required if only a macro argument is replaced and all of its expansions would be the same.
I am very interested in your general thoughts. I'm not very experienced with Clang, but this is how I envision doing replacements. Would you be interested in getting this into Clang? I would need help ironing out the remaining issues.
-Rafael
[1] http://lists.llvm.org/pipermail/cfe-dev/2018-July/058430.html
[2] http://lists.llvm.org/pipermail/cfe-dev/2018-June/058213.html
[3] http://lists.llvm.org/pipermail/cfe-dev/2017-August/055079.html