[RFC] Easier source-to-source transformations with clang tooling

Hi all,

I have a proposal for a framework that makes it easier to write source to source transformations with the clang::Tooling libraries, including clang-tidy checks.

The full proposal is in this doc:

https://docs.google.com/document/d/1ppw0RhjwsrbBcHYhI85pe6ISDbA6r5d00ot3N8cQWeQ/edit?usp=sharing

From the doc:
Transformer is a framework that aims to simplify development of clang-based source-to-source transformations. It focuses on the particular class of transformations that act only locally — that is, use local information about the code and make local changes (like a syntax-aware “find-and-replace”); and at scale — that is, will be carried out on many source files. The target audience is users that are comfortable with, or willing to become comfortable with, the clang AST matchers library.

I have a working prototype of this library which I’ve used on small examples inside Google. I plan to put together a patch for reference next week, although the doc should stand on its own.

Thanks!
Yitzhak Mandelbaum

After some delay, I’ve created a diff to accompany the doc. In particular, here’s the Stencil library, which provides for generating code in a format-string style:

https://reviews.llvm.org/differential/diff/182553/

Please see the tests for various examples and the doc for more explanation.

Hi Yitzhak,

your code looks very interesting.

It might be a good fit for uitlity that clang-refactor can use, too. (not sure about the state of clang-refactor though)
I would be interested how your framework integrates into clang-tidy, do you have a reference check that you ported to your approach (i have seen some references to clang-tidy code)?

P.S. It might be a good idea to make your diff a full revision, as right now the discussion feature can not be used.

Best, Jonas

Thanks! I’ve added it to a revision (https://reviews.llvm.org/D56933), but I wasn’t sure what lists to subscribe it to (if any) given that I don’t intend it yet for proper review.

I haven’t ported check, but do have a demo check (using Transformer as a whole) that I can add. I’ll ping the thread when that’s done.

Hi Yitzhak,

I’m still not seeing (and I’m not saying it’s not there, just that I’m not seeing it :slight_smile: where this is significantly simpler to use than string concatenation. Is there an example in the tests that you’d say exemplifies that?

Hi Yitzhak,

Hi all,

I have a proposal for a framework that makes it easier to write source to source transformations with the clang::Tooling libraries, including clang-tidy checks.

The full proposal is in this doc:

https://docs.google.com/document/d/1ppw0RhjwsrbBcHYhI85pe6ISDbA6r5d00ot3N8cQWeQ/edit?usp=sharing

I just took a quick pass on this document, and it looks interesting. However, I’m not very familiar with this area of clang, and I have a few high-level questions to be sure I understand the objectives of your tool.

My understanding is that the input of a transformation using your tool is a clang AST and the output is new source code. Is that correct?

If so, it seems your tool is optimized for (but not functionally limited to) transformations involving a single pass where the ultimate product is source code. That is, after running a transformation pass, if you want to either run additional transformation passes or perform LLVM IR codegen, you would need to serialize the output as source code and run clang again to get the new AST. Is that correct?

Thanks.

Joel

Hi Joel,

Yes, both are correct. However, clang already has support for serializing the result as source (actually, it’s more like applying your patch to the original source) so that’s pretty easy. We’re also planning to work on making that process of rerunning clang smoother and more efficient (e.g. some form of incremental recompilation to AST), but that’s still off in the future.

Hi Joel,

Yes, both are correct. However, clang already has support for serializing the result as source (actually, it’s more like applying your patch to the original source) so that’s pretty easy. We’re also planning to work on making that process of rerunning clang smoother and more efficient (e.g. some form of incremental recompilation to AST), but that’s still off in the future.

Thanks for explaining. Can you say more about what you’re thinking for incremental recompilation and how far off that might be?

Joel

Hi,

one question that came to my mind because of Joel’s Mail:

Is it possible to run in multiple passes?
My use-case is adding const to variables that could be ‘const’ but aren’t. If you declare multiple variables like int not_const, const_variable;
the transformation requires splitting these up, first → int not_const; int const_variable; and then do the const transformation → int not_const; const int const_variable;.

I have implemented both transformations in clang-tidy (only partially landed yet) but there is no easy way I can run them in one check. The current workflow
pretty much forces us to run clang-tidy multiple times and converge to the final solution.

If your framework could give an improvement in this place, would be awesome! And I think worth to consider anyway If we change the way
we do transformations.

Best, Jonas

Hi,

one question that came to my mind because of Joel’s Mail:

Is it possible to run in multiple passes?
My use-case is adding const to variables that could be ‘const’ but aren’t. If you declare multiple variables like int not_const, const_variable;
the transformation requires splitting these up, first → int not_const; int const_variable; and then do the const transformation → int not_const; const int const_variable;.

I have implemented both transformations in clang-tidy (only partially landed yet) but there is no easy way I can run them in one check. The current workflow
pretty much forces us to run clang-tidy multiple times and converge to the final solution.

If your framework could give an improvement in this place, would be awesome! And I think worth to consider anyway If we change the way
we do transformations.

Generally, you can already do this, but it’d be outside clang-tidy. You can write a clang tool that
a) runs over the codebase and stores the “const graph”
b) load the “const graph” into memory and generate all the replacements (there’s a way to create yaml replacements)
c) apply all the replacements

Hi Yitzhak,

I realize I’m a bit of a luddite even asking, but how tied is Stencil to the AST matchers? It seems it would be useful for replacements based on good old AST traversals too.

  • Kim

Thanks for clarification. Having it directly in clang-tidy would be nice though :slight_smile:

Manuel,

Good question. I didn’t originally intend Stencil to be a stand-alone library, but as an integral part of Transformer. It is only as I’ve worked on it that I realized that it may in fact have value on its own. So, I’m still thinking this through. Here’s my logic:

  1. Abstraction level. The matchers work at the abstraction level of node identifiers. Stencil complements the matchers with string concatenation that operates as the same level. This allows users to write simple transformations without ever using the AST API. They need only understand the matchers and the string-concatenation model of Stencil.

  2. AST operations. For users that need more than simple transformations, Stencil also provides a (growing) assortment of AST operations. Ultimately, I’d like to collect all relevant AST codegen operations in the Stencil library. We could provide an API that works only on AST nodes, so we don’t need the string-concat functionality, but the combination makes the abstraction far more compelling, because you can do significant work without ever needing to learn how to extract pieces of source text with the AST API.

  3. clang-query integration. Stencil defines a mini-DSL, so we can naturally integrate it into clang-query. This will support writing (some) refactorings as clang-query scripts, as well as a richer tool for interactively exploring new refactoring ideas.

Examples

The tests are not great examples since they are designed as unit tests for single features rather than demos. That said, the test MemberOpWithNameOp combines two features in an interesting way:

TEST_F(StencilTest, MemberOpWithNameOp) {
const std::string Snippet = R"cc(
int object;
int* method = &object;
(void)method;
return object;
)cc";
auto StmtMatch = matchStmt(
Snippet, declStmt(hasSingleDecl(
varDecl(hasInitializer(expr().bind(“e”))).bind(“d”))));
ASSERT_TRUE(StmtMatch);
auto Stencil = Stencil::cat(member(“e”, name(“d”)));
EXPECT_THAT(toOptional(Stencil.eval(StmtMatch->Result)),
IsSomething(Eq(“object.method”)));
EXPECT_THAT(Stencil.addedIncludes(), testing::IsEmpty());
EXPECT_THAT(Stencil.removedIncludes(), testing::IsEmpty());
}

A slightly more realistic example involves maps. For a map s, change s.count(k) in a boolean context to s.contains(k):

Matcher:
castExpr(hasCastKind(clang::CK_IntegralToBoolean),
hasSourceExpression(cxxMemberCallExpr(
on(expr().bind(“s”),
anyOf(hasType(FlatHashSetType), hasType(pointsTo(FlatHashSetType)))),
callee(cxxMethodDecl(hasName(“count”))),
hasArgument(0, expr().bind(“k”)))))

The code gen is:
Stencil::Cat(Member(“s”, “contains”), “(”, Node(“k”), “)”)
and this handles both the s.count and s->count cases.

Jonas, Joel,

I would like to eventually support easy pipelining of transformations. Jonas – to your example, I would like for you to be able to express both transformations as Transformer rules and then ask Transformer to serialize them into one combined rule that can be used as a clang-tidy. As for how and when:

  • how: based on discussions with Sam McCall, it seems like we might be able to integrate with Clang’s “preamble” support (used by libclang/clangd) or (more forward looking) to integrate with the modules support. Either one should allows us to efficiently recompile from a change. A crazier idea would be to consider an alternative AST which is designed for efficient modification. However, this latter approach would involve a larger community discussion.

  • when: that very much depends on who’s interested in working on the problem. On my own, I don’t think I’ll get to it before late March.

Kim,

Not a luddite question at all! I’m still exploring Stencil’s relevance outside the context of Transformer. I haven’t written any AST traversals myself, so I can’t claim any special knowledge of that would fit with Stencils, but I could imagine it might still be useful. Can I ask what advantages you’d see in that context over string concatenation? I’m thinking that without bound names, you lose the first and third selling point I mentioned in my response to Manuel to this question, but point 2 still stands. We’d just to add a new operation to the stencil generators to inject an ast node pointer/reference into the stencil. e.g. we could add overloads to Node() for clang::Stmt, clang::Decl, etc. I’d love to get your take, though.

Hi Yitzhak,

Jonas, Joel,

I would like to eventually support easy pipelining of transformations. Jonas – to your example, I would like for you to be able to express both transformations as Transformer rules and then ask Transformer to serialize them into one combined rule that can be used as a clang-tidy. As for how and when:

  • how: based on discussions with Sam McCall, it seems like we might be able to integrate with Clang’s “preamble” support (used by libclang/clangd) or (more forward looking) to integrate with the modules support. Either one should allows us to efficiently recompile from a change.

Interesting. I have only a passing familiarity with these representations.

I know that some parts of Sema build data structures while parsing a particular scope and then destroy those data structures when leaving that scope. That seems to mean that, without significant refactoring of Sema, there is some non-trivial minimum granularity at which incremental compilation could run because it has to rebuild such data structures. For example, I’d be surprised if, in general, it’s possible to compile part of a function without recompiling the entire function.

Weren’t the preamble and module representations originally designed for precompiled headers? Does that mean their granularity is, at smallest, the size of a file scope declaration? Actually, using those representations, would you be able to recompile part of a file without recompiling the remainder of the file?

It would be cool if this strategy turns out to be efficient enough for practical source-to-source use cases.

A crazier idea would be to consider an alternative AST which is designed for efficient modification. However, this latter approach would involve a larger community discussion.

Probably so. But that should offer finer-grained modifications.

Thanks!

Joel

Ok, so if I understand you correctly, you’re saying that we can abstract away differences in the AST that you’d previously have to handle yourself with common library functions. That definitely makes sense, cool!

Jonas,

I’ve updated the revision (https://reviews.llvm.org/D56933) with two example clang tidies ported to Transformer. These are not ideal checks, because they are very particular in their handling of whitespace, but I think they’re good demonstrations both of Transformers strengths and its (current) limitations. Additionally, below are to versions of simple example that isn’t a tidy check yet. Please let me know what you think.

Also, what are the next steps towards considering these libraries for acceptance to clang? Should I put together a diff(s) for review, or is that still premature?

// Simplify e.emplace_back(std::make_pair(e1, e2)) to e.emplace_back(e1, e2).
// We elide the extra matching required to check that e is in a container like std::vector and that
// the element type is the same as that constructed by the call to make_pair.

using extra_matchers::isFunctionNamed;
using extra_matchers::isMethodNamed;

// Version 1
// This version has three metavariables – one for the container and two for the
// arguments to make_pair.
ExprId Call, Arg0, Arg1;
auto MakePairCall =

callExpr(callee(isFunctionNamed(“::std::make_pair”)),
argumentCountIs(2), hasArgument(0, Arg0.bind()),
hasArgument(1, Arg1.bind()));
auto Rule1 = RewriteRule()
.matching(cxxMemberCallExpr(
callee(isMethodNamed(“emplace_back”)),
argumentCountIs(1), hasArgument(0, bind(Call, make_pair_call))))

.change(Call)
.replaceWith(Arg0, “,”, Arg1));

// Version 2
// In this version, we extract the source code of the arguments using the arg()
// stencil operator on the node bound to Call.
auto Rule2 = RewriteRule()
.matching(cxxMemberCallExpr(
callee(isMethodNamed(“emplace_back”)), argumentCountIs(1),
hasArgument(0, bind(Call, callExpr(callee(isFunctionNamed(
“::std::make_tuple”)))))))
.change(Call)
.replaceWith(tooling::stencil_generators::args(Call)));

Thanks!
Yitzhak

Hi Yitzhak,

you had positive reactions from Manuel, so I think that the proposed API is in principle interesting.
The API should go into libTooling? And I assume it will be parallel to what we have already?

Having patches on phabricator is probably a good idea. But I am not an authorative figure in the libTooling space,
in the end the code-owner (manuel?!) has the power to deny entry. IMHO its a good direction.

Best, Jonas

Yes, it’s targeted at libTooling. I’ll also have a few adaptors to land in clang-tidy, but that will be a simple diff once the main libraries are committed. I’ll follow up w/ Manuel and other libTooling folks.

thanks!