Patch: AST Matcher Framwork and an example tool

+cfe-dev, -cfe-commits

As requested by Chris I have reverted the Tooling stuff, and am bumping this thread on cfe-dev.

Please find the original mail proposing the patch sent to cfe-commits on May 23rd inlined below.

References to the original threads:
Proposal: http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20110523/042163.html
Commit: http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20110530/042375.html

Patches produced by the example tool sent out (has not got feedback yet):
Clang: http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20110523/042212.html
LLVM: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20110523/121387.html

More context on why this is useful: when writing code transformation tools to do large scale changes on libraries with clang, we noticed that a lot of the code we wrote was “unrolled” matching of tree structures on the AST, which required a lot of dynamic casting and deeply nested control flow structures.
The library we implemented based on those findings allows a very declarative way of expressing interesting parts of the AST that you want to work on, with a callback mechanism to have an imperative part that does the actual transformation. The way this library uses templates to build up a reverse hierarchy of the clang AST allows us to expand the library of matchers without needing to manually copy the structure of the AST while still providing full compile time type checking.
We have used this library for many internal transformations (one more broadly applicable of them being included as an example tool), and Nico may be able to explain more how he’s using the infrastructure for Chromium.

The future work (as outlined to some extend in the original email) is to build up code transformation tools (including the standard refactoring tools we know from other languages) and libraries to make it easy for developers to contribute both generally useful tools and make it easy to write one-off tools for cleanup transformations on larger code bases.

I would love this to be part of clang mainline, as the expertise and feedback of the community has been very helpful and valuable.

ast-matchers.patch (239 KB)

Ping?

+cfe-dev, -cfe-commits

As requested by Chris I have reverted the Tooling stuff, and am bumping this thread on cfe-dev.

Hi Manuel,

Sorry for the delay, I’ve been swamped with WWDC and trying to catch up on a huge backlog caused by it.

Please find the original mail proposing the patch sent to cfe-commits on May 23rd inlined below.

References to the original threads:
Proposal: http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20110523/042163.html
Commit: http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20110530/042375.html

Patches produced by the example tool sent out (has not got feedback yet):
Clang: http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20110523/042212.html
LLVM: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20110523/121387.html

More context on why this is useful: when writing code transformation tools to do large scale changes on libraries with clang, we noticed that a lot of the code we wrote was “unrolled” matching of tree structures on the AST, which required a lot of dynamic casting and deeply nested control flow structures.

Ok, I agree that our current infrastructure for pattern matching ASTs is… ehh… “suboptimal” ;-). I’m really thrilled you guys are working on improving this.

The future work (as outlined to some extend in the original email) is to build up code transformation tools (including the standard refactoring tools we know from other languages) and libraries to make it easy for developers to contribute both generally useful tools and make it easy to write one-off tools for cleanup transformations on larger code bases.

This goal is really really fantastic, something that I’ve been hoping that Clang would grow into from the very beginning.

The library we implemented based on those findings allows a very declarative way of expressing interesting parts of the AST that you want to work on, with a callback mechanism to have an imperative part that does the actual transformation. The way this library uses templates to build up a reverse hierarchy of the clang AST allows us to expand the library of matchers without needing to manually copy the structure of the AST while still providing full compile time type checking.
We have used this library for many internal transformations (one more broadly applicable of them being included as an example tool), and Nico may be able to explain more how he’s using the infrastructure for Chromium.

My primary concern with this work is that it is built as a series of compile-time code constructs. This means that someone working on a rewriter needs to rebuild the clang executable to do this. Instead of doing this as a compile-time construct, have you considered building this as an extensible pattern matching tool where the pattern and replacement can be specified at runtime? I’m envisioning something like a “regular expression for an AST”. You don’t need to rebuild grep every time you want to search for something new in text.

Even in the case where compiled-in code is required, having a more dynamic matcher would greatly simplify the code doing the matching. Have you considered this approach?

-Chris

+cfe-dev, -cfe-commits

As requested by Chris I have reverted the Tooling stuff, and am bumping this thread on cfe-dev.

Hi Manuel,

Sorry for the delay, I’ve been swamped with WWDC and trying to catch up on a huge backlog caused by it.

I figured :slight_smile: That’s why I waited with pinging until post-WWDC…

Please find the original mail proposing the patch sent to cfe-commits on May 23rd inlined below.

References to the original threads:
Proposal: http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20110523/042163.html
Commit: http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20110530/042375.html

Patches produced by the example tool sent out (has not got feedback yet):
Clang: http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20110523/042212.html
LLVM: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20110523/121387.html

More context on why this is useful: when writing code transformation tools to do large scale changes on libraries with clang, we noticed that a lot of the code we wrote was “unrolled” matching of tree structures on the AST, which required a lot of dynamic casting and deeply nested control flow structures.

Ok, I agree that our current infrastructure for pattern matching ASTs is… ehh… “suboptimal” ;-). I’m really thrilled you guys are working on improving this.

The future work (as outlined to some extend in the original email) is to build up code transformation tools (including the standard refactoring tools we know from other languages) and libraries to make it easy for developers to contribute both generally useful tools and make it easy to write one-off tools for cleanup transformations on larger code bases.

This goal is really really fantastic, something that I’ve been hoping that Clang would grow into from the very beginning.

The library we implemented based on those findings allows a very declarative way of expressing interesting parts of the AST that you want to work on, with a callback mechanism to have an imperative part that does the actual transformation. The way this library uses templates to build up a reverse hierarchy of the clang AST allows us to expand the library of matchers without needing to manually copy the structure of the AST while still providing full compile time type checking.
We have used this library for many internal transformations (one more broadly applicable of them being included as an example tool), and Nico may be able to explain more how he’s using the infrastructure for Chromium.

My primary concern with this work is that it is built as a series of compile-time code constructs. This means that someone working on a rewriter needs to rebuild the clang executable to do this. Instead of doing this as a compile-time construct, have you considered building this as an extensible pattern matching tool where the pattern and replacement can be specified at runtime? I’m envisioning something like a “regular expression for an AST”. You don’t need to rebuild grep every time you want to search for something new in text.

Even in the case where compiled-in code is required, having a more dynamic matcher would greatly simplify the code doing the matching. Have you considered this approach?

Yes, we’ve considered this approach early on. We looked into both some Java and C-based solutions (see for example http://nighthacks.com/roller/jag/; for Java there are really really bad examples that match Java to XML and do xpath queries). The problem is that building that pattern matcher language would not be straight-forward (simply writing C++ with a few globs would not be enough for our current use cases, since C++ has a lot of implicit stuff going on which we want to match, and just creating an arbitrary new language doesn’t necessarily seem better than the in-language DSL).

Considering that we want to eventually get a dynamic pattern matching language, but we also want to get it right, we are currently spending our time on the in-language DSL, and especially for the large scale stuff the developers we work with need surprisingly little help (the included example for replacing reduncant c_str() calls was created by a contributor who’s not worked on the implementation of the matcher library).

And when we get to the higher-level refactoring tools, the dynamic aspect will be parameters to the refactoring, so the non-dynamic nature of the AST matchers does not matter for that case.

When we look at the actual transformations, being in C++ again provides the benefit that we can just work with the AST nodes we matched instead of having to define some new way of dynamically specifying the transformations - and re-binding the AST in a dynamic language is definitely out-of-scope for us…

In the end, I agree that the vision to have a really nice dynamic description of the matches is the ultimate goal, but for us this is currently still a few quarters out. The C++ code provides really useful abstractions to quickly describe matches and transformations on the AST, with little code (as we can use C++ to provide the type safety and thus the error checking on the AST nodes). The cost of a link-step while writing the tools has so far not been a big obstacle, especially considering that our main target users currently are a) C++ experts doing large scale code transformations and b) writing refactoring tools that end users can use without any knowledge about the AST.

Cheers,
/Manuel

The library we implemented based on those findings allows a very declarative way of expressing interesting parts of the AST that you want to work on, with a callback mechanism to have an imperative part that does the actual transformation. The way this library uses templates to build up a reverse hierarchy of the clang AST allows us to expand the library of matchers without needing to manually copy the structure of the AST while still providing full compile time type checking.
We have used this library for many internal transformations (one more broadly applicable of them being included as an example tool), and Nico may be able to explain more how he’s using the infrastructure for Chromium.

My primary concern with this work is that it is built as a series of compile-time code constructs. This means that someone working on a rewriter needs to rebuild the clang executable to do this. Instead of doing this as a compile-time construct, have you considered building this as an extensible pattern matching tool where the pattern and replacement can be specified at runtime? I’m envisioning something like a “regular expression for an AST”. You don’t need to rebuild grep every time you want to search for something new in text.

Even in the case where compiled-in code is required, having a more dynamic matcher would greatly simplify the code doing the matching. Have you considered this approach?

Yes, we’ve considered this approach early on. We looked into both some Java and C-based solutions (see for example http://nighthacks.com/roller/jag/; for Java there are really really bad examples that match Java to XML and do xpath queries). The problem is that building that pattern matcher language would not be straight-forward (simply writing C++ with a few globs would not be enough for our current use cases, since C++ has a lot of implicit stuff going on which we want to match, and just creating an arbitrary new language doesn’t necessarily seem better than the in-language DSL).

I’m not suggesting that you parse “C++ with holes in it” as the pattern matching language. It would be perfectly acceptable to represent the patterns as S expressions for example. You currently use this sort of thing at compile time:

ConstructorCall(HasDeclaration(Method(HasName(StringConstructor))),
                ArgumentCountIs(2),...

This is basically already s expressions, you could either use this sort of thing unmodified if you think it is nice looking, or convert to proper s exprs, as in:

(constructorcall (hasdeclaration (method (hasname stringconstructor)))
(argumentcountis 2) …

The point is to make the matching language have really simple and trivial syntax, but syntax that is usable without rebuilding the compiler. There are tons of examples of tree pattern matches (including things like Burg) to draw inspiration from.

Considering that we want to eventually get a dynamic pattern matching language, but we also want to get it right, we are currently spending our time on the in-language DSL, and especially for the large scale stuff the developers we work with need surprisingly little help (the included example for replacing reduncant c_str() calls was created by a contributor who’s not worked on the implementation of the matcher library).

I’m not sure I get this logic. You’re saying that you’re afraid you won’t get the matching language right, so you’d avoiding it and doing something you know is wrong ;-). I expect much iteration on this, but all that requires is to tell people to expect breakage as you get experience with it and evolve things.

And when we get to the higher-level refactoring tools, the dynamic aspect will be parameters to the refactoring, so the non-dynamic nature of the AST matchers does not matter for that case.

Well, the second step is to be able to specify rewrites dynamically the same way you specify predicates. In the case of a “real” refactoring engine, you’ll probably want the power of a scripting language or something to write your predicates in.

When we look at the actual transformations, being in C++ again provides the benefit that we can just work with the AST nodes we matched instead of having to define some new way of dynamically specifying the transformations - and re-binding the AST in a dynamic language is definitely out-of-scope for us…

I see the convenience in this, but still think it is the wrong way to go.

In the end, I agree that the vision to have a really nice dynamic description of the matches is the ultimate goal, but for us this is currently still a few quarters out. The C++ code provides really useful abstractions to quickly describe matches and transformations on the AST, with little code (as we can use C++ to provide the type safety and thus the error checking on the AST nodes). The cost of a link-step while writing the tools has so far not been a big obstacle, especially considering that our main target users currently are a) C++ experts doing large scale code transformations and b) writing refactoring tools that end users can use without any knowledge about the AST.

Beyond being “the wrong way to go” IMHO, there are several other problems with the code as proposed:

  1. It doesn’t following the LLVM coding standards, particularly around naming, using std::map<std::string, using C headers like <assert.h>, and a bunch of other stuff.

  2. You’re building substantial new functionality into clang. The clang binary is already overly bloated with the static analyzer and other functionality that it keeps accreting . It would be better to use (and improve) the clang plugin mechanism to build this as a plugin. I’d also like the analyzer to move off to being a plugin as well. One carrot that we can give for people to build things as plugins is that they can use C++'0x features even though the clang compiler has to stay C++'98 for the forseeable future.

  3. The tooling infrastructure adds python stuff to do the rewrites. This seems pretty half-baked to me. If the whole reason to compile stuff in is to make things simpler, why do we need external scripts?

  4. Building this as compile-time stuff requires things like VariadicFunctions.h, which (if generally useful) should be in LLVM, not clang. It is better to define away the problem though by not doing this stuff at compile time.

  5. Adding major new stuff like JSON parsing, etc. All of these (if they even make sense) should be independently reviewed and submitted, not taking as one mega patch.

Overall, this is exactly the sort of thing that happens when someone develops a large amount of code out of tree, without input from other contributors, and then tries to spring it on an open source project. While I really laud your goals and really want to push refactoring forward, this is not the right direction to start from. Trying to push a huge patch in isn’t the way to get to something that is truly great in the mainline tree.

-Chris

The library we implemented based on those findings allows a very declarative way of expressing interesting parts of the AST that you want to work on, with a callback mechanism to have an imperative part that does the actual transformation. The way this library uses templates to build up a reverse hierarchy of the clang AST allows us to expand the library of matchers without needing to manually copy the structure of the AST while still providing full compile time type checking.
We have used this library for many internal transformations (one more broadly applicable of them being included as an example tool), and Nico may be able to explain more how he’s using the infrastructure for Chromium.

My primary concern with this work is that it is built as a series of compile-time code constructs. This means that someone working on a rewriter needs to rebuild the clang executable to do this. Instead of doing this as a compile-time construct, have you considered building this as an extensible pattern matching tool where the pattern and replacement can be specified at runtime? I’m envisioning something like a “regular expression for an AST”. You don’t need to rebuild grep every time you want to search for something new in text.

Even in the case where compiled-in code is required, having a more dynamic matcher would greatly simplify the code doing the matching. Have you considered this approach?

Yes, we’ve considered this approach early on. We looked into both some Java and C-based solutions (see for example http://nighthacks.com/roller/jag/; for Java there are really really bad examples that match Java to XML and do xpath queries). The problem is that building that pattern matcher language would not be straight-forward (simply writing C++ with a few globs would not be enough for our current use cases, since C++ has a lot of implicit stuff going on which we want to match, and just creating an arbitrary new language doesn’t necessarily seem better than the in-language DSL).

I’m not suggesting that you parse “C++ with holes in it” as the pattern matching language. It would be perfectly acceptable to represent the patterns as S expressions for example. You currently use this sort of thing at compile time:

ConstructorCall(HasDeclaration(Method(HasName(StringConstructor))),
                ArgumentCountIs(2),...

This is basically already s expressions, you could either use this sort of thing unmodified if you think it is nice looking, or convert to proper s exprs, as in:

(constructorcall (hasdeclaration (method (hasname stringconstructor)))
(argumentcountis 2) …

The point is to make the matching language have really simple and trivial syntax, but syntax that is usable without rebuilding the compiler. There are tons of examples of tree pattern matches (including things like Burg) to draw inspiration from.

Considering that we want to eventually get a dynamic pattern matching language, but we also want to get it right, we are currently spending our time on the in-language DSL, and especially for the large scale stuff the developers we work with need surprisingly little help (the included example for replacing reduncant c_str() calls was created by a contributor who’s not worked on the implementation of the matcher library).

I’m not sure I get this logic. You’re saying that you’re afraid you won’t get the matching language right, so you’d avoiding it and doing something you know is wrong ;-). I expect much iteration on this, but all that requires is to tell people to expect breakage as you get experience with it and evolve things.

I don’t think our current code is “wrong”. I think it is useful for a certain amount of tools, and as the basis of more dynamic tools. The main problem with building a dynamic language is that it’s significantly more effort, as it will require to be a lot more complete before being useful. In the C++ world the user can always fall back to using the AST methods available.

And when we get to the higher-level refactoring tools, the dynamic aspect will be parameters to the refactoring, so the non-dynamic nature of the AST matchers does not matter for that case.

Well, the second step is to be able to specify rewrites dynamically the same way you specify predicates. In the case of a “real” refactoring engine, you’ll probably want the power of a scripting language or something to write your predicates in.

Again, a lot more effort, and I think we can base a more dynamic solution on what we currently develop.

When we look at the actual transformations, being in C++ again provides the benefit that we can just work with the AST nodes we matched instead of having to define some new way of dynamically specifying the transformations - and re-binding the AST in a dynamic language is definitely out-of-scope for us…

I see the convenience in this, but still think it is the wrong way to go.

In the end, I agree that the vision to have a really nice dynamic description of the matches is the ultimate goal, but for us this is currently still a few quarters out. The C++ code provides really useful abstractions to quickly describe matches and transformations on the AST, with little code (as we can use C++ to provide the type safety and thus the error checking on the AST nodes). The cost of a link-step while writing the tools has so far not been a big obstacle, especially considering that our main target users currently are a) C++ experts doing large scale code transformations and b) writing refactoring tools that end users can use without any knowledge about the AST.

Beyond being “the wrong way to go” IMHO, there are several other problems with the code as proposed:

  1. It doesn’t following the LLVM coding standards, particularly around naming, using std::map<std::string, using C headers like <assert.h>, and a bunch of other stuff.

I tried to make it llvm coding style conforming, and had Chandler and Zhanyong review it - I’d be happy to change whatever we missed.

  1. You’re building substantial new functionality into clang. The clang binary is already overly bloated with the static analyzer and other functionality that it keeps accreting . It would be better to use (and improve) the clang plugin mechanism to build this as a plugin. I’d also like the analyzer to move off to being a plugin as well. One carrot that we can give for people to build things as plugins is that they can use C++'0x features even though the clang compiler has to stay C++'98 for the forseeable future.

The point why I don’t want to run tools as a plugin is that the command line syntax for tools can be significantly different from running the compiler. I don’t think this needs to be linked into the clang binary though - as nothing in clang depends on it (confused…)

  1. The tooling infrastructure adds python stuff to do the rewrites. This seems pretty half-baked to me. If the whole reason to compile stuff in is to make things simpler, why do we need external scripts?

Yes, this is an intermediate step that will not be necessary soon. This is work in progress…

  1. Building this as compile-time stuff requires things like VariadicFunctions.h, which (if generally useful) should be in LLVM, not clang. It is better to define away the problem though by not doing this stuff at compile time.

  2. Adding major new stuff like JSON parsing, etc. All of these (if they even make sense) should be independently reviewed and submitted, not taking as one mega patch.

This was actually going in as multiple smaller patches. I just deleted everything at once, as it was all submitted with Doug saying that we should go ahead and he’ll review post-submit, but after you intervened I thought it doesn’t make sense to leave half of the changes in. Then I re-diffed the whole thing and put it in as one giant patch that would revert the delete.

Overall, this is exactly the sort of thing that happens when someone develops a large amount of code out of tree, without input from other contributors, and then tries to spring it on an open source project. While I really laud your goals and really want to push refactoring forward, this is not the right direction to start from. Trying to push a huge patch in isn’t the way to get to something that is truly great in the mainline tree.

I certainly don’t want to “spring it” on clang - and I’m sorry if I gave this impression. I really tried to make sure that each step I did was ok, trying to split it up into manageable pieces, confirming each check in with Doug, and trying to get feedback - again, I’m sorry if that was not the right approach. If you don’t think this is useful to host in clang, because you think it’s not the right approach, than this is completely your call.

I definitely think what we have is useful for developers who use clang, and as such I think the clang code base would be a good place to host it, especially since I’d really appreciate feedback from you guys while developing the code. But if you decide that we should host this code elsewhere I’m fine with that, too.

Cheers,
/Manuel

You can create a separate "driver" executable with it's own command line syntax for each tool.

Considering that we want to eventually get a dynamic pattern matching language, but we also want to get it right, we are currently spending our time on the in-language DSL, and especially for the large scale stuff the developers we work with need surprisingly little help (the included example for replacing reduncant c_str() calls was created by a contributor who’s not worked on the implementation of the matcher library).

I’m not sure I get this logic. You’re saying that you’re afraid you won’t get the matching language right, so you’d avoiding it and doing something you know is wrong ;-). I expect much iteration on this, but all that requires is to tell people to expect breakage as you get experience with it and evolve things.

The main problem with building a dynamic language is that it’s significantly more effort, as it will require to be a lot more complete before being useful.

While this is true, “doing the right thing is hard” isn’t a good excuse for doing the wrong thing :). I know this isn’t what you’re actually saying, but it’s better (to me at least) to try to understand the long term direction and then figure out incremental steps to get there. It’s generally easier to motivate things if you say that this is a step in the right direction.

I don’t think our current code is “wrong”. I think it is useful for a certain amount of tools, and as the basis of more dynamic tools.

Ok, I agree with you here. I think that the compile-time pattern matching could be very useful, for a broad variety of clients. Many warnings in the compiler are implemented by effectively pattern matching on ASTs, and it would be much more nice to use a pattern matching style API instead of a manual tree walk. These should clearly be statically compiled into the binary.

  1. You’re building substantial new functionality into clang. The clang binary is already overly bloated with the static analyzer and other functionality that it keeps accreting . It would be better to use (and improve) the clang plugin mechanism to build this as a plugin. I’d also like the analyzer to move off to being a plugin as well. One carrot that we can give for people to build things as plugins is that they can use C++'0x features even though the clang compiler has to stay C++'98 for the forseeable future.

The point why I don’t want to run tools as a plugin is that the command line syntax for tools can be significantly different from running the compiler. I don’t think this needs to be linked into the clang binary though - as nothing in clang depends on it (confused…)

Ok, I think that the root of the issue is that the patch just has too many different things going on. I’m still dubious about the tooling infrastructure, but the pattern matching functionality is clear goodness.

How about we start with just the AST pattern matching APIs, iterating on them until we get to something that can go in, with some warnings switched over to using it as an example. As a step towards this, please take a look at include/llvm/Support/PatternMatch.h, which does pattern matching on LLVM IR. I’d prefer the clang matching API to look more similar to it.

Thanks Manuel, again I really appreciate you pushing this functionality forward,

-Chris

It might also be worth looking at some of the papers from the Viewpoint Research Institute. They wrote an entire optimising compiler in OMeta (a pattern-matching DSL). ASTs are constructed by pattern matching on a character stream. ASTs are optimised by pattern matching on ASTs. Assembly is generated by pattern matching on ASTs. Optimised machine code is optimised again by patching on assembly. The entire compiler is under 500 lines of code, and is self-hosting (although, obviously, only implements a few optimisations).

David

-- Send from my Jacquard Loom

Considering that we want to eventually get a dynamic pattern matching language, but we also want to get it right, we are currently spending our time on the in-language DSL, and especially for the large scale stuff the developers we work with need surprisingly little help (the included example for replacing reduncant c_str() calls was created by a contributor who’s not worked on the implementation of the matcher library).

I’m not sure I get this logic. You’re saying that you’re afraid you won’t get the matching language right, so you’d avoiding it and doing something you know is wrong ;-). I expect much iteration on this, but all that requires is to tell people to expect breakage as you get experience with it and evolve things.

The main problem with building a dynamic language is that it’s significantly more effort, as it will require to be a lot more complete before being useful.

While this is true, “doing the right thing is hard” isn’t a good excuse for doing the wrong thing :). I know this isn’t what you’re actually saying, but it’s better (to me at least) to try to understand the long term direction and then figure out incremental steps to get there. It’s generally easier to motivate things if you say that this is a step in the right direction.

I completely agree.

I don’t think our current code is “wrong”. I think it is useful for a certain amount of tools, and as the basis of more dynamic tools.

Ok, I agree with you here. I think that the compile-time pattern matching could be very useful, for a broad variety of clients. Many warnings in the compiler are implemented by effectively pattern matching on ASTs, and it would be much more nice to use a pattern matching style API instead of a manual tree walk. These should clearly be statically compiled into the binary.

  1. You’re building substantial new functionality into clang. The clang binary is already overly bloated with the static analyzer and other functionality that it keeps accreting . It would be better to use (and improve) the clang plugin mechanism to build this as a plugin. I’d also like the analyzer to move off to being a plugin as well. One carrot that we can give for people to build things as plugins is that they can use C++'0x features even though the clang compiler has to stay C++'98 for the forseeable future.

The point why I don’t want to run tools as a plugin is that the command line syntax for tools can be significantly different from running the compiler. I don’t think this needs to be linked into the clang binary though - as nothing in clang depends on it (confused…)

Ok, I think that the root of the issue is that the patch just has too many different things going on. I’m still dubious about the tooling infrastructure, but the pattern matching functionality is clear goodness.

And I agree with you on the “too many things going on”. See below.

How about we start with just the AST pattern matching APIs, iterating on them until we get to something that can go in, with some warnings switched over to using it as an example. As a step towards this, please take a look at include/llvm/Support/PatternMatch.h, which does pattern matching on LLVM IR. I’d prefer the clang matching API to look more similar to it.

So, one of the interesting things is that part of the tooling makes it really easy to run stuff over a piece of code in memory, which all the testing for the AST matching stuff relies on.
This is why on April 7th I sent out a first patch with a first step in the tooling:
http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20110404/040694.html
Which I pinged on April 12th:
http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20110411/040826.html
especially asking for negative feedback if anybody thinks this is a stupid idea.
Since nobody answered until April 21st, I asked Doug in a private mail whether he’s fine with checking this in, to which he answered that I should go ahead and commit.
The rest of the commits since then went similarly, and were built on top of the other functionality already checked in. There were some bugfixes etc. on top of it.
So, that’s why I’d like to get the tooling stuff sorted out first. If we can implement the functionality I need from the tooling in a different way, I’m listening, but I’d like us to concentrate on that part first, as it makes all the other things a lot easier to do.

My biggest problem is probably that right now we have divergent work on our internal version, where we implement feature requests from our internal customers, and I’d really like do all that development in the open and go to a model where we backwards-integrate.
The delta is growing bigger over time, and I already spent the effort of splitting up the patch once, and since I also don’t think it makes sense to split the tests from the matcher code, I currently don’t have any free cycles to do more split-up work that doesn’t add value for us.

So, to make a suggestion: I’d propose we revert the revert. Then I’m happy to spend a portion of my time to massage the code into the end state you want it to be in, while being able to add the new features that I need to make my colleagues who are using the tools happy.

To the specific point about llvm/Support/PatternMatcher.h, I looked at this already when we designed the AST matchers, and I find it looks already very much like what we propose (apart from the inherent difference in the domain, and the naming) - perhaps I don’t understand enough about the IR to be able to tell what you would want to see changed there - can you give an example (perhaps we want to open a new thread on that?)

Thoughts?
/Manuel

Hi Manuel,

How about we start with just the AST pattern matching APIs, iterating on them until we get to something that can go in, with some warnings switched over to using it as an example. As a step towards this, please take a look at include/llvm/Support/PatternMatch.h, which does pattern matching on LLVM IR. I’d prefer the clang matching API to look more similar to it.

So, one of the interesting things is that part of the tooling makes it really easy to run stuff over a piece of code in memory, which all the testing for the AST matching stuff relies on.
This is why on April 7th I sent out a first patch with a first step in the tooling:
http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20110404/040694.html
Which I pinged on April 12th:
http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20110411/040826.html
especially asking for negative feedback if anybody thinks this is a stupid idea.
Since nobody answered until April 21st, I asked Doug in a private mail whether he’s fine with checking this in, to which he answered that I should go ahead and commit.

I completely agree with you that this was poorly handled, I’m honestly sorry for that.

The rest of the commits since then went similarly, and were built on top of the other functionality already checked in. There were some bugfixes etc. on top of it.
So, that’s why I’d like to get the tooling stuff sorted out first. If we can implement the functionality I need from the tooling in a different way, I’m listening, but I’d like us to concentrate on that part first, as it makes all the other things a lot easier to do.

My biggest problem is probably that right now we have divergent work on our internal version, where we implement feature requests from our internal customers, and I’d really like do all that development in the open and go to a model where we backwards-integrate.
The delta is growing bigger over time, and I already spent the effort of splitting up the patch once, and since I also don’t think it makes sense to split the tests from the matcher code, I currently don’t have any free cycles to do more split-up work that doesn’t add value for us.

I’ll say up front that this is completely a policy decision, not a technical one. I understand that you’re trying to solve an immediate problem and move on to new and more interesting challenges.

We’ve also had experience (e.g. the XML dumper) where code was contributed to the project with the promise that it would “get fixed later”… and later never came. As a project lead in an open source project, I only have power in two ways: 1) preventing code from going in and 2) asking that code be removed, so that it is correct when it goes in.

Allowing you to commit the code in a way that cannot be practically reviewed and whose design is already known to be questionable is asking to completely subvert our code review process, which is not acceptable. I’m sorry that this is causing pain for you, but unfortunately, this is how it has to be.

So, to make a suggestion: I’d propose we revert the revert. Then I’m happy to spend a portion of my time to massage the code into the end state you want it to be in, while being able to add the new features that I need to make my colleagues who are using the tools happy.

There are two major problems with this approach:

  1. you could get hit by a bus, distracted by your manager, or otherwise lose interest, thus never actually fixing the problem. I hope that this doesn’t happen, but stranger things have happened.

  2. this makes it impossible to understand the diff. As a hypothetical, lets say someone committed a patch that (unknown) had 10 problems in it. They continued hacking on the code, and managed to fix 8 of the problems. We would have lost the ability to know the entire surface area of the patch, thus making it impossible for code review to fix those 2 remaining issues.

This is why we have a policy that all code that goes in the tree be code reviewed by code owners (though it can happen before or after commit). Breaking our process because it is inconvenient and because the earlier reviews were mishandled unfortunately isn’t an option.

To the specific point about llvm/Support/PatternMatcher.h, I looked at this already when we designed the AST matchers, and I find it looks already very much like what we propose (apart from the inherent difference in the domain, and the naming) - perhaps I don’t understand enough about the IR to be able to tell what you would want to see changed there - can you give an example (perhaps we want to open a new thread on that?)

My primary concern is the naming issues.

-Chris