Moving to cfe-dev, as I think the design discussion is misplaced on a
review thread for one added feature
It seems like it would almost make more sense to move these great doc comments with great examples of source constructs that the AST node corresponds to into the documentation for the AST nodes themselves, and then have single header (say, "StmtMatchers.h") with a block comment at the top saying something to the tune of
This obviously depends a lot on the answers to your later points, but
under the condition that we actually want those matchers spelled out
somewhere, I think having the documentation there is good, even if it
might seem redundant. Requiring users to go through more layers of
indirection for digging up files seems worse than the maintenance
"each `Stmt` subclass has a corresponding matcher here, with a name which is the name of the AST node with the first letter lowercase rather than upper, so for example `MaterializeTemporaryExpr`'s corresponding matcher is `materializeTemporaryExpr`. If you are unsure about what a given `Stmt` subclass corresponds to at the source level, each corresponding `Stmt` subclass's doxygen comment contains a number of examples which should help you get a feel for what the AST node corresponds to at the source level."
And then you could just stamp out all of them. I'm not sure how you're determining when to add exact AST node matchers, but I can't see a compelling reason to not just add all of them at once (well, besides the ever-present lack of time :). Of course, all of what I said for `Stmt`'s could just as easily be said for `Decl`'s or other AST nodes.
Maybe you could make it even more uniform from an API point of view and have a matcher `astNode<T>` where `T` is any AST node and which matches that node exactly, so the `materializeTemporaryExpr` you have in this patch could be written as `astNode<MaterializeTemporaryExpr>`. I think that from an API point of view this is really clean, since then the user only has to learn one matcher in order to match any exact AST node they want (this also seems like a huge win from a documentation perspective as well, since you now only need to document one matcher in order to cover all exact AST node matchers).
I understand that the `astNode<T>` is a bit more verbose, but the clarity and API transparency that it affords is I think //much// more important, since you can recover the syntax sugar trivially with aliases. For example, suppose that I just want to match records; clang has an AST node for that, which is `RecordDecl`, so I know I just have to do `astNode<RecordDecl>` and that's it. However, with the current design, I need to wade through docs to find it (it doesn't seem to exist after wading and grepping), or worse be deceived and use `record()`, which doesn't match `RecordDecl`'s, but actually `CXXRecordDecl`'s!
Is there something that I'm missing that complicates doing it that way? Could you compare/contrast your current approach with the one I just described?
In a sense, I guess what my radar is picking up on here is that you have some "boilerplate" AST matchers here which correspond exactly with AST nodes, while other ones don't, like allOf(), yet this distinction doesn't appear to be being called out in documentation or well-factored within the code itself (or documentation). Also, it seems like a documentation Single Point Of Truth violation to have these great examples of what the AST node corresponds to here and not on the actual AST node---these examples would probably also be of interest to a person reading documentation of the AST node. Having the examples here makes perfect sense for matchers which are not matching exact AST nodes, though.
So, you raise a very good point, and we have had that exact argument
multiple times over the course of the AST matchers (the last time was
I think when Daniel brought it up
It is far from a clear cut "obvious" design choice. As you have said,
there are costs and benefits to both approaches, and in the end the
question is the value you put on each of those points.
I have a few comments on the points you raise (and why I might value
them differently than you do):
- First, we have a very simple of of specifying the node matchers,
which is very close to your aliasing: the VariadicDynCastAllOf thingy.
I contemplated renaming that to NodeMatcher to make that more clear.
- The way to write a matcher would actually be astNode<Stmt,
, because afaik we cannot statically figure
out what our base type is
- if you allow users to alias, but don't provide a central location
where to put those aliases, we'll end up with matcher expressions that
are harder to read and maintain, as users will provide their own names
- having node matchers for nodes for which none of the other matchers
exist, is pretty useless - thus, I doubt that we'll add matchers for
all of the 500+ classes of the AST
When it come to documentation I'm also personally not a big believer
to single point of truth, as context, presentation and repetition
matters for learning.
That said, I'll soon go over the matchers and rename them to exactly
match the AST nodes, which gives another good argument
pro-orthogonality. I still don't think it quite sways the balance, but
I'm open to more opinions