RFC: Easier AST Matching by Default

I am not a user of AST matchers and am fairly new to clang (1 1/2 years), so I won’t weigh in on the specifics here. But some good general points are being discussed as well, and I’d like to comment on those without derailing the specific discussion.

The proposal has been implying (correct me if I'm wrong) that the
maintenance cost and complexity that is being put onto experts with a
codebase that uses AST matchers that they maintain long-term is worth the
added convenience for novice users who write one-off tools that don't need
to be maintained. If it is indeed the case, I would prefer this argument to
be made explicitly (as in, the proposal should describe the downsides of
making this change; right now it does not describe them).


I favor that argument. The interests of novices deserve many times more weight than those of the experts. When it takes X years to learn how to use something effectively, that’s X fewer years of productive contributions per person. Such costs accumulate and degrade long-term progress, mostly unnoticed, and so need to be fought vigorously wherever they are recognized.

On the other hand, adding more code to the headers of a codebase usually does not serve the interest of novices. Usually the opposite is true: it just gives them more options they must understand and weigh. In this case specifically, I’m not sure that presenting a veneer of the AST that omits the implicit nodes will really shorten the learning curve. It might, but it’s not certain. I defer to others.

But here is what would certainly shorten the learning curve, and benefit everyone long-term in my view: a concentrated effort to clean up and improve the documentation of the AST. Just giving clearer names and documentation to some of the implicit nodes, and perhaps taking a hard second look at some of the design choices, would benefit everyone. If done in one concentrated blast, the rewriting cost to long-time users would be paid only once, and would be made up for in the longer term by more streamlined use.

Even if there is not sufficient desire or will to alter names and design choices, the documentation at least could be much improved, at no such additional cost to existing users. The documentation in the clang AST headers is very uneven. Some of it is great and clear. In other places it is like this:

Documentation for TypeLoc:
"Base wrapper for a particular 'section' of type source info".

Documentation for TypeSourceInfo:
"A container of type source information.
A client can read the relevant info using TypeLoc wrappers."

What role do those play next to Type and QualType? Beats me. I would love to see that explained in the documentation rather than having to scroll through files trying to figure it out by hand. You can hide all of the implicit nodes, and novices will still be as confused as ever by some of the explicit ones.