[RFC] A C++ pseudo parser for tooling

Trying to unify the parsers will be a huge burden on clang parser contributors.

Agreed!

I don’t think this is likely.
There’s no switch over all kinds. There’s a grammar, a new TokenKind is (automatically) a new terminal that doesn’t match any grammar rules - it gets sucked up by error recovery.
c.f. no change to Format/ here.
Breaking changes would be removal of used token kinds, these are vanishingly rare.

Okay, that’s good to know. It doesn’t match my experience with the Clang parser (adding tokens has definitely caused fully-covered switch diagnostics for me in the past), but so long as the new parser is designed with this problem in mind, I think it should be fine. I agree that removing token kinds is quite rare and not a burden I’m worried about.

Not really. I’m expecting this to be maintained by a different (possibly overlapping) set of people. It wouldn’t be appropriate to make changes to both in the same commit, nor to gate approval of one on willingness to work on the other. If you’re working on a coroutines and want to add it to clang, define matchers, add clang-format support, and implement clangd then that’s great, but there’s no expectation you should.

Okay, good to know, that also makes me more comfortable.

This seems like an argument for decoupling: if there’s not enough interest to keep a peripheral feature up-to-date for its own sake, do we want it to fall behind, or put it on the critical path for clang changes, or rip it out immediately if it’s hard to keep in lockstep?

Oh, yes, definitely an argument for decoupling. Sorry if I was unclear on that; my hope was that this parser would be as fully decoupled as possible (aka, not in the tree at all), not that we would find a way to mash the existing parser into working for the RFC needs as well as typical Clang compilation needs.

(I’m coming more and more around to the idea of “rip it out immediately” btw; Clang and LLVM are extremely large projects that are amazingly slow to compile and require an inordinate amount of resources. Removing unsupported functionality and not adding unmotivated tools goes a long ways towards improving this problem. And I think we definitely need to improve this problem – hard for us to have a particularly inclusive community when the only people who can build the project require a build farm to do so, which is basically the point we’re almost at today.)

(It’s certainly also a good argument not to accept marginal features!)

+∞

We did consider this option, but the developer policy specifically requests the opposite

Yes and no. Once we know we want something in-tree, then we definitely don’t want it to be dropped on us in one huge code dump. But if we don’t know we want something in-tree yet, keeping it out of tree to prove the concept out is far less disruptive than doing experimental work in-tree. The waters are muddy here because of clangd, which we already said we wanted in-tree (but it wasn’t clear to me when we said yes to that it meant we’d be duplicating the entire parser).

It’s not useful as an optional dependency - we’d be better to either move clangd out of tree or drop the whole thing.

That’s a fair perspective.

I’m feeling a bit less worried about the burdens this will add to the Clang community than I was before, but I’m still not particularly in love with the RFC. However, if the pseudo parser lives in clang-tools-extra (which is where clangd lives), I think it’s probably reasonable. That said, please try to keep build times in mind when working on the pseudo parser as best you can (e.g., be careful/thoughtful about adding tablegen or other dependencies that are known to hurt build times). So, I consider my “no go” issues to be addressed and am in (reluctant) support of this RFC.

1 Like