Ok, I now understand that I phrased my question a bit too broadly. I do agree that for a compiler, invalid AST nodes are not interesting and should not be retained. Also, in C++ with templates, one has to be really careful about retaining invalid nodes. And clang already does a great job with fix-its, including the one catching misspelled identifiers.
So I will narrow it down to the use-cases I am currently interested in, which is use in an IDE. More specifically, for files that are open in the editor, possibly even unsaved. (So not for indexing, but for cases where a (lib)clang client could easily pass a flag.) And yes, quite a bit of that code is incorrect, maybe because it's not finished yet, or because of a typo, a misunderstanding, or some other human error.
1. Highlighting, finding declarations, definitions and/or usages: most of the "surprises" here seem to be caused by the choice to drop whole expressions when parts are invalid. For example, if the lhs in a binary expression is invalid, the rhs gets dropped too. The same goes for the operands of unary operators, and for parameter lists in call expressions. With the exception of parameter lists, I don't think that the memory footprint would increase too much if only the invalid piece of the AST were dropped. Apart from the visual aspect, an IDE won't be able to help with finding things like (defined but invalid) parameters, incompatible right-hand sides, etc.
2. Probably a specific case of the above: if, in C++, an expression consists of only a function call, and that function is undeclared and undefined, all information about it gets dropped. If an IDE wants to offer a declare-/define-function fix-it, it cannot get back to any information about the call.
3. Invalid method/function calls: first of all, overloaded methods seem to do fine. But if a method is not declared/defined, and is accessed through a member access operator or a qualifier, then the valid parts of the member access and the valid parts of the qualifier get dropped. So, the same as for item 2, offering an "implement it" fix-it is, well, tedious. Finding the definition of the qualifier/lhs is also impossible, so a user cannot "jump there to see what is available".
4. If a C++ method is defined without being declared, the qualifier gets dropped (!). Implementing an insert-declaration-from-definition fix-it would require some re-parsing and a good deal of very fuzzy logic.
5. Not fully verified, but it looks like clang drops at least some information when a method definition does not match up with any declaration, or vice versa. Apart from the previous item, this can happen when the parameter list of a method gets changed. Now you could say that changing a method/function signature is a proper refactoring action, but most people "just do it and fix the other one afterwards". I admit that this one is very, erm, fuzzy, but it happens so often that if an IDE can help out here, it is considered a "cool thing".
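To make the items above a bit more concrete, here is a small made-up example of the kind of code I mean. All names (Widget, scale, missing, frobnicate, resize, create, reset) are hypothetical and only serve as illustration. The erroneous lines are commented out so that the file itself still compiles; each one stands for what a user might have in an unsaved buffer.

```cpp
#include <cassert>

struct Widget {
    int size() const { return 3; }
    // Deliberately no resize(), create() or reset() declared here.
};

int scale(int x) { return x * 2; }

int validParts() {
    Widget w;
    int n = w.size();
    assert(n == 3);  // the valid code around the errors behaves normally

    // Item 1: invalid lhs, valid rhs. 'missing' is undeclared, but
    // 'scale(n)' is a resolvable call that an IDE would still like to
    // highlight and navigate to.
    //     int r = missing + scale(n);

    // Item 2: the whole expression is a call to an undeclared function.
    // The arguments 'n' and 'w' are valid and would be needed to offer a
    // declare-/define-function fix-it.
    //     frobnicate(n, w);

    // Item 3: undeclared member reached through a valid object or a
    // valid qualifier. Jumping to 'w' or 'Widget' should still work.
    //     w.resize(n);
    //     Widget::create(n);

    return scale(n);
}

// Item 4: out-of-line definition without an in-class declaration. The
// qualifier 'Widget::' is what an insert-declaration fix-it would need.
//     void Widget::reset() {}

// Item 5: definition whose parameter list no longer matches the
// declaration, e.g. after the user edited only one of the two.
//     int Widget::size(int unit) const { return 3 * unit; }
```

In each commented-out line, only one identifier is actually wrong; everything else names something that exists and could be looked up.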
A note about templates: all of the items mentioned above are without any templates involved. I know that templates do complicate things. But when talking with users, the general consensus seems to be that even if templates cannot be handled for the cases mentioned above, handling the non-template cases still covers more than three-quarters of the use-cases. And most seem to understand the complexity when templates are involved.
For anything other than item #1, I think that some flag is needed in Stmt, which indicates that, although the Stmt is invalid, utterly useless for compilation, and not to be used for code generation or proper refactoring, it might still hold some useful information for (lib)clang clients.
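As a rough sketch of what I have in mind with such a flag, here is a toy AST that is NOT clang's Stmt and does not use any clang API. The names Node, ownError, containsErrors, addChild, usableForCodegen and collectResolvedNames are all made up for this mail. The idea is only that an error marks the node itself, the mark propagates to the parents, and clients choose how to treat it.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy node; a stand-in for the idea, not for clang::Stmt.
struct Node {
    std::string kind;             // e.g. "CallExpr", "DeclRefExpr"
    std::string name;             // referenced identifier, may be empty
    bool ownError = false;        // this node itself failed to resolve
    bool containsErrors = false;  // this node or something below it is invalid
    std::vector<std::unique_ptr<Node>> children;
};

std::unique_ptr<Node> makeNode(std::string kind, std::string name,
                               bool ownError = false) {
    auto n = std::make_unique<Node>();
    n->kind = std::move(kind);
    n->name = std::move(name);
    n->ownError = ownError;
    n->containsErrors = ownError;
    return n;
}

// Attach a child and propagate the "contains errors" mark upwards.
void addChild(Node& parent, std::unique_ptr<Node> child) {
    assert(child && "child must not be null");
    parent.containsErrors = parent.containsErrors || child->containsErrors;
    parent.children.push_back(std::move(child));
}

// Code generation and proper refactoring must refuse flagged subtrees.
bool usableForCodegen(const Node& n) { return !n.containsErrors; }

// An IDE-style client can still walk the tree and pick up every
// identifier that did resolve (pre-order traversal).
void collectResolvedNames(const Node& n, std::vector<std::string>& out) {
    if (!n.ownError && !n.name.empty()) out.push_back(n.name);
    for (const auto& c : n.children) collectResolvedNames(*c, out);
}

// Tree for the made-up expression `missing + scale(n)`.
std::unique_ptr<Node> buildExample() {
    auto call = makeNode("CallExpr", "scale");
    addChild(*call, makeNode("DeclRefExpr", "n"));
    auto bin = makeNode("BinaryOperator", "");
    addChild(*bin, makeNode("DeclRefExpr", "missing", /*ownError=*/true));
    addChild(*bin, std::move(call));
    return bin;
}
```

With this tree, usableForCodegen() rejects the binary operator as a whole, while collectResolvedNames() still finds "scale" and "n". Whether something like this fits into the real AST is of course exactly the question.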
So, again the question: feedback?
Where did I miss details, or are these too narrow use-cases to be useful to cover with clang as a front-end?
-- Erik.