ASTContext::getParents discussion

Hi all,

while developing a custom checker I encountered the following issue related to the getParents member function of the ASTContext.

My checker was using ACtx->getParents in a callback and I noticed that sometimes parents were missing. After some debugging it turned out that this occurs when:

- Using the Clang Static Analyzer with Cross Translation Unit (CTU) analysis
- Checker callback is called once, getParent builds a Parent Map to cache this CPU heavy task
- The CTU uses the ASTImporter to import a function defined in another TU, this modifies the AST, introducing new parents.
- Checker callback is called again, now getParents will use the old Parent Map

My fix was to introduce ASTContext::invalidateParents which would be called by the ASTImporter on every AST mutation. That way, the Parent Map would only have to be rebuilt if it was invalidated before a getParents call. This is the patch: https://reviews.llvm.org/D46940

Richard expressed some concerns about this patch, to summarize (full: https://reviews.llvm.org/D46940#1147456 ):

- Other entities besides the ASTImporter could mutate the AST, so the responsibility of invalidation should lie with the consumer of the Parent Map
- The Parent Map should not be in the ASTContext at all, really in libTooling

While I understand these concerns, a solution like that does not solve my use-case described above. If I create a new Parent Map on every callback invocation, my checker becomes unusably slow. And if I cache it, I do not know when it is invalidated.

Another idea I had was to introduce invalidation callbacks to the Checker, Analyzer and ASTImporter interfaces, in order to carry the invalidation trigger up to a user defined checker. This doesn't seem very elegant though.

I wanted to start a discussion to find a solution to this. Do you have any ideas or input?

-Rafael

Hmm, i guess that generally it is a problem, if your algorithm mutates the AST, you'll have to deal with stale parent maps somehow and do sophisticated caching, and it's likely that nobody has ever developed a reliable solution to this.

But i don't immediately understand your specific use case. What sort of AST nodes does your checker check? If it's statements within a function body, why do they get mutated when an import of an unrelated function occurs? Are you using the per-function parent maps (such as the ones that live in AnalysisDeclContext), or are you trying to construct a single ParentMap for the whole AST? If you're checking declarations, why aren't you using DeclContext chains instead of parent maps?

Generally, are you using CTU in a manner that's more sophisticated than what normal path-sensitive CTU does? I.e., not only bring in function bodies, but also inject arbitrary stuff into arbitrary stuff?

Thanks for your comments!

Yes I understand that it would be my responsibility to manage a Parent Map if I were to mutate the AST, but in that case I'm just a checker user and the CTU mutates the AST via the ASTImporter during analysis.

The use-case is that I just classify a more complex expression, where the analyzer callback argument is only a child node. To walk upwards I use ASTContext::getParents, which builds the Parent Map for the entire TU. When after that a new function gets imported the map will be missing the Stmts in that function.

A good suggestion from you is to use function-scoped parent maps or the ones from AnalysisDeclContext. This seems to solve my issue, thanks!

In that case I will drop the patch, because of a misuse of ASTContext::getParents. Still, you have to know all this and for some checker user it might not even be obvious that CTU mutates the AST. So I would agree that getParents should probably be moved out of ASTContext. Would likely break a lot of user code though.

-Rafael

Thanks for your comments!

Yes I understand that it would be my responsibility to manage a Parent
Map if I were to mutate the AST, but in that case I'm just a checker
user and the CTU mutates the AST via the ASTImporter during analysis.

The use-case is that I just classify a more complex expression, where
the analyzer callback argument is only a child node. To walk upwards I
use ASTContext::getParents, which builds the Parent Map for the entire
TU. When after that a new function gets imported the map will be missing
the Stmts in that function.

A good suggestion from you is to use function-scoped parent maps or the
ones from AnalysisDeclContext. This seems to solve my issue, thanks!

In that case I will drop the patch, because of a misuse of
ASTContext::getParents. Still, you have to know all this and for some
checker user it might not even be obvious that CTU mutates the AST. So I
would agree that getParents should probably be moved out of ASTContext.

This would be my preference. The parent map is not part of the AST
model, so should not be owned by the ASTContext; moving the
responsibility for creating it and invalidating it on AST mutation
elsewhere would make the semantics clearer. (ASTContext should
maintain its own invariants; if it really owns a parent map, that map
should be invalidated automatically on AST mutations. But we don't
want to do that.) This change would also improve Clang's layering; the
parent map probably belongs in a layer on top of AST that's shared by
the static analyzer and tooling libraries.