Avoiding unnecessary calls to Decl::getASTContext

Hi all,

I have been looking at Decl::getASTContext and DeclContext::getParentASTContext,
and it turns out that they are not that cheap.

The reason for this is that they must walk back to the translation unit declaration
along what is essentially a linked list. This is amplified by the fact that it does
look like a simple getter and therefore is called extremely often (the most egregious
case by far being DeclContext::lookup), even when most of the times the context is
already easily available.

To put some numbers on this I experimented with removing about half of the iterations
in Decl::getTranslationUnitDecl. This gives a speed up of about 1% when parsing all
of Boost. Extrapolating from this I get that about 1.5%-2% of the time spent
parsing Boost is spent getting the ASTContext, which seems a bit unfortunate.

Now to be clear 1.5%-2% is not that much, but the fix is simply to pass a ref to
the ASTContext more often as a function parameter. The downside is that this would
cause a some amount of churn in the C++ api. To be clear I am *not* suggesting
removing all of the calls to Decl::getASTContext, but only removing the ones causing
the most iterations in Decl::getTranslationUnitDecl (say the top 10 or 20).

What do you think ?

Bruno

Seems reasonable and some profile data would help motivate/support/justify which bits are worth changing and which aren’t

I have added the profile data to the first patch (⚙ D55658 Avoid Decl::getASTContext in hot paths where possible, Part 1)

The top 20 callers of getASTContext and getParentASTContext are:
(with the number of calls to getTranslationUnitDecl, edited for readability)

5399586 : DeclContext::lookup
801684 : VarDecl::getMemberSpecializationInfo
551928 : FunctionDecl::getMinRequiredArguments
401464 : CXXRecordDecl::lookupInBases
338614 : Decl::getAttrs
311651 : DeclContext::makeDeclVisibleInContextImpl
173684 : Decl::getASTMutationListener
164264 : CXXConstructorDecl::isSpecializationCopyingObject
154066 : CXXMethodDecl::size_overridden_methods
141622 : CXXRecordDecl::getVisibleConversionFunctions
120440 : ClassTemplateSpecializationDecl::Profile
116303 : VarDecl::isThisDeclarationADefinition
106208 : CXXRecordDecl::conversion_begin
106208 : CXXRecordDecl::conversion_end
99075 : FunctionDecl::setParams
95910 : RedeclarableTemplateDecl::findSpecializationImpl
57043 : CXXRecordDecl::getDestructor
56668 : VarDecl::getDefinition
52140 : TagDecl::startDefinition
48266 : CXXConstructorDecl::isCopyOrMoveConstructor
37407 : FunctionDecl::getBody

Bruno

Ah, I was hoping/looking for more of a timing/cpu-counter type profile, to demonstrate the overall impact (% runtime) of these changes, if possible?

I have added the profile data to the first patch (https://reviews.llvm.org/D55658)

The top 20 callers of getASTContext and getParentASTContext are:
(with the number of calls to getTranslationUnitDecl, edited for readability)

5399586 : DeclContext::lookup
801684 : VarDecl::getMemberSpecializationInfo
551928 : FunctionDecl::getMinRequiredArguments
401464 : CXXRecordDecl::lookupInBases
338614 : Decl::getAttrs
311651 : DeclContext::makeDeclVisibleInContextImpl
173684 : Decl::getASTMutationListener
164264 : CXXConstructorDecl::isSpecializationCopyingObject
154066 : CXXMethodDecl::size_overridden_methods
141622 : CXXRecordDecl::getVisibleConversionFunctions
120440 : ClassTemplateSpecializationDecl::Profile
116303 : VarDecl::isThisDeclarationADefinition
106208 : CXXRecordDecl::conversion_begin
106208 : CXXRecordDecl::conversion_end
99075 : FunctionDecl::setParams
95910 : RedeclarableTemplateDecl::findSpecializationImpl
57043 : CXXRecordDecl::getDestructor
56668 : VarDecl::getDefinition
52140 : TagDecl::startDefinition
48266 : CXXConstructorDecl::isCopyOrMoveConstructor
37407 : FunctionDecl::getBody

Given your data that perhaps 1.5-2% of the time building boost is finding the AST context, it definitely seems worthwhile to avoid doing so for lookup. The next three seem reasonable to address too, if you feel so inclined, but I’d not go any further than that – the long tail seems negligible in comparison.

Ah, I was hoping/looking for more of a timing/cpu-counter type profile, to demonstrate the overall impact (% runtime) of these changes, if possible?

The estimation I got is that about 1.5%-2% of the time spent doing an
fsyntax-only on all of Boost is spent obtaining the AST context.

This estimation was obtained by removing a fraction of the iterations in
Decl::getTranslationUnitDecl and extrapolating from this. The timings were
obtained by doing 20 iterations of fsyntax-only with perf stat both before
and after the changes.

This should probably be taken with a grain of salt since this is likely
dependent on the hardware and cpu usage, and since it is in general not
easy to reliably measure small timing differences.

Bruno

Just to clarify, the 1.5%-2% figure is an estimation for the time spent obtaining the
AST context when parsing all of Boost without doing any code generation (ie: -fsyntax-only).

I generally agree about not bothering too much with the long tail, with perhaps two exceptions:

1.) Some of these are really easy to fix. For example FunctionDecl::setParams is almost only
    called in Sema and so the context is already easily available.

2.) The numbers shown above are the number of calls to getTranslationUnitDecl. A better metric
    would be the number of iterations needed to reach the translation unit declaration. With
    that in mind it would also make sense to pass the AST context to some of the above where
    the declaration is likely to occur nested deep in the declaration context chain
    (VarDecls for example).

If there are no objections I will submit some patches next week to do this change, hopefully
in small increments. In any case this will require small (but straightforward) changes to at
least clang-tools-extra and lldb. (Am I forgetting any project here ?)

Bruno

I have added the profile data to the first patch (https://reviews.llvm.org/D55658)

The top 20 callers of getASTContext and getParentASTContext are:
(with the number of calls to getTranslationUnitDecl, edited for readability)

5399586 : DeclContext::lookup
801684 : VarDecl::getMemberSpecializationInfo
551928 : FunctionDecl::getMinRequiredArguments
401464 : CXXRecordDecl::lookupInBases
338614 : Decl::getAttrs
311651 : DeclContext::makeDeclVisibleInContextImpl
173684 : Decl::getASTMutationListener
164264 : CXXConstructorDecl::isSpecializationCopyingObject
154066 : CXXMethodDecl::size_overridden_methods
141622 : CXXRecordDecl::getVisibleConversionFunctions
120440 : ClassTemplateSpecializationDecl::Profile
116303 : VarDecl::isThisDeclarationADefinition
106208 : CXXRecordDecl::conversion_begin
106208 : CXXRecordDecl::conversion_end
99075 : FunctionDecl::setParams
95910 : RedeclarableTemplateDecl::findSpecializationImpl
57043 : CXXRecordDecl::getDestructor
56668 : VarDecl::getDefinition
52140 : TagDecl::startDefinition
48266 : CXXConstructorDecl::isCopyOrMoveConstructor
37407 : FunctionDecl::getBody

Given your data that perhaps 1.5-2% of the time building boost is finding the AST context, it definitely seems worthwhile to avoid doing so for lookup. The next three seem reasonable to address too, if you feel so inclined, but I’d not go any further than that – the long tail seems negligible in comparison.

Just to clarify, the 1.5%-2% figure is an estimation for the time spent obtaining the
AST context when parsing all of Boost without doing any code generation (ie: -fsyntax-only).

I generally agree about not bothering too much with the long tail, with perhaps two exceptions:

1.) Some of these are really easy to fix. For example FunctionDecl::setParams is almost only
called in Sema and so the context is already easily available.

Sure, non-disruptive cleanups for this seem good where possible. (But for example, getAttrs is called from a large number of places and it’d be really awkward to require an ASTContext, so I’d leave that one.)

2.) The numbers shown above are the number of calls to getTranslationUnitDecl. A better metric
would be the number of iterations needed to reach the translation unit declaration. With
that in mind it would also make sense to pass the AST context to some of the above where
the declaration is likely to occur nested deep in the declaration context chain
(VarDecls for example).

If there are no objections I will submit some patches next week to do this change, hopefully
in small increments. In any case this will require small (but straightforward) changes to at
least clang-tools-extra and lldb. (Am I forgetting any project here ?)

I think it’s just those two. Thank you!

Hi,

Sorry for being late to the discussion.

Do I understand it right that you’d like to use a local DeclContext* variable and make sure it’s identical to S.getASTContext() result (or similar for Parent)?

What do you think about caching ASTContext and/or ParentASTContext in Sema?

Jan

I don't understand the part about caching the AST context in Sema since Sema already store
a reference to the AST context.

What I would like to propose is the pass a reference to the AST context more often as
a function parameter. For example instead of having "lookup_result lookup(DeclarationName Name) const;"
we would have "lookup_result lookup(ASTContext &Context, DeclarationName Name) const;" (with ASTContext
possibly const-qualified).

This is a little bit annoying since it require updating all the callers and propagating the changes
along the call chains. I would only suggest doing this when the changes are not too invasive.
For example Decl::getAttrs is used in a lot of places and requiring the AST context would probably
not be a good idea. (although possibly getAttrs could have two versions, one with a reference to the
AST context and another one without).

The other alternatives I considered are:

1.) Store a reference to the AST context in each DeclContext. This has in my opinion
    an unacceptable memory cost. In addition when I experimented with this the speedup
    obtained from avoiding the lookup of the AST context are almost entirely negated by
    the increased memory usage (this might of course vary depending on the specific test setup).

2.) Cache the AST context in Decl::getASTContext. I did not experiment with this but it seems to
    me that this will not work when there are multiple AST contexts. Or if it can be made to work
    there is a need to store some data in Decl for the invalidation, but Decl has not a single bit
    available.

Bruno

Sorry, my bad. I was just thinking whether there’s any more fool-proof way than manual passing of DeclContext but my thoughts derailed.