I have been looking at Decl::getASTContext and DeclContext::getParentASTContext,
and it turns out that they are not that cheap.
The reason is that they must walk back to the translation unit declaration
along what is essentially a linked list. This is amplified by the fact that
getASTContext looks like a simple getter and is therefore called extremely often
(the most egregious case by far being DeclContext::lookup), even though most of
the time the context is already easily available.
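For reference, the walk in question has roughly the following shape (a simplified
sketch from memory, not a verbatim copy of the code in DeclBase.cpp):

  // Sketch: the ASTContext is only reachable from the TranslationUnitDecl,
  // so getASTContext has to climb the whole DeclContext chain first.
  ASTContext &Decl::getASTContext() {
    return getTranslationUnitDecl()->getASTContext();
  }

  TranslationUnitDecl *Decl::getTranslationUnitDecl() {
    DeclContext *DC = getDeclContext();
    // One iteration per level of nesting (namespaces, classes, functions, ...).
    while (DC && !DC->isTranslationUnit())
      DC = DC->getParent();
    return cast<TranslationUnitDecl>(DC);
  }

The deeper a declaration is nested, the more expensive every single call gets.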
To put some numbers on this, I experimented with removing about half of the iterations
in Decl::getTranslationUnitDecl. This gives a speedup of about 1% when parsing all
of Boost. Extrapolating from this (if removing roughly half of the iterations buys about 1%,
the full walk should account for roughly twice that), I estimate that about 1.5%-2% of the
time spent parsing Boost goes into getting the ASTContext, which seems a bit unfortunate.
Now, to be clear, 1.5%-2% is not that much, but the fix is simply to pass a reference to
the ASTContext more often as a function parameter. The downside is that this would
cause some amount of churn in the C++ API. To be clear, I am *not* suggesting
removing all of the calls to Decl::getASTContext, only the ones causing
the most iterations in Decl::getTranslationUnitDecl (say the top 10 or 20).
Given your data that perhaps 1.5-2% of the time building Boost is spent finding the AST context, it definitely seems worthwhile to avoid doing so for lookup. The next three seem reasonable to address too, if you feel so inclined, but I’d not go any further than that – the long tail seems negligible in comparison.
Ah, I was hoping/looking for more of a timing/cpu-counter type profile, to demonstrate the overall impact (% runtime) of these changes, if possible?
The estimate I got is that about 1.5%-2% of the time spent doing an
-fsyntax-only run on all of Boost goes into obtaining the AST context.
The estimate was obtained by removing a fraction of the iterations in
Decl::getTranslationUnitDecl and extrapolating from the resulting speedup.
The timings were obtained by doing 20 runs of -fsyntax-only with perf stat,
both before and after the changes.
This should probably be taken with a grain of salt, since it is likely
dependent on the hardware and CPU usage, and since it is in general not
easy to reliably measure small timing differences.
Just to clarify, the 1.5%-2% figure is an estimate of the time spent obtaining the
AST context when parsing all of Boost without doing any code generation (i.e. -fsyntax-only).
I generally agree about not bothering too much with the long tail, with perhaps two exceptions:
1.) Some of these are really easy to fix. For example, FunctionDecl::setParams is almost only
called from Sema, where the context is already easily available (see the sketch after this list).
2.) The numbers shown above are the number of calls to getTranslationUnitDecl. A better metric
would be the number of iterations needed to reach the translation unit declaration. With
that in mind it would also make sense to pass the AST context to some of the above where
the declaration is likely to be nested deep in the declaration context chain
(VarDecls, for example).
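To make the first point concrete, here is roughly the shape of the change I have in mind
for setParams; the names and the context-taking overload below are only illustrative:

  // Inside Sema, where NewFD is some FunctionDecl being built and Context is
  // Sema's ASTContext member:
  NewFD->setParams(Params);           // today: the callee looks up the context itself
  NewFD->setParams(Context, Params);  // proposed: the caller passes it in

  // with a declaration along the lines of:
  //   void setParams(ASTContext &C, ArrayRef<ParmVarDecl *> NewParamInfo);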
If there are no objections I will submit some patches next week to do this change, hopefully
in small increments. In any case this will require small (but straightforward) changes to at
least clang-tools-extra and lldb. (Am I forgetting any project here?)
Sure, non-disruptive cleanups for this seem good where possible. (But for example, getAttrs is called from a large number of places and it’d be really awkward to require an ASTContext, so I’d leave that one.)
Do I understand it right that you’d like to use a local DeclContext* variable and make sure it’s identical to the result of S.getASTContext() (or similar for Parent)?
What do you think about caching ASTContext and/or ParentASTContext in Sema?
I don't understand the part about caching the AST context in Sema, since Sema already stores
a reference to the AST context.
What I would like to propose is to pass a reference to the AST context more often as
a function parameter. For example, instead of having "lookup_result lookup(DeclarationName Name) const;"
we would have "lookup_result lookup(ASTContext &Context, DeclarationName Name) const;" (with ASTContext
possibly const-qualified).
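At a typical call site inside Sema, the change would look roughly like this (the surrounding
names are only illustrative; the point is that the caller already holds the context):

  // Before: lookup has to walk up to the translation unit declaration to find
  // the ASTContext on its own.
  for (NamedDecl *ND : DC->lookup(Name))
    handle(ND);

  // After: the caller passes the context it already has (e.g. Sema's Context member).
  for (NamedDecl *ND : DC->lookup(Context, Name))
    handle(ND);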
This is a little bit annoying since it requires updating all the callers and propagating the changes
along the call chains. I would only suggest doing this when the changes are not too invasive.
For example, Decl::getAttrs is used in a lot of places and requiring the AST context there would
probably not be a good idea (although getAttrs could possibly have two versions, one with a reference
to the AST context and another one without).
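Concretely, the two-version idea for getAttrs would be something like the following sketch
(the context-taking overload is hypothetical):

  // Existing form: convenient, but has to find the ASTContext by walking up
  // to the translation unit declaration.
  const AttrVec &getAttrs() const;

  // Hypothetical second version for hot paths that already have the context.
  const AttrVec &getAttrs(ASTContext &Ctx) const;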
The other alternatives I considered are:
1.) Store a reference to the AST context in each DeclContext. In my opinion this has
an unacceptable memory cost. In addition, when I experimented with it, the speedup
obtained from avoiding the lookup of the AST context was almost entirely negated by
the increased memory usage (this might of course vary depending on the specific test setup).
2.) Cache the AST context in Decl::getASTContext. I did not experiment with this, but it seems to
me that it will not work when there are multiple AST contexts. Or, if it can be made to work,
some data would need to be stored in Decl for the invalidation, and Decl does not have a single
spare bit available.