Expression Templates for AST matchers

OCLint, a lint checker using clang tooling, says:
<http://docs.oclint.org/en/dev/internals/rules.html&gt;

    "Using AST matchers has more restrictions than AST visitor, and it
    takes much longer time in analysis, this could lower the
    performance of the tool. So, we always have to consider the
    trade-off, and choose wisely."

It seems that AST matchers, while convenient, are not the best
performing option for processing source files. I've done some simplistic
measurements on clang-tidy and found that the majority of the execution
time is spent in the matchers. I believe this is the result of the
matchers building runtime structures to match nodes.

Has anyone investigated using the technique of expression templates to
get the convenience of an AST matching domain-specific language (DSL)
with the runtime efficiency of an AST visitor?

If we were to pursue such a direction, would it be possible to use an
expression template library like those in Boost for the implementation?
I don't see any existing use of Boost in the LLVM/clang code base, so
I infer that there is some avoidance of Boost for whatever reason. I
haven't done any expression template coding beyond using existing
expression template based libraries like Boost.Spirit. I have no idea
what it would take to create an expression template based AST matcher
DSL for clang, but it seems like it would be worthwhile as it would
speed up all the tools that want to use the AST matcher style
interface.

Does anyone have any thoughts about this? Pehraps Manuel Klimek?

Although I’m mostly an armchair spectator right now w.r.t. clang tooling stuff, I think that use cases like clang-query and in general the dynamic AST matchers suggests that expression templates aren’t a very good solution because they will require reimplementing things from scratch for the “dynamic” case.

Regardless, I don’t think that the primary bottleneck is not doing enough at compile time (of the matchers). Rather we aren’t effectively indexing the underlying data set we are trying to query (right now, we’re more like grep and less like google). Indexing starts paying off once you have to do multiple queries* and can amortize the indexing cost across all those queries. I think most uses for clang-tidy have enough ast matching in them for indexing to pay off. Note that indexing doesn’t have to return results immediately; it just needs to prune the search space enough to pay off (a good example is using a trigram index to prune the search space for regexp matching http://swtch.com/~rsc/regexp/regexp4.html).

  • “multiple” here also scales numerically with the number of TU’s. An index can avoid visiting certain TU’s at all in many cases.

– Sean Silva

We actually went from a more expression templatey to a less expression templatey design. Sam has worked on making clang tidy use cases faster. I expect in the end the right thing is to do less duplicate work which I think we will be able to do with modules.

In article <CAOsfVvmev4Y7Q=Wr=8RJQFs49S94J0OE97-iJyaWRZVvCD60Jg@mail.gmail.com>,
    Manuel Klimek <klimek@google.com> writes:

We actually went from a more expression templatey to a less expression
templatey design.

Interesting! So far, I feel like what I've been trying to match is
completely expressable at compile-time and deferring things to runtime
is making things slow. This seems to be consistent with the findings
of the imlpementors of OCLint and why they make the statement I quoted
earlier.

> Although I'm mostly an armchair spectator right now w.r.t. clang tooling
> stuff, I think that use cases like clang-query and in general the dynamic
> AST matchers suggests that expression templates aren't a very good solution
> because they will require reimplementing things from scratch for the
> "dynamic" case.

When I look at a library like Boost.Spirit that heavily uses
expression templates I don't see how I'm losing the dynamic case at
all. Maybe I'm misunderstanding what you're saying.

Sam has worked on making clang tidy use cases faster. I
expect in the end the right thing is to do less duplicate work which I
think we will be able to do with modules.

While modules are nice, I don't think it's reasonable to expect that
modules will be deployed over a significant chunk of the world's C++
code anytime soon. So it doesn't seem practical to say "modules will
fix this".

> Regardless, I don't think that the primary bottleneck is not doing enough
> at compile time (of the matchers). Rather we aren't effectively indexing
> the underlying data set we are trying to query (right now, we're more like
> grep and less like google). [...] (a
> good example is using a trigram index to prune the search space for regexp
> matching Regular Expression Matching with a Trigram Index).
>
> * "multiple" here also scales numerically with the number of TU's. An
> index can avoid visiting certain TU's at all in many cases.

Thanks for that link Sean, that was an interesting read. I couldn't
find it "live" on the web, but was able to get it via the wayback
machine on archive.org.

In article <CAOsfVvmev4Y7Q=Wr=8RJQFs49S94J0OE97-iJyaWRZVvCD60Jg@mail.gmail.com>,
Manuel Klimek <klimek@google.com> writes:

We actually went from a more expression templatey to a less expression
templatey design.

Interesting! So far, I feel like what I’ve been trying to match is
completely expressable at compile-time and deferring things to runtime
is making things slow. This seems to be consistent with the findings
of the imlpementors of OCLint and why they make the statement I quoted
earlier.

Although I’m mostly an armchair spectator right now w.r.t. clang tooling
stuff, I think that use cases like clang-query and in general the dynamic
AST matchers suggests that expression templates aren’t a very good solution
because they will require reimplementing things from scratch for the
“dynamic” case.

When I look at a library like Boost.Spirit that heavily uses
expression templates I don’t see how I’m losing the dynamic case at
all. Maybe I’m misunderstanding what you’re saying.

Sam has worked on making clang tidy use cases faster. I
expect in the end the right thing is to do less duplicate work which I
think we will be able to do with modules.

While modules are nice, I don’t think it’s reasonable to expect that
modules will be deployed over a significant chunk of the world’s C++
code anytime soon. So it doesn’t seem practical to say “modules will
fix this”.

Well, it’s a matter of prioritization.

  1. expression templates will make the code significantly more complex and harder to maintain
  2. we need benchmark results to show that they’ll actually get something
  3. we need to have enough engineering resources for the future to maintain the library

If you’re willing to spend the time to prove that expression templates actually help real world benchmarks here without too much complexity, and if you can convince the current maintainers of the library that there will be enough hands on deck to support the implementation going forward, I’m happy to review patches :slight_smile: (I’d say you mainly have to convince Sam, who has done most of the core changes lately)

Other than that it’s a priorities game.

In article <CAOsfVvmev4Y7Q=Wr=
8RJQFs49S94J0OE97-iJyaWRZVvCD60Jg@mail.gmail.com>,
    Manuel Klimek <klimek@google.com> writes:

> We actually went from a more expression templatey to a less expression
> templatey design.

Interesting! So far, I feel like what I've been trying to match is
completely expressable at compile-time and deferring things to runtime
is making things slow.

What I found is that using type erasure actually sped up clang-tidy
considerably.
The template approach we used at the beginning was causing a huge amount of
code to be generated. More code == slower code.
Type erasure allowed to collapse common things into a single implementation.
In any case, most of the matching happens dynamically anyway because the
type of the children of nodes is not statically known. They are Expr or
Decl and we have to dyn_cast<> them anyway.

I optimized the runtime cost of the matchers until they stopped being the
bottle neck.
Right now, most of the runtime comes from the parsing phase, memory
allocation due to caching and hits found in uninteresting code.
It doesn't seem that running the matchers themselves is the slow section
anymore.
For most compilation units, parsing is still more than half the walltime.

  This seems to be consistent with the findings
of the imlpementors of OCLint and why they make the statement I quoted
earlier.

>
> > Although I'm mostly an armchair spectator right now w.r.t. clang
tooling
> > stuff, I think that use cases like clang-query and in general the
dynamic
> > AST matchers suggests that expression templates aren't a very good
solution
> > because they will require reimplementing things from scratch for the
> > "dynamic" case.

When I look at a library like Boost.Spirit that heavily uses
expression templates I don't see how I'm losing the dynamic case at
all. Maybe I'm misunderstanding what you're saying.

> Sam has worked on making clang tidy use cases faster. I
> expect in the end the right thing is to do less duplicate work which I
> think we will be able to do with modules.

I have done work on many fronts, like:
- Skip redundant work. Eg: filter the matchers by type and run only the
ones that are compatible.
- Reduce overhead of the system. Eg: collapse virtual functions to avoid
multiple consecutive vtable jumps.
- Extract common ops: Eg: move the type checking outside of the virtual
dispatch.

While modules are nice, I don't think it's reasonable to expect that
modules will be deployed over a significant chunk of the world's C++
code anytime soon. So it doesn't seem practical to say "modules will
fix this".

> > Regardless, I don't think that the primary bottleneck is not doing
enough
> > at compile time (of the matchers). Rather we aren't effectively
indexing
> > the underlying data set we are trying to query (right now, we're more
like
> > grep and less like google).

Right now we do a single pass on the AST to run all checks.
Adding an indexing step would not help much, unless we can do it directly
in the parsing.
Maybe with modules we could add an index into the module and avoid the full
AST traversal.

In article <CAOsfVvmS8vwUp6b1gc60X3rLN8qOjSvDwPACUedWLYsF9qGwVw@mail.gmail.com>,
    Manuel Klimek <klimek@google.com> writes:

Well, it's a matter of prioritization. [...]

That's completely understandable and reasonable. I'm not enough of an
expression template guru to volunteer for this.

At this time, I'm concentrating on adding refactorings to cland-tidy
and the like as appropriate. My first real feature was recently
accepted and I'm excited about that :-). If clang-tidy starts getting
more widely adopted, that might give us the motivated contributors we
need to address improving the performance of the tool's underlying
infrastructure.