Using clang-query on the whole project

Heya,

I've found clang-query to be useful for querying single files. Matching the Clang
AST is easy enough and quick to learn. I don't see how to use clang-query on a
whole project, though, which would be super useful.

A simple example where it's effectively useless: Assume a header "global.h"
containing "struct Global {};" that every file in the project includes.

Now the idiomatic way to run clang-query over all the files in the project:
  clang-query -p $BUILD $(find . -name "*.cpp")

Let's try to find record decls named "Global":
  match recordDecl(hasName("Global"), isDefinition())

=> You get one match per file / translation unit. But in fact, this is not
really what you want. You want one match here.

So of course this is difficult: clang-query is TU-centric, while for a whole
project you'd "somehow" want a global view over the source code and not have
duplicate results for a query like above.

Questions:
- Is it possible to get clang-query to behave like that?
  - Any research/pointers in that regard?
- Is it possible to filter duplicates in the results without external tools?

Thanks a lot for a great tool so far!
Greets

Well, usually at that point you’ll want to start writing a go/clangmr.

It might also be cool to make clang-query work as clangmr; sounds like that would be a nice 20% project ;)

In theory it should be easy to modify clang-query to check the location of a declaration and only return declarations for unique locations.

Well, the problem is that a clang-query process is started for each TU, so you’d need to somehow tell one clang-query process which results were already found by a completely different clang-query process. I don’t think this is going to be easy or worth it…

This isn't entirely accurate. Each TU is stored in a separate AST, but all
of the ASTs live in a single process. When a match query is run, we simply
iterate over a vector of TU ASTs (see MatchQuery::run in Query.cpp). If you
wanted to deduplicate results, you could probably do it in MatchQuery::run
by pretty printing each result's SourceLocation and using that as a string key.
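
As a rough illustration of the string-key idea above, here is a minimal, self-contained sketch. It does not use the Clang API: `Match`, `locationKey` and `dedupByLocation` are hypothetical names invented for this example, and a real change inside MatchQuery::run would build the key from each result's actual SourceLocation instead of from plain fields.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for one match result. clang-query's real results
// carry AST nodes and a SourceLocation, not these plain fields.
struct Match {
  std::string file;  // path of the file containing the matched node
  unsigned line;     // 1-based line of the match
  unsigned column;   // 1-based column of the match
};

// Build a "file:line:column" key, the same shape as a printed source location.
std::string locationKey(const Match &m) {
  return m.file + ":" + std::to_string(m.line) + ":" + std::to_string(m.column);
}

// Walk the per-TU result lists in order and keep only the first match
// seen for each distinct location key.
std::vector<Match> dedupByLocation(
    const std::vector<std::vector<Match>> &perTU) {
  std::unordered_set<std::string> seen;
  std::vector<Match> unique;
  for (const auto &tuMatches : perTU)
    for (const auto &m : tuMatches)
      if (seen.insert(locationKey(m)).second)  // true only for new keys
        unique.push_back(m);
  return unique;
}
```

With two TUs that both report `global.h:1:8`, only the first occurrence would be kept. The sketch assumes the same header is always spelled with the same path; if it can be reached through different relative paths, the paths would have to be normalized before being used as keys.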

Thanks,

> Well, usually at that point you'll want to start writing a go/clangmr.

What is "go/clangmr"? Some sort of golang library? Can't seem to find a
reference other than http://research.google.com/pubs/pub41342.html but no
mention of "go".

-- Sean Silva

What about a unity build? You’d have one .cpp file, and include guards would take care of the rest.
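
A minimal sketch of what generating such a unity file could look like. The helper name `makeUnitySource` is made up for this example; it only produces the text of a single translation unit that includes every listed source file, which could then be written to a file and passed to clang-query on its own.

```cpp
#include <string>
#include <vector>

// Produce the text of a single "unity" translation unit that #includes
// every given source file, in the order given.
std::string makeUnitySource(const std::vector<std::string> &sources) {
  std::string out;
  for (const auto &path : sources)
    out += "#include \"" + path + "\"\n";  // one include line per source file
  return out;
}
```

This assumes all sources can be compiled together with the same flags. Files that define clashing file-local names (static functions, anonymous-namespace entities, macros) may not compile as one unit without changes.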

> This isn’t entirely accurate. Each TU is stored in a separate AST, but all
> of the ASTs live in a single process. When a match query is run, we simply
> iterate over a vector of TU ASTs (see MatchQuery::run in Query.cpp). If you
> wanted to deduplicate results, you could probably do it in MatchQuery::run
> by pretty printing each result’s SourceLocation and using that as a string key.

That doesn’t really scale, though. But yeah, for small projects it might be enough…

Thanks for the pointers! This looks useful indeed.