ast matchers, structDecl, unionDecl missing? recordDecl is only for CXXRecordDecl

Hi,

I came across some problems, described below, while designing and implementing some new libclang functions. These functions are a C interface (and an interface for scripting languages) for querying the AST with text-based matchers. On top of these functions I also wrote a tool that can query the AST with matchers and call a chain of methods on the retrieved AST nodes, like:

cmatch -a SemaDecl.ast \
       -m "methodDecl(returns(pointsTo(hasDeclaration(recordDecl(isDerivedFrom(recordDecl(hasName(\"Decl\"))))))))" \
       -r getParent.getNameAsString.dump

I will make the source code public as soon as possible.
In the source code, the methods are defined in special header files with macros of the form:

METHOD(Decl, getDeclKindName, char)
METHOD(Decl, dump, void)
METHOD(Decl, dumpColor, void)
METHOD(NamedDecl, getNameAsString, string)
METHOD(RecordDecl, getPreviousDecl, Decl)
METHOD(RecordDecl, getMostRecentDecl, Decl)
METHOD(RecordDecl, hasVolatileMember, bool)
METHOD(FunctionDecl, getBody, Stmt)
METHOD(FunctionDecl, getDeclName, DeclarationName)
METHOD(FunctionDecl, getReturnType, QualType)
METHOD(CXXMethodDecl, getParent, Decl)

I will need to generate (also with matchers) the full list of "usable" methods in the form of such declarations, so that they can be compiled into my library.
The tool can also be called by the zsh completion mechanism to complete the AST expression.

The above functionality is implemented. I can add almost all methods that take no parameters, for all subclasses of Decl, Stmt and Type and for some POD-like structures (SourceLocation, QualType, etc.).
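As an illustration of the METHOD(...) declarations described above, here is a minimal C++ sketch of how such a list could be turned into a lookup table with the X-macro technique. This is only a guess at the shape, not cmatch's actual source: CMATCH_METHODS, MethodInfo, methodTable and methodsOf are hypothetical names, and only a few of the quoted entries are repeated.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical list macro: each entry is METHOD(Class, method, ReturnType),
// mirroring the declarations quoted in the post.
#define CMATCH_METHODS(METHOD)                  \
  METHOD(Decl, getDeclKindName, char)           \
  METHOD(Decl, dump, void)                      \
  METHOD(NamedDecl, getNameAsString, string)    \
  METHOD(FunctionDecl, getReturnType, QualType) \
  METHOD(CXXMethodDecl, getParent, Decl)

// One row of the generated table; all three fields are plain names.
struct MethodInfo {
  std::string owner;   // class the method is declared on
  std::string name;    // method name
  std::string result;  // name of the returned type
};

// Expand every METHOD entry into one table row by stringizing its arguments.
inline const std::vector<MethodInfo> &methodTable() {
#define AS_ROW(Class, Method, Ret) {#Class, #Method, #Ret},
  static const std::vector<MethodInfo> table = {CMATCH_METHODS(AS_ROW)};
#undef AS_ROW
  assert(!table.empty() && "the method list should not be empty");
  return table;
}

// Names of the methods declared directly on `owner` (no inheritance lookup).
inline std::vector<std::string> methodsOf(const std::string &owner) {
  std::vector<std::string> names;
  for (const MethodInfo &m : methodTable())
    if (m.owner == owner)
      names.push_back(m.name);
  return names;
}
```

With this table, methodsOf("Decl") yields getDeclKindName and dump, which a completion script could print as candidates.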

The next feature I wanted to add is a shell completion mechanism for the chain of methods that can be called on the matched AST nodes. Unfortunately I bumped into the following little annoyances:

1.a It is impossible to deduce the "inner" type that will be returned by a matcher, so I cannot deduce the list of methods.

Optional<DynTypedMatcher> matcher = Parser::parseMatcherExpression(
    StringRef(expr), nullptr, namedValues, &diag);

It would be nice to have something like matcher->getReturnKind(), similar to matcher->getSupportedKind().

Question: is there a way to recover this information?

1.b There is some information about the return type, but in the case of recordDecl, for instance, it is Matcher<Decl> and not Matcher<CXXRecordDecl>.

I understand that "narrowing matchers" like isDerivedFrom would have return type Matcher<CXXRecordDecl> and not Matcher<Decl>.
Question: is there any reason for this inconsistency?

2. Delving further into the source code to investigate 1.b (and testing), I found that recordDecl matches only CXXRecordDecl, so for C programs one cannot match structures/unions.

Question: was this implemented intentionally?

regards
mph

+Sam, who wrote the dynamic AST matchers, and Peter, who wrote the clang-query tool which is already in clang-tools-extra.

1.b There is some information about the return type, but in the case of recordDecl, for instance, it is Matcher<Decl> and not Matcher<CXXRecordDecl>.

I understand that "narrowing matchers" like isDerivedFrom would have return type Matcher<CXXRecordDecl> and not Matcher<Decl>.
Question: is there any reason for this inconsistency?

This is not an inconsistency - the return type is always the type on which you want to be able to run the matcher. And a type matcher like recordDecl is one you want to run on all Decls to see whether they are actually record decls.

2. Delving further into the source code to investigate 1.b (and testing), I found that recordDecl matches only CXXRecordDecl, so for C programs one cannot match structures/unions.

Question: was this implemented intentionally?

Yes, but that was a stupid idea back at the time (it was mine, so I can call myself stupid without being too offensive) :-)
We should rename it to cxxRecordDecl.

1.a it is impossible to deduce the "inner" type that will be returned by
a matcher, so that I can deduce the list of methods.

Optional<DynTypedMatcher> matcher = Parser::parseMatcherExpression(
         StringRef(expr), nullptr, namedValues, &diag);

would be nice to have something like matcher->getReturnKind() similar to
matcher->getSupportedKind()

There is no "return" kind on a matcher because the matcher always returns
'bool'.

Question:
is there a way to find back this information?

It is impossible to know the real type of the node before matching, as it
could be a subclass of what you are matching.
A DynTypedMatcher has 2 "Kinds": SupportedKind and RestrictKind.

SupportedKind is the minimum static type that the matcher accepts. This is
the argument type used in the static matcher declaration.
It is used to simulate the compile time errors you would get on the static
matchers during parsing of the dynamic matchers.
For example, recordDecl() has a SupportedKind of Decl because it takes any
Decl node to check if it is a CXXRecordDecl, hasName() has a SupportedKind
of NamedDecl.

RestrictKind is the minimum dynamic type that the matcher accepts. This is
used to do the type matching.
For example, recordDecl() has a RestrictKind of CXXRecordDecl because it
needs a node that is a CXXRecordDecl, hasName() has a RestrictKind of
NamedDecl.
There are some optimizations in the dynamic matcher creation that
propagate and combine the RestrictKind to reject values earlier.
RestrictKind has to be the same or a subclass of SupportedKind.

You should be able to use RestrictKind to determine the API supported by
the matched node.
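The distinction between the two kinds can be sketched with a toy model. The code below does not use Clang at all: the hierarchy is simplified (the real tree has more levels between NamedDecl and RecordDecl), and ToyMatcher, makeMatcher, acceptsStatically and the free function canMatchNodesOfKind are hypothetical stand-ins that only mimic the behaviour described above.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy class hierarchy: child kind -> parent kind ("" marks the root).
inline const std::map<std::string, std::string> &parents() {
  static const std::map<std::string, std::string> p = {
      {"Decl", ""},
      {"NamedDecl", "Decl"},
      {"RecordDecl", "NamedDecl"},  // simplified: Clang has more levels here
      {"CXXRecordDecl", "RecordDecl"},
  };
  return p;
}

// True if `base` equals `kind` or is one of its ancestors.
inline bool isBaseOf(const std::string &base, std::string kind) {
  while (!kind.empty()) {
    if (kind == base) return true;
    auto it = parents().find(kind);
    if (it == parents().end()) return false;  // unknown kind
    kind = it->second;
  }
  return false;
}

struct ToyMatcher {
  std::string supportedKind;  // static type accepted when composing matchers
  std::string restrictKind;   // dynamic type a node must have to match
};

// Enforces the rule from the thread: RestrictKind is the same as, or a
// subclass of, SupportedKind.
inline ToyMatcher makeMatcher(const std::string &supported,
                              const std::string &restricted) {
  assert(isBaseOf(supported, restricted) &&
         "RestrictKind must be SupportedKind or a subclass of it");
  return ToyMatcher{supported, restricted};
}

// recordDecl(): takes any Decl, matches only CXXRecordDecl nodes.
inline ToyMatcher toyRecordDecl() { return makeMatcher("Decl", "CXXRecordDecl"); }
// hasName(): both kinds are NamedDecl.
inline ToyMatcher toyHasName() { return makeMatcher("NamedDecl", "NamedDecl"); }

// Parse-time check: may a node of static type `kind` be handed to `m`?
inline bool acceptsStatically(const ToyMatcher &m, const std::string &kind) {
  return isBaseOf(m.supportedKind, kind);
}

// Match-time check: can a node whose dynamic type is `kind` ever match `m`?
inline bool canMatchNodesOfKind(const ToyMatcher &m, const std::string &kind) {
  return isBaseOf(m.restrictKind, kind);
}
```

In this model toyRecordDecl() can be applied to a NamedDecl at parse time, yet it can never match a plain RecordDecl, which is the C struct/union case raised in question 2.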

Thanks for the answer and for the promised fix. Please also consider
renaming the recordDecl matcher to cxxRecordDecl to reflect the correct type.

There is no "return" kind on a matcher because the matcher always returns
'bool'.

I was a bit dizzy from the matchers, so maybe I did not express myself correctly.
I was using the wording from
http://clang.llvm.org/docs/LibASTMatchersReference.html
I was referring to the objects "returned" by the matchers, not the return
type of a matcher method on a node. Going back to this page now, I think
there is a mistake in the header of the matchers table: I think
"Parameters" should be swapped with "Return types". Or I am still
confused by the wording. Does "parameter" here mean the AST node argument
passed to the matcher method? That is, recordDecl is applied against Decl
and returns CXXRecordDecl?
Also, after reading through the table a few times I discovered that the
matcher names are "clickable" and a little doc expands. That is probably
not obvious to most people.

It is impossible to know the real type of the node before matching, as it
could be a subclass of what you are matching.

Well, I was not trying to find out the "real" type, as obviously the real
type is hidden in the AST itself and is not known when the matcher is
being constructed. I am indeed interested in the RestrictKind. I will give
it a try in a few days and report back.

rgrds,
mobi phil

being mobile, but including technology

I do not remember exactly, but maybe the reason I did not dig into
RestrictKind is that there is no getter (getRestrictKind()). Can we trade
the bug I found for a request to implement it? ;-)

regards

There is no getRestrictKind() because this is a new implementation detail
and I've tried to keep it as such.
For example, recently I needed this information for an optimization and
added DynTypedMatcher::canMatchNodesOfKind() instead of exposing
RestrictKind directly.
If there is a good reason to make it part of the API, we can do that.
_Sam
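
Even with only a yes/no predicate in the style of canMatchNodesOfKind(), a client could recover the most general matching kind by probing every node kind it knows about. The following is a toy sketch of that idea, assuming the predicate answers "is the restrict kind a base of this kind?"; Parents, isBaseOf and mostGeneralAccepted are hypothetical helpers, and the hierarchy is supplied by the caller rather than taken from Clang.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Toy hierarchy (child kind -> parent kind, "" for the root); not Clang's tree.
using Parents = std::map<std::string, std::string>;

// True if `base` equals `kind` or is one of its ancestors in `p`.
inline bool isBaseOf(const Parents &p, const std::string &base,
                     std::string kind) {
  while (!kind.empty()) {
    if (kind == base) return true;
    auto it = p.find(kind);
    if (it == p.end()) return false;  // unknown kind
    kind = it->second;
  }
  return false;
}

// Probe every known kind with the predicate and keep the most general one
// that is accepted. Returns "" if nothing is accepted. This assumes the
// accepted kinds form a single subtree of the hierarchy.
inline std::string mostGeneralAccepted(
    const Parents &p,
    const std::function<bool(const std::string &)> &canMatch) {
  std::string best;
  for (const auto &entry : p) {
    const std::string &kind = entry.first;
    if (!canMatch(kind)) continue;
    if (best.empty() || isBaseOf(p, kind, best)) best = kind;
  }
  assert(best.empty() || canMatch(best));
  return best;
}
```

The recovered kind could then be used to look up the methods offered for completion, at the cost of one predicate call per known node kind.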

Unintentionally replied only to Sam, put back cfe-dev in CC.

We should expose it, but with a different name. “Restrict” is still an implementation detail, imo.

For me the information itself is obviously more important. Indeed, for the outside world it should be something like MatchFindReturnKind, but it is difficult to find a reasonable and short name.

Any reason to not do this work directly on clang-query?
It would be awesome to have this support in an upstream tool.

This is meant mainly as an extension to the "C" libclang, or as an independent library, to be used mainly with scripting languages. The tool "cmatch" is just a prototype on top of the library, but I have started to use it to query ASTs, mainly of the clang and llvm codebases. Given the long compilation times of C++ code (and obviously other reasons), IMHO it is less time-consuming to prototype refactoring tools etc. with scripting languages. The stuff in MatchFinder.cpp could be exposed to the C++ world as well, but a lot of design decisions were taken with the C and scripting-language world in mind.

mobi phil