clang AST printer

Hi,

I've been writing a tool to print a clang AST. You can find it at
https://github.com/philipc/clang-ast. I've been mostly writing this as
a learning exercise, but I would like to see if there is any wider
interest in the tool.

I'm new to clang, and interested in working on tooling. For this task,
it is important to have an understanding of the AST. I've found the
documentation at http://clang.llvm.org/docs/InternalsManual.html and
http://clang.llvm.org/docs/IntroductionToTheClangAST.html (is there
any more?), but I think it would be helpful to be able to print the
AST for source code to see how it is used in practice. I've found two
ways to do this currently, which aren't quite what I wanted:

clang --ast-dump
- pretty prints some parts, has too much internal info, more suited
for debugging use by clang developers

clang --ast-dump-xml
- incomplete and XML is too verbose for this purpose

Since RecursiveASTVisitor seems to be the API that will be used by
tool developers, I've been using RAV to print the AST. I've found a
few minor bugs in RAV as a result of this, which I'm in the process of
submitting.

One limitation of RAV is that the Visit methods aren't given
information about their relationship with their parent, so the tool
can only list all the children of a node, without distinguishing
between the LHS and RHS of an operator, for example. Is there any
desire to extend RAV to be able to do this?
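
To make the limitation concrete, here is a minimal sketch on a toy tree rather than on clang's own classes. Node, add, dumpWithRoles and dumpAssignmentExample are hypothetical names invented for this illustration, and none of this is RAV's API. The only point is that once the callback also receives the role a child plays in its parent, the printer can label the LHS and RHS of an operator.

```cpp
#include <cassert>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Toy AST node (an assumption for illustration, NOT a clang class).
// Each child is stored together with the role it plays in its parent.
struct Node {
  std::string Kind;
  std::vector<std::pair<std::string, std::unique_ptr<Node>>> Children;

  explicit Node(std::string K) : Kind(std::move(K)) {}

  void add(std::string Role, std::unique_ptr<Node> Child) {
    assert(Child && "a child node must not be null");
    Children.emplace_back(std::move(Role), std::move(Child));
  }
};

// Hypothetical visit callback: besides the node itself it receives the
// role of that node within its parent ("" for the root).
void dumpWithRoles(const Node &N, const std::string &Role, unsigned Depth,
                   std::ostream &OS) {
  OS << std::string(Depth * 2, ' ');
  if (!Role.empty())
    OS << Role << ": ";
  OS << N.Kind << '\n';
  for (const auto &C : N.Children)
    dumpWithRoles(*C.second, C.first, Depth + 1, OS);
}

// Builds a toy tree for "a = b" and returns the role-annotated dump.
std::string dumpAssignmentExample() {
  Node Assign("BinaryOperator '='");
  Assign.add("LHS", std::unique_ptr<Node>(new Node("DeclRefExpr a")));
  Assign.add("RHS", std::unique_ptr<Node>(new Node("DeclRefExpr b")));
  std::ostringstream OS;
  dumpWithRoles(Assign, "", 0, OS);
  return OS.str();
}
```

Without the Role argument the two DeclRefExpr children would be printed identically, which is the situation described above.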

I've been writing small code snippets to test printing the various
parts of the AST. If you want to see examples of the output of the
tool, this is included in the test cases. As a quick example, "void
foo() {}" is printed as:

  FunctionDecl
    DeclarationName foo
    FunctionNoProtoType
      BuiltinType void
    CompoundStmt

Finally, I've been trying to give a textual description of the AST
grammar in https://github.com/philipc/clang-ast/blob/master/ast.txt.
This is an attempt to give something similar to the abstract grammar
in the Python ast module documentation. I'm not sure if it is turning
out to be that useful though.

Any comments welcome!

Thanks,
Philip

Hi Philip,

we had a discussion around the strategy for handling ast dumping
fairly recently. I think the common agreement in the end was that
instead of having an extra tool, we want to make clang's -ast-dump
awesome. I don't know what your ultimate goal is here, but from my
side any work that goes towards making clang -ast-dump better would be
highly appreciated :)

Cheers,
/Manuel

I think this can be useful, in particular to explore Clang and see what data it's possible to get out of the AST. BTW, does it handle Objective-C?

> we had a discussion around the strategy for handling ast dumping
> fairly recently. I think the common agreement in the end was that
> instead of having an extra tool, we want to make clang's -ast-dump
> awesome.

Was this discussion on the mailing list? Any chance you could point me to it?

> I don't know what your ultimate goal is here, but from my
> side any work that goes towards making clang -ast-dump better would be
> highly appreciated :)

My motivation was having some way of getting the structure of the AST
for a given piece of source code, so that it is easier to correctly
call the AST matchers to match that code. The issue with clang
-ast-dump is that it pretty prints decls and types. So would the goal
here be to add more command line options to clang to control how
-ast-dump prints the AST?

Are there any other areas where you think -ast-dump needs to be better?

I haven't tested it, but in theory it should. I haven't put in any
effort to display information needed for Objective-C specific nodes
either though. The tool is still very incomplete.

Ok, so it basically only displays the C parts of the Objective-C code.

> Was this discussion on the mailing list? Any chance you could point me to it?

http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20120716/060831.html

> The issue with clang -ast-dump is that it pretty prints decls and types.
> So would the goal here be to add more command line options to clang to
> control how -ast-dump prints the AST?

No, I think the goal is to dump decls and types in a sensible way :)

> Are there any other areas where you think -ast-dump needs to be better?

As you said, decls and types need to be structured nicely. Also, a
while ago Richard proposed a patch to add coloring, no idea how far he
got (cc'ing him)

Cheers,
/Manuel

> Was this discussion on the mailing list? Any chance you could point me to it?

http://www.mail-archive.com/cfe-commits@cs.uiuc.edu/msg53556.html
http://lists.cs.uiuc.edu/pipermail/cfe-dev/2012-July/023009.html

> No, I think the goal is to dump decls and types in a sensible way :)

More specifically, I think we agreed on changing -ast-dump from pretty-printing declarations to outputting them in the same LISP-style format that is used for statements. I was going to implement that, but never had the time. If you're going to continue working on your utility, it would be much more valuable if you instead improved clang's current -ast-dump option. If you decide to go this way, here's where to start:

clang/lib/Frontend/ASTConsumers.cpp:120: ASTConsumer *clang::CreateASTDumper(StringRef FilterString)

is a common entry point, used by "clang -cc1 -ast-dump" and by "clang-check -ast-dump". An additional benefit would be that once "-ast-dump" starts outputting the AST for declarations in a structured form, "-ast-dump-xml" will become useless and can be removed.
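
For illustration only, this is roughly what a LISP-style dump of a declaration could look like. It is a sketch on a toy node type: SNode, toSExpr, dumpFooExample and the exact layout are assumptions made here, not clang's dumper code or its real output.

```cpp
#include <string>
#include <vector>

// Toy node type (an assumption for illustration, NOT clang's Decl/Stmt
// hierarchy). A node is a label plus an ordered list of children.
struct SNode {
  std::string Label;
  std::vector<SNode> Children;
};

// Renders a node as "(Label child child ...)". Each child starts on its
// own line, indented two spaces deeper than its parent.
std::string toSExpr(const SNode &N, unsigned Depth = 0) {
  std::string Out(Depth * 2, ' ');
  Out += "(" + N.Label;
  for (const SNode &C : N.Children)
    Out += "\n" + toSExpr(C, Depth + 1);
  Out += ")";
  return Out;
}

// Toy tree for "void foo() {}": a function declaration whose only child
// is its (empty) body.
std::string dumpFooExample() {
  SNode Fn{"FunctionDecl foo 'void ()'", {SNode{"CompoundStmt", {}}}};
  return toSExpr(Fn);
}
```

The idea is that declarations and statements go through the same structural printer, so a declaration is no longer emitted as pretty-printed source text in the middle of the dump.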

Does this mean that the current declaration printer will be removed? I
see value in having a decl printer that prints declarations in
user-readable form, so I think we should keep it until clang-format
is here.

BTW, a while ago I added tests for current decl printer to
unittests/AST/DeclPrinterTest.cpp.

Dmitri

> Does this mean that current declaration printer will be removed? I
> see value in having a decl printer that prints declarations in
> user-readable form, so I think we should keep it until clang-format
> will be here.

I'd be curious what your use case is - after all, it'll only print
declarations in user-readable form, and all the other stuff will be
written in S-expression-like syntax.

If we want a pretty-printer, I'd expect -pretty-print and not
-ast-dump to trigger that?

Cheers,
/Manuel

> I'd be curious what your use case is - after all, it'll only print
> declarations in user-readable form, and all the other stuff will be
> written in S-expression-like syntax.

The decl printer could be used to pretty-print decls in some
clang-based documentation generation system.

> If we want a pretty-printer, I'd expect -pretty-print and not
> -ast-dump to trigger that?

I'd just leave it as an API.

Dmitri

>> If we want a pretty-printer, I'd expect -pretty-print and not
>> -ast-dump to trigger that?
>
> I'd just leave it as an API.

I'm not sure what you mean by that. I agree that there are use cases
for a pretty printer, but I think "half" of the ast-dump mechanism is
a bad place to have a pretty printer...

Cheers,
/Manuel

Imagine a Clang-based Doxygen-like tool. Not only do we need to extract
documentation comments and format them, we also need to pretty-print
declarations to display them along with the documentation. So for such
a tool only the decl printer is useful.

Dmitri

I think the point is not that pretty-printing should be removed (I agree with Dmitri that it shouldn't) but that -ast-dump should not pretty-print Decls (I agree with Manuel that it shouldn't). These do not seem like incompatible goals.

I completely agree. Thanks for clarifying!

Dmitri

Yep, exactly. I think we're all in happy and violent agreement. Thanks
Jordan for helping to clear that up :)

Cheers,
/Manuel

> we had a discussion around the strategy for handling ast dumping
> fairly recently. I think the common agreement in the end was that
> instead of having an extra tool, we want to make clang's -ast-dump
> awesome. I don't know what your ultimate goal is here, but from my
> side any work that goes towards making clang -ast-dump better would be
> highly appreciated :)

Here's an idea:

$ clang -ast-dump="<arbitrary ASTMatcher expression>"

--Sean Silva

I can't tell you how awesome this is. This will have a big positive
effect on the ease of learning the AST. It's really an "I wish I had that
when I was learning" sort of thing.

> Finally, I've been trying to give a textual description of the AST
> grammar in https://github.com/philipc/clang-ast/blob/master/ast.txt.
> This is an attempt to give something similar to the abstract grammar
> in the Python ast module documentation. I'm not sure if it is turning
> out to be that useful though.

I've been whining about getting something like this for a long time :)
See this post from a while back
<http://permalink.gmane.org/gmane.comp.compilers.clang.scm/54316>.

--Sean Silva

> Here's an idea:
>
> $ clang -ast-dump="<arbitrary ASTMatcher expression>"

Yep, once the dynamic matcher stuff is in mainline that'll be possible ;)