Dumping AST information to other formats

Clang currently supports various -cc1 options that display AST
information (-ast-dump, -ast-print, -ast-list, etc.), but their output
is not convenient for third-party tools to consume. GrammaTech has
ongoing research efforts where we would like to output some
information from the AST in a more open, machine-consumable format
(such as JSON or s-expressions). We propose adding an optional output
format to the -ast-dump option, allowing the user to select either the
default or the JSON format, e.g., clang -cc1 -ast-dump=json foo.c. If
the output format is not explicitly specified, it will continue to
default to the same textual representation it uses today. This feature
is intended to output a safe subset of AST information that is
considered crucial rather than an implementation detail (like the name
of a NamedDecl object and the SourceRange for the name), so the output
is expected to be mostly stable between releases.
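
To make the intended use concrete, here is a minimal sketch of how a
third-party tool might consume such a dump. The schema is purely an
assumption for illustration: the field names "kind", "name", and
"inner" (for child nodes) are hypothetical, as is the embedded sample
document; nothing in this proposal fixes the output layout yet.

```python
import json

# Hypothetical sample of what a JSON AST dump might contain. The field
# names ("kind", "name", "inner") are assumptions, not a defined schema.
SAMPLE = """
{
  "kind": "TranslationUnitDecl",
  "inner": [
    {"kind": "FunctionDecl", "name": "main",
     "inner": [{"kind": "ParmVarDecl", "name": "argc"}]},
    {"kind": "VarDecl", "name": "counter"}
  ]
}
"""


def named_decls(node):
    """Yield (kind, name) for every node in the tree that carries a name."""
    if "name" in node:
        yield node["kind"], node["name"]
    # Recurse into child nodes, if any, in document order.
    for child in node.get("inner", []):
        yield from named_decls(child)


decls = list(named_decls(json.loads(SAMPLE)))
for kind, name in decls:
    print(f"{kind}: {name}")
```

A consumer written this way depends only on the small set of fields
it reads, which is the kind of usage a stable subset of AST
information is meant to support.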

Once upon a time, there was -ast-print-xml. This -cc1 option was
dropped because it was frequently out of sync with the AST data. It is
right to ask: why would JSON, etc. be any different? This is still an
open question, but a goal of this implementation will be to ensure
it's easier to maintain as the AST evolves. Because this feature is
intended to output only a safe subset of AST information, I don't
think it will require any more support burden than -ast-dump already
requires (a burden that is extremely limited). If AST information is
found to be missing from the output, it seems reasonable to have a
discussion as to whether it is stable information or an implementation
detail, so missing information is to be expected rather than a cause
for concern. That said, GrammaTech is able to commit to maintaining
this code for at least the next 1-2 years, and possibly beyond, as it
is useful functionality for our research efforts.

I wanted to see if there were concerns or implementation ideas the
community wanted to share before beginning the implementation phase of
this feature.

~Aaron

Thank you for passing this along -- it's actually somewhat aligned
with what I was envisioning. I very much like splitting out the
traversal and the printing mechanisms.

Would you like to be included on the review thread when I submit a patch?

~Aaron

Hi Aaron,

You might find the recent work we have done on stable identifiers for the AST useful:
the Stmt and Decl classes now have a “getID” method,
which returns an identifier that is stable across different runs (at least on the same architecture; probably not across different ones).

George