The status of XML Representation of ASTs in clang

Hi all,

Our team has just started researching and implementing software restructuring for C++ projects. We plan to use a format similar to XML for this purpose, but the core idea is the same: to generalize the representation of ASTs so that other tools can consume it. At present we are not sure what information should go into this format. Therefore, I’m wondering about the status of the open project “XML Representation of ASTs”. Is anyone working on it now?

Best regards,

layne

CastXML may be what you’re looking for.

Nobody is working on that project, and in fact, I think that project is still listed on the open projects page accidentally. Looking at the history for that file, it’s not been materially touched since 2013 and has quite a bit of outdated content (ouch!). I’m really sorry for the stale information, I’ll try to clean that up in the near future.

Once upon a time, we allowed dumping the AST to XML. However, this functionality was removed in 2013 (r192131, “Remove -ast-dump-xml.”) because it was out of date, produced incorrect output, and would crash. Later, we introduced the ability to dump the AST to JSON for a more structured AST dump (llvm/llvm-project@2ce598a, “Introduce the ability to dump the AST to JSON.”). However, this format has no schema and does not promise stability across versions of Clang. Like our textual AST dump, it’s best-effort functionality: a more structured way to obtain information about the AST, mostly intended for debugging purposes. I know some folks use this format for introspection by other tools, but for production-capable introspection of the AST, we recommend using libclang, AST matchers, transformers, etc., depending on the need.
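As a rough illustration of what consuming the JSON dump from another tool looks like, the sketch below walks a tree of the kind produced by `clang -Xclang -ast-dump=json -fsyntax-only file.cpp` and collects the names of nodes of a given kind. It assumes the node layout seen in current dumps (a `kind` string, an optional `name`, children under `inner`); because the format has no schema or stability guarantee, that layout is an assumption, and the embedded sample is a hand-trimmed, hypothetical excerpt rather than real Clang output.

```python
import json

# Hand-trimmed, hypothetical excerpt shaped like Clang's JSON AST dump.
# Real dumps carry many more fields (id, loc, range, type, isImplicit, ...)
# and the layout may change between Clang versions.
SAMPLE_DUMP = """
{
  "kind": "TranslationUnitDecl",
  "inner": [
    {"kind": "FunctionDecl", "name": "add",
     "inner": [
       {"kind": "ParmVarDecl", "name": "a"},
       {"kind": "ParmVarDecl", "name": "b"},
       {"kind": "CompoundStmt", "inner": []}
     ]},
    {"kind": "VarDecl", "name": "counter"},
    {"kind": "FunctionDecl", "name": "main"}
  ]
}
"""


def collect_names(node, kind):
    """Depth-first walk returning the 'name' of every node whose 'kind' matches."""
    found = []
    if node.get("kind") == kind and "name" in node:
        found.append(node["name"])
    for child in node.get("inner", []):
        found.extend(collect_names(child, kind))
    return found


if __name__ == "__main__":
    tree = json.loads(SAMPLE_DUMP)
    print(collect_names(tree, "FunctionDecl"))  # ['add', 'main']
```

A script like this is convenient for quick exploration, but it silently depends on field names that Clang is free to change, which is exactly why libclang or AST matchers are the better fit for production tooling.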

We would welcome changes to the JSON AST dumper to expose more information, but I think we’d be hesitant to make any guarantees about stability of the format. The AST dumps are intended to faithfully represent the details of our AST (for example, to help people write AST matchers) and stability guarantees may conflict with future refactoring of AST functionality. We also don’t wish to increase the maintenance burden on people modifying the AST to require them to also expose the changes via AST dumping.

If there are no strict requirements that the output follow XML standards, then perhaps you can use the output produced by dump(), which has a somewhat XML-like structure.

Dear Aaron,

Thanks for your explanation. I think we (TU Delft, TU Eindhoven, TNO and Philips) will, to some extent, work on this under project Mascot, though we are not using JSON or other such formats. But I really appreciate the requirements (General, Stable, etc.) listed in the open project.

Our idea is as follows: first we dump ATerms based on Clang (currently I have made an ATerm dumper based on the class clang::TextNodeDumper). The ATerms are then delivered to Spoofax, which is able to perform code transformations. Finally, we would somehow pretty-print the output of Spoofax back to C++ source code. Along the way we would like to make full use of Clang’s infrastructure, for example for code layout preservation. We hope to deliver a format that is stable and expressive enough for other tools.
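To make the first step of that pipeline concrete, here is a minimal sketch of serializing an AST node tree into ATerm text. It is not the Mascot dumper (which is built on clang::TextNodeDumper); the dict-based node shape (`kind`, optional `name`, `inner`) and the choice to emit each node as `Kind("name", [children])` are hypothetical, picked only to show the ATerm conventions of constructor application `f(t1, ..., tn)`, lists `[...]`, and quoted strings.

```python
def quote(s):
    """Return s as a quoted ATerm string, escaping backslashes and double quotes."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_aterm(node):
    """Serialize a node dict as Kind(<name?>, [children...]).

    Hypothetical mapping: the node's 'kind' becomes the constructor, an
    optional 'name' becomes a string argument, and 'inner' becomes a list.
    """
    args = []
    if "name" in node:
        args.append(quote(node["name"]))
    children = [to_aterm(child) for child in node.get("inner", [])]
    args.append("[" + ",".join(children) + "]")
    return node["kind"] + "(" + ",".join(args) + ")"


if __name__ == "__main__":
    tree = {
        "kind": "FunctionDecl",
        "name": "add",
        "inner": [
            {"kind": "ParmVarDecl", "name": "a"},
            {"kind": "ParmVarDecl", "name": "b"},
            {"kind": "CompoundStmt", "inner": []},
        ],
    }
    print(to_aterm(tree))
    # FunctionDecl("add",[ParmVarDecl("a",[]),ParmVarDecl("b",[]),CompoundStmt([])])
```

Always emitting the child list, even when empty, keeps every constructor at a fixed arity, which makes the terms easier to match against a signature on the Spoofax side.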

I know of one researcher from TNO who is working on this based on Eclipse CDT.

I will give an update once some progress has been made.

Best,

layne


There is also Extract-API, but it is more documentation-focused.