Using the Clang AST to generate other abstract models

Hi There,

I’m new to this list and new to mucking around with compilers, so I crave your indulgence.

There is a suite of tools called Moose Tools, which is somewhat like a Business Intelligence platform for codebases. You convert a codebase to a model of that codebase, and then you can build queries on top of that model, and visualisations on top of those queries. The models are stored in an “MSE” file, and they seem similar to an AST.

So anyway, there are reasonably good tools to import codebases from Java, Smalltalk, etc., but nothing that I’ve tried for C/C++ seems to do the job. So I did a little bit of reading about Clang and LLVM, and I have been wanting to try using the Clang front-end, but instead of spitting out LLVM IR I want to generate an MSE file. I had a look at the output from the AST-dump mode of Clang and it looks like the data would be suitable. I could probably write a Python script to parse that output and convert it, but I would rather do it properly.
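For reference, the quick-and-dirty route mentioned above could look roughly like the sketch below. It assumes the indented tree layout printed by `clang -Xclang -ast-dump`, where each nesting level adds two prefix characters drawn from `|`, `` ` ``, `-` and space. The sample text is hand-written to imitate that layout (addresses and locations are made up), and `Node`, `parse_dump` and `walk` are hypothetical names. It only rebuilds a generic tree and does not attempt to emit MSE.

```python
import re
from dataclasses import dataclass, field

# Hand-written sample imitating the tree printed by `clang -Xclang -ast-dump`.
# Addresses and source locations are invented for illustration.
SAMPLE_DUMP = """\
TranslationUnitDecl 0x1 <<invalid sloc>> <invalid sloc>
|-FunctionDecl 0x2 <demo.c:1:1, line:3:1> line:1:5 add 'int (int, int)'
| |-ParmVarDecl 0x3 <col:9, col:13> col:13 a 'int'
| |-ParmVarDecl 0x4 <col:16, col:20> col:20 b 'int'
| `-CompoundStmt 0x5 <col:23, line:3:1>
|   `-ReturnStmt 0x6 <line:2:3, col:14>
`-VarDecl 0x7 <line:5:1, col:5> col:5 counter 'int'
"""


@dataclass
class Node:
    kind: str                      # e.g. "FunctionDecl"
    detail: str                    # rest of the line, left unparsed
    children: list = field(default_factory=list)


# Tree-drawing prefix, then the node kind, then everything else.
LINE_RE = re.compile(r"^([ |`\-]*)(\S+)\s*(.*)$")


def parse_dump(text):
    """Rebuild the node tree from the indentation of a dump (assumed layout)."""
    root = None
    stack = []  # stack[d] is the most recent node seen at depth d
    for line in text.splitlines():
        if not line.strip():
            continue
        prefix, kind, detail = LINE_RE.match(line).groups()
        depth = len(prefix) // 2   # two prefix characters per nesting level
        node = Node(kind, detail)
        if depth == 0:
            root = node
            stack = [node]
        else:
            if not stack:
                raise ValueError("child line seen before any root line")
            del stack[depth:]      # drop nodes deeper than the new parent
            stack[-1].children.append(node)
            stack.append(node)
    return root


def walk(node, depth=0):
    """Yield (depth, node) pairs in pre-order."""
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1)


if __name__ == "__main__":
    for depth, node in walk(parse_dump(SAMPLE_DUMP)):
        print("  " * depth + node.kind)
```

The obvious weakness is that this depends on a textual layout that is meant for humans and may change between Clang versions, which is part of why I would rather not go this way.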

I’m not sure of the right approach though… I could start hacking on the AST-dump mode, but I imagine that this is not something that you would ever want to keep in the CFE codebase. So, is there some guidance somewhere, on putting together the minimal subset of the CFE that I would need in order to be able to dump the AST? Or should I just fork CFE and start from there?

Many thanks,

Guy Sherman.

Hi Guy, Clang used to have an XML printer a long time ago, but it was removed as it was lacking in many ways. This has come up a few times and I think the community is not opposed to having this functionality, if done right. This is in case you’re interested in contributing your changes back to Clang. If not, you can just hack away on your working copy.

Others might have better ideas, but you can search the mailing list for XML schema related discussions.


Hi Nikola,

I would be more than happy to contribute my work back… MSE is not an XML-based format though, although I guess I could use XML as an intermediate format. Would the AST-dump code be the best starting point either way?



Guy, Nikola,

Wouldn't it be better to implement this as a separate
RecursiveASTVisitor-based tool? That way you don't have to interpret
intermediate formats, but rather stay close to the source.

Also, a RecursiveASTVisitor implementation can be made complete;
--ast-dump seems to be selective in what it emits.

- Kim

Hi Kim,

I’ve done a bit more reading, and I think that my approach will be to build a Clang plugin that contains
a RecursiveASTVisitor implementation. Does this sound like the right approach to you?



Hi Guy,

I can't make any solid recommendations, I live at the outskirts of
Clang development, and I'm often wrong :) But it seems like it would
give you the highest-fidelity model of the C++ AST.

If I understand things correctly, a plugin runs *with compilation*. If
you want to run your translation without actually compiling/generating
code, a tool might be a better fit.

Hope that helps,
- Kim

Hi Guy,

I don't know if this is going to be directly helpful for you, but you
might be interested in having a look at the Clang plugin we have been
coding to export the Clang AST in JSON and load it into OCaml (using a
generated parser based on the inlined schema).

Disclaimer: this is still a work in progress for the C++ part.

-- Mathieu