I recently started looking into the Clang project and have to admit that I would never have expected it to be that mature. And I really appreciate the design decisions that led to Clang being that re-usable and flexible. I think you’re doing each and every C++ programmer a great favor, and Clang will spawn an immense number of awesome tools in the future that we previously might not have imagined being possible to implement.
Enough praise, here is my actual question:
Is there any tool that reads an AST, saves it to some file or database, and later is able to restore that AST completely (e.g. to be able to create LLVM IR from it)? I’m trying to do something similar, so it’d be nice to know whether someone did that before.
The precompiled header implementations should do something like this, shouldn’t they? Is their AST representation complete, or are they missing things like control-flow?
Yes to everything. The precompiled header implementation serializes the entire AST. You can use clang -cc1’s ‘-emit-ast’ option to emit the serialized AST into a “.ast” file, which can then be used to generate LLVM IR. It hasn’t been extensively tested, and to my knowledge nobody is doing this in a production environment, but the test file test/Frontend/ast-codegen.c illustrates how to do it and that it isn’t completely broken.
Thank you for your reply, Doug.
The reason I am interested in restoring and/or building an AST is that I am thinking about using Clang as the backend for a programming language. It will obviously be a subset of C++. It would be awesome to use libraries written in C++ in it. This means, at a minimum, instantiating “simple” classes and calling functions. Think of “C++ light” with less baggage from C and a more straightforward syntax. I am going to evaluate whether building my own, simpler AST and translating it into the Clang AST might work.
Do you think that this is feasible, or is it doomed to fail for some obvious reasons?
This seems far more complex to me than just generating C++11 directly
from your own parser. Instead of spending A LOT OF TIME porting
your tool to the moving target that is the Clang AST, just generate
something quite standard. With more or less macro processing and
metaprogramming, you can even tune the complexity of your generator
(see for example https://github.com/MetaScale/nt2 ). And the generated
C++11 programs can pretty well "use libraries written in C++ in it" as
you expect! After all,
C++11 can be seen as a serialized version of the Clang AST with some
As an example, I was thinking about implementing a Fortran 2008
front-end for Clang this way, by reusing the existing run-time and
relying on the fact that the Fortran standard now describes an API for
interfacing with C and exposes its array triplet notation through
another API (so for example you could use boost::multi_array to
implement it at the C++ level).
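To make the array triplet idea a bit more concrete, here is a minimal C++ sketch of what a section a(lower:upper:stride) selects from a rank-1 array. It uses std::vector rather than boost::multi_array to stay self-contained, and the names Triplet and section are hypothetical, not part of any Fortran or Boost API.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a Fortran section triplet lower:upper:stride.
struct Triplet {
    long lower;
    long upper;
    long stride;
};

// Copies the elements a(lower:upper:stride) of a rank-1 array, using
// Fortran's default 1-based indexing. Out-of-range indices make
// std::vector::at throw std::out_of_range.
template <typename T>
std::vector<T> section(const std::vector<T>& a, Triplet t) {
    if (t.stride == 0) {
        throw std::invalid_argument("stride must be nonzero");
    }
    std::vector<T> out;
    if (t.stride > 0) {
        for (long i = t.lower; i <= t.upper; i += t.stride) {
            out.push_back(a.at(static_cast<std::size_t>(i - 1)));
        }
    } else {
        for (long i = t.lower; i >= t.upper; i += t.stride) {
            out.push_back(a.at(static_cast<std::size_t>(i - 1)));
        }
    }
    return out;
}
```

For a six-element array, section(a, Triplet{1, 6, 2}) picks elements 1, 3 and 5, and a negative stride walks backwards. A real front-end would presumably hand out a strided view instead of copying, which is the kind of thing boost::multi_array could provide.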
Generating correct Clang ASTs from anything but parsed C++ code is going to be extremely complicated. I think you’re better off generating C++ directly, or simply keeping your language front end separate from Clang.
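As a rough illustration of the "generate C++ directly" route, the following sketch walks a tiny front-end AST and prints C++ source text that Clang (or any other compiler) could then compile. Every type here (Expr, IntLiteral, VarRef, Call, Function) is hypothetical and unrelated to Clang's own AST classes. The toy language is assumed to have only 'long' values, and the generator itself needs C++14 for std::make_unique.

```cpp
#include <cstddef>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical AST of a toy front-end language (not Clang's AST).
struct Expr {
    virtual ~Expr() = default;
    virtual std::string toCpp() const = 0;  // emit this node as C++ source text
};

struct IntLiteral : Expr {
    long value;
    explicit IntLiteral(long v) : value(v) {}
    // The 'L' suffix keeps literals typed as long in the generated code.
    std::string toCpp() const override { return std::to_string(value) + "L"; }
};

struct VarRef : Expr {
    std::string name;
    explicit VarRef(std::string n) : name(std::move(n)) {}
    std::string toCpp() const override { return name; }
};

struct Call : Expr {
    std::string callee;  // may be a qualified C++ name such as "std::max"
    std::vector<std::unique_ptr<Expr>> args;
    explicit Call(std::string c) : callee(std::move(c)) {}
    std::string toCpp() const override {
        std::ostringstream out;
        out << callee << "(";
        for (std::size_t i = 0; i < args.size(); ++i) {
            if (i != 0) out << ", ";
            out << args[i]->toCpp();
        }
        out << ")";
        return out.str();
    }
};

struct Function {
    std::string name;
    std::vector<std::string> params;  // every parameter is a 'long' here
    std::unique_ptr<Expr> body;       // single expression that is returned
    std::string toCpp() const {
        std::ostringstream out;
        out << "long " << name << "(";
        for (std::size_t i = 0; i < params.size(); ++i) {
            if (i != 0) out << ", ";
            out << "long " << params[i];
        }
        out << ") { return " << body->toCpp() << "; }\n";
        return out.str();
    }
};

// Builds the toy function clamp_low(x) = max(x, 0) and returns the
// generated C++ translation unit as text.
std::string emitClampLowExample() {
    auto call = std::make_unique<Call>("std::max");
    call->args.push_back(std::make_unique<VarRef>("x"));
    call->args.push_back(std::make_unique<IntLiteral>(0));
    Function fn;
    fn.name = "clamp_low";
    fn.params = {"x"};
    fn.body = std::move(call);
    return "#include <algorithm>\n" + fn.toCpp();
}
```

Writing the result of emitClampLowExample() to a .cpp file gives an ordinary translation unit that calls straight into the C++ standard library. Calling into other C++ libraries would work the same way, by emitting the matching #include and qualified names, without touching Clang internals.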
I’ve done this in a C++/CLI Clang implementation I’ve been working on. It generates AST nodes directly from .NET types read from assemblies. I would not say it was extremely complicated, but definitely tricky. There are a lot of things that are not obvious and you will only find out about them when the code crashes inside Clang internals, but if you’re not afraid of debugging it’s pretty doable.
In case anyone is interested in looking at the code: https://github.com/tritao/clang/blob/master/lib/Sema/SemaCLI.cpp