I am trying to export clang AST to a file, recreating the tree structure
in a suitable format e.g., xml/json. I am using the RecursiveASTVisitor
for this.
Question: What is the invariant for establishing parent-child
relationship between AST nodes? I see that one potential invariant is
``If two AST nodes have the same DeclContext pointer, then they are in
the same scope i.e., siblings or parent-children.`` Is there a stronger
invariant that establishes parent-child relationship? Sadly, the Decl
class doesn't have a getParent() that returns its parent AST node.
Also, is there a generic way I could ask the same question for nodes of
Stmt and Type classes?
P.S. I tried to infer the invariant by skimming through ASTDumper code.
But, ASTDumper invents custom traversal different from what RAV
provides, so it's non-trivial to map logic from/to two different models.
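For illustration only, here is a minimal sketch of the export idea: if the traversal is recursive, the nesting of the output can mirror the nesting of the traversal, so parent-child structure falls out without a parent pointer. ToyNode, exportJson and exportToString are hypothetical names for a toy tree; they are not Clang classes or APIs, and the sketch assumes node kinds need no JSON escaping.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for an AST node; this is not a Clang class.
struct ToyNode {
  std::string kind;               // e.g. "FunctionDecl"
  std::vector<ToyNode> children;  // owned child nodes, in source order
};

// Write `node` and, recursively, its children as nested JSON.
// The nesting of the output mirrors the nesting of the traversal,
// so no explicit parent pointer is needed.
// Assumption: `kind` contains no characters that need JSON escaping.
void exportJson(const ToyNode &node, std::ostream &out) {
  out << "{\"kind\":\"" << node.kind << "\",\"children\":[";
  for (std::size_t i = 0; i < node.children.size(); ++i) {
    if (i != 0)
      out << ",";
    exportJson(node.children[i], out);
  }
  out << "]}";
}

// Convenience wrapper that returns the JSON text as a string.
std::string exportToString(const ToyNode &root) {
  std::ostringstream out;
  exportJson(root, out);
  return out.str();
}
```

A real exporter would have to do the equivalent bookkeeping inside the visitor's traversal callbacks; this sketch only shows the shape of the output.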
You can already save the AST in a way that you can read it back into clang, but not in a way that is readable by a different tool.
So far the consensus has been that this would be a very hard project; there has been an XML output once, but the problem is that it was never up-to-date / complete enough to do anything useful with it.
Generally, if you want to have a full XML/JSON/whatever output, somebody would need to implement it, and then make sure it doesn’t get out of date when the implementation changes.
So far we have mostly built clang-based tools that work directly on the C++ AST.
On the parent-child relationship: ASTContext has a getParents() method. Note that you can get multiple parents for a node (for example, non-type-dependent expressions often exist only once even when they are part of multiple template instantiations).
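To make the "multiple parents" point concrete, here is a small self-contained sketch of a parent map over a toy node graph in which one node is shared by two parents. ToyDagNode, buildParentMap and getToyParents are hypothetical names for illustration; this models the idea only and does not use or reproduce Clang's actual implementation.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical node in a toy graph; children are indices into the graph.
struct ToyDagNode {
  std::string kind;
  std::vector<int> children;
};

using ToyGraph = std::vector<ToyDagNode>;
using ParentMap = std::map<int, std::vector<int>>;

// Record, for every node, the nodes that list it as a direct child.
// A node reachable from two places ends up with two entries.
ParentMap buildParentMap(const ToyGraph &graph) {
  ParentMap parents;
  for (std::size_t id = 0; id < graph.size(); ++id)
    for (int child : graph[id].children)
      parents[child].push_back(static_cast<int>(id));
  return parents;
}

// Return the immediate parents of `id` (never more distant ancestors).
// The root has no entry in the map, so it yields an empty list.
std::vector<int> getToyParents(const ParentMap &parents, int id) {
  auto it = parents.find(id);
  if (it == parents.end())
    return {};
  return it->second;
}
```

In this toy model, a node shared between a template and its instantiation reports both as parents, and element [0] is simply whichever parent was recorded first.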
Thanks for the hint. I was just looking at the getParents() method.
Quick question: is getParents(childNode)[0] *always* the parent (i.e.,
immediate parent, not ancestor) of childNode?
I gather the maintenance burden you describe (keeping an XML/JSON output
complete and up to date) was the reason something like -dump-xml was dropped as a
CFE action. At the moment, I am working on a toy tool. It is nowhere
close to being production ready.
If someone in the community could mentor me for e.g., GSoC, I could give
this a more serious shot. Ultimately, like you say, this is a continuous
(and not one-time) effort that requires keeping track of upstream
changes in Clang AST. Nonetheless, I offer help for an initial prototype
at the very least.
Is anyone else on the list considering using clang -ast-dump for a
non-clang tool, e.g., IDE integration or some such?
Regarding getParents(childNode)[0]: it is always a parent, yes.
Our idea for IDE integration is to export everything that makes sense through libclang (for example, we want to export clang-tidy through libclang, and once we have refactorings, it might make sense to offer them through libclang, too).
Hmm... okay. That a node can have multiple parents is something I hadn't
imagined about an AST.
Just to be clear, getParents(childNode) returns an array of immediate
parents only, right? I.e., one indentation level up in -ast-dump output? E.g., an
entry stmt in functiondecl scope has functiondecl as its parent and
*not* translationunitdecl?
Okay. I did try using the libclang python bindings, but the AST they expose
is currently inferior to what one could get with a libtooling-based tool.
Is the AST provided by libclang *proper* (C binding) better than its python
counterpart? I suppose I mean: Is this a rough ordering from rich to
poor AST detail?
Yes: getParents(childNode) returns only the immediate parents, i.e., one indentation level up in the -ast-dump output.
On the ordering from rich to poor AST detail:
Stand-alone tool > libclang > py libclang
The question is what you want to do; if we’re talking IDE support, we don’t want every IDE to go and fiddle with the clang AST - it’s complex and hard to get right.
Instead, we want to build tools in clang/clang-tools-extra, and export those via libclang (or something similar). That way, all IDEs will have a common set of well-tested tools.
If you want to build your own one-off code transformation, I’d recommend writing it in C++ against the clang AST directly.
I am working on a close-to-source on-disk representation of clang AST. I
am unsure what level of AST detail is necessary. But I do know that the
detail offered by python bindings is too abstract (low level of detail)
for my needs.
Is the libclang project you speak of (clang-tidy abstraction) in
development or planned for the future? It sounds like an apt setting to
implement Bjarne Stroustrup's coding guidelines [1] that he discusses in
his latest cppcon talk!
I am working on a close-to-source on-disk representation of clang AST.
I understand that - my question is what you want to use it for.
Yes, clang-tidy exists and has already implemented a couple of the core guidelines; contributions implementing more checks from the guidelines would be highly appreciated.
Note that clang-tidy is not exposed through libclang yet.
“atdgen” does not target languages other than OCaml (but this could be fixed, or, in principle, you could hack the plugin to use another serialization library).
Some C++ constructions are not fully exported yet.
Not a standalone tool yet (this causes some overheads – work in progress).