GSoC AST->XML project still open?

Hi,

Is the XML Representation of ASTs GSoC project still open? I'm very interested in the project, but I have only taken one compilers class at university.

Thanks.

Hi,
   It seems that the project is still listed on Clang's Open Projects page (http://clang.llvm.org/OpenProjects.html). You should follow Anton's guidelines: http://lists.llvm.org/pipermail/cfe-dev/2016-March/047919.html
--Vassil

Hi,

I have a few questions about the AST->XML project as listed on the Open projects page:

  1. Is my understanding correct that the purpose of the project is to produce two things: a schema (an XML schema, as in an XSD file?) and an XML formatter (does this need to be written in a particular language? does it need to be compiled?)? The XML formatter would be part of Clang, a Clang plugin, a tool built on the Clang C API (LibClang), or something else. It takes as input Clang’s internal AST of a piece of source code (in C/C++/Obj C) and outputs an XML representation of that AST that:
     a. Does not change when Clang changes (this is how I interpreted “stable across Clang versions”). What exactly doesn’t change: the exact output, the schema that the output conforms to, or something else?
     b. Is able to represent the different languages (C/C++/Obj C) “abstractly and not tied to the internal ASTs that Clang uses”. How? Do we simply translate the AST to XML, ensuring that it conforms to the schema, or do we need to transform the AST into another, more “abstract” form before or after converting it to XML?
     c. Produces output that can be verified against the schema. Is verification here the same as validation, as in XML validation using Xerces?

  2. Is the project still required? I understand that having a detailed dump of the AST is useful for compilers (e.g. code generation from the AST?) and static analysis tools (e.g. control flow graph analysis to find unreachable code?). Is Clang’s current -ast-dump option good enough for compilers and static analysis tools, removing the need for the project? Or is there some alternative Clang AST dump tool in current use that obviates the need for this project? Is it still the case that compilers and static analysis tools that wish to use the Clang-produced AST must either be a Clang plugin or use the LibClang API (the C interface to Clang), which is tied to the Clang binary? I found an incomplete project that attempts to translate the Clang AST to XML (https://github.com/BentleyJOakes/PCX). There are also tools that produce XML representations of ASTs without using Clang, such as the DMS Software Reengineering Toolkit. Why are these not acceptable?

  3. Regarding the -ast-dump-xml option, which appears to have been removed sometime in 2013 (“I removed an XML printer that was underdeveloped and unmaintained. Now that the normal AST dumping has been greatly improved, it’s probably time to remove the XMLish dump as well.”): is this project related to that in any way?

  4. Is my understanding correct that the schema must be general enough to encompass all possible future changes to the C, C++ and Objective C language specifications? For example, if a new construct is added to the C++ language (say, try-catch-finally), should the schema be able to anticipate such changes? Let’s take a simple hello world program in C++, test.cpp:

#include <stdio.h>
int main()
{
  printf("Hello World");
  return 0;
}

If I run clang -Xclang -ast-dump -fsyntax-only test.cpp, I get (some of the output removed):

FunctionDecl main
`-CompoundStmt
  |-CallExpr
  | |-ImplicitCastExpr
  | | `-DeclRefExpr printf
  | `-ImplicitCastExpr
  |   `-StringLiteral "Hello World"
  `-ReturnStmt
    `-IntegerLiteral 0

What should the XML output look like? Here’s how I imagine it might look:

What should the schema look like? Here’s what I imagine part of the XML schema might look like:

<xsd:complexType name="CompoundStmt">
  <xsd:sequence>
    <xsd:element name="CallExpr" minOccurs="0" maxOccurs="unbounded"/>
  </xsd:sequence>
</xsd:complexType>

If my basic understanding is correct, we need to know every construct in the language (such as “CallExpr”) before we can write the schema. So if a new construct is introduced, or some other change is made to the language specification, would we have to edit the schema? Am I completely misunderstanding what an XML schema is?

  5. I have only taken an undergrad-level compilers course at university and have not worked with the Clang AST or XML in any significant capacity. Do I have enough expertise to create the schema or the XML formatter?

  6. Will I need to maintain the project afterwards?

  7. Is there a mentor for this project?
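
To make question 4 more concrete, here is a minimal sketch of one way the dumped tree for test.cpp could be mapped to XML. Everything in it is an assumption made for illustration: `Node`, `escapeXml`, `toXml` and `helloWorldAst` are hypothetical names, the tree is built by hand rather than obtained from Clang, and the choice of one element per node kind with a `value` attribute is only one possible design, not what the project prescribes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an AST node. This is NOT Clang's AST API; it only
// mirrors the node kinds printed by -ast-dump.
struct Node {
    std::string kind;            // e.g. "CallExpr"
    std::string value;           // optional detail, e.g. a name or a literal
    std::vector<Node> children;
};

// Escape the characters that are special inside an XML attribute value.
std::string escapeXml(const std::string& text) {
    std::string out;
    for (char c : text) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"': out += "&quot;"; break;
            default:  out += c;
        }
    }
    return out;
}

// Emit one element per node, named after the node kind, indented two spaces
// per nesting level. Leaf nodes become self-closing elements.
std::string toXml(const Node& node, int depth = 0) {
    assert(depth >= 0);
    const std::string indent(static_cast<std::size_t>(depth) * 2, ' ');
    std::string xml = indent + "<" + node.kind;
    if (!node.value.empty())
        xml += " value=\"" + escapeXml(node.value) + "\"";
    if (node.children.empty())
        return xml + "/>\n";
    xml += ">\n";
    for (const Node& child : node.children)
        xml += toXml(child, depth + 1);
    return xml + indent + "</" + node.kind + ">\n";
}

// Hand-built copy of the tree shown in the -ast-dump output for test.cpp.
Node helloWorldAst() {
    Node callee{"ImplicitCastExpr", "", {Node{"DeclRefExpr", "printf", {}}}};
    Node argument{"ImplicitCastExpr", "", {Node{"StringLiteral", "Hello World", {}}}};
    Node call{"CallExpr", "", {callee, argument}};
    Node ret{"ReturnStmt", "", {Node{"IntegerLiteral", "0", {}}}};
    Node body{"CompoundStmt", "", {call, ret}};
    return Node{"FunctionDecl", "main", {body}};
}
```

Calling `toXml(helloWorldAst())` returns a nested document with `<FunctionDecl value="main">` at the root. With this naming scheme, every node kind becomes an element name, so a schema would have to list each kind. A different design, with a single generic element carrying the kind as an attribute, would avoid that at the cost of weaker validation.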

Thanks again and sorry for asking so many questions.

> I have a few questions about the AST->XML project as listed on the Open
> projects page:

Just a random thought: you probably should not limit yourself to XML.
Some of LLVM's subprojects are pretty happy with YAML.

FreeBSD is slowly moving utilities over to using libx, which allows human-friendly text, XML, and JSON all to be generated from the same source code. If anyone is thinking of adding more machine-consumable output, then it would probably be worth investigating.

David

Sorry, typo: that should have been libxo. It’s on GitHub here:

https://github.com/Juniper/libxo

Documented here:

http://juniper.github.io/libxo/libxo-manual.html

David