I am trying to find a tool that will parse a C source file and tell me information about it in a verbose form which I can digest and use in a script I am writing. The closest I have come so far is using gcc to generate debug info, then extracting and parsing that debug info.
However, what I’d prefer is a tool which lets me get at similar debug info (although obviously not memory offsets etc.) without having to compile the C source code into object files. I know that clang has been re-written from scratch to be a lot more flexible than gcc. Does anybody on this list know how I could use clang (or another tool) to obtain such debug info without compiling all the way?
Thanks,
Simon
> I am trying to find a tool that will parse and tell me
> information about a C source file in a verbose manner which I
> can digest and use in a script I am writing. The closest I
> have come so far using gcc to generate debug info and then
> extract the debug info and parse that.
That is not a bad solution if it works for you, because it relies only
on standard tools already available on any system, such as objdump or
gdb with a script...
> However, what I'd prefer is a tool which lets me get at
> similar debug info (although obviously not memory offsets
> etc) *without* having to compile the C source code into
> object files. I know that clang has been re-written from
> scratch to be a lot more flexible than gcc. Does anybody in
> this list know how I could use clang (or another tool) to
> obtain such debug info without compiling all the way?
In the "other tools" family, GCC-XML can output the AST as XML.
In our Par4All compiler, we use another tool internally, PIPS, which
can output the AST in HTML or in a textual format.
There is a web service you can use to get an idea of what it can output:
http://pips4u.org/doc/ir-navigator
>> I am trying to find a tool that will parse and tell me
>> information about a C source file in a verbose manner which I
>> can digest and use in a script I am writing. The closest I
>> have come so far using gcc to generate debug info and then
>> extract the debug info and parse that.
> That is not a bad solution if it works for you, because you can use only
> some standard tools already available on any systems, such as objdump or
> gdb with a script...
It’s the solution that gets closest to what I want to achieve so far, but it’s not pretty. It also doesn’t contain the whitespace and comment data (see below), which would be useful.
>> However, what I’d prefer is a tool which lets me get at
>> similar debug info (although obviously not memory offsets
>> etc) without having to compile the C source code into
>> object files. I know that clang has been re-written from
>> scratch to be a lot more flexible than gcc. Does anybody in
>> this list know how I could use clang (or another tool) to
>> obtain such debug info without compiling all the way?
> In the “other tools” family, gcc-xml can output the AST in an XML way.
http://www.gccxml.org/HTML/Index.html
I really like the idea of this project. Too bad it’s C++ only, unfinished (doesn’t process function bodies), and all but dead.
> In our Par4All compiler, we use another tool internally, PIPS, that
> allows to output the AST in HTML or in a textual format.
> There is a web service you can use to have an idea of what it can output:
http://pips4u.org/doc/ir-navigator
I tried it out. Very nice web interface. Close to what I want except for two things:
- Is it possible to output file line number and character position so that it is also part of the tree?
- Is it possible for the tree to contain info relating to whitespace and comments?
Ideally I’d like to be able to use a tool like PIPS to do so-called round trip parsing where the original source code can be rebuilt exactly from the intermediate representation. Somewhat like what this Perl module achieves:
http://search.cpan.org/~adamk/PPI-1.215/lib/PPI.pm
Thanks,
Simon
Thanks for the link, Ehsan. Looks like an interesting example of programmatically dumping the clang AST. I’ll take a look. – Simon
>> In our Par4All compiler, we use another tool internally, PIPS,
>> that allows to output the AST in HTML or in a textual format.
>> There is a web service you can use to have an idea of what it can
>> output: http://pips4u.org/doc/ir-navigator
> I tried it out. Very nice web interface. Close to what I want
> except for two things: 1. Is it possible to output file line
> number and character position so that it is also part of the
> tree? 2. Is it possible for the tree to contain info
> relating to whitespace and comments?
Thanks for asking! I have just realized that the full internal
representation of a statement is indeed not displayed on this web
interface. For example, the line number (called "number" in the PIPS
AST) and the comments are not displayed. We should not trust our PhD
students...
We do keep spacing information, by storing it in the comments.
A "comment" here is anything around the statement (spaces, // or /*,
\n...), not only the comments themselves, so that most of the syntactic
context is captured.
The character position, however, is lost. But of course it could be
stored...
(The curious can look at
http://www.cri.ensmp.fr/pips/newgen/ri.htdoc/#x1-270003.4
to see what we keep in statements.)
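To make the idea of keeping spacing, comments and positions concrete, here is a minimal Python sketch of a lossless tokenizer. It is not how PIPS or Clang work internally; the names TOKEN_RE, Token, tokenize and rebuild are hypothetical, and the lexer is a deliberately simplified approximation of C (no preprocessor, no trigraphs, no line continuations). Whitespace and comments are kept as ordinary tokens, every token records its line and column, and joining the tokens gives back the input exactly.

```python
import re
from dataclasses import dataclass

# Simplified, hypothetical C lexer. The last alternative matches any single
# character, so no part of the input is ever skipped.
TOKEN_RE = re.compile(r"""
    (?P<comment>/\*.*?\*/|//[^\n]*)
  | (?P<space>\s+)
  | (?P<string>"(?:\\.|[^"\\\n])*"|'(?:\\.|[^'\\\n])*')
  | (?P<ident>[A-Za-z_]\w*)
  | (?P<number>\d[\w.]*)
  | (?P<punct>.)
""", re.VERBOSE | re.DOTALL)


@dataclass
class Token:
    kind: str   # comment, space, string, ident, number or punct
    text: str   # exact source text of the token
    line: int   # 1-based line of the first character
    col: int    # 1-based column of the first character


def tokenize(source):
    """Split source into tokens, keeping whitespace and comments."""
    tokens = []
    line, col = 1, 1
    for m in TOKEN_RE.finditer(source):
        text = m.group()
        tokens.append(Token(m.lastgroup, text, line, col))
        newlines = text.count("\n")
        if newlines:
            # Column restarts after the last newline inside the token.
            line += newlines
            col = len(text) - text.rfind("\n")
        else:
            col += len(text)
    return tokens


def rebuild(tokens):
    """Round trip: concatenating the token texts restores the input."""
    return "".join(t.text for t in tokens)


if __name__ == "__main__":
    source = "int  x = 1; /* counter */\n\treturn x;// done\n"
    tokens = tokenize(source)
    assert rebuild(tokens) == source
    for t in tokens:
        if t.kind == "comment":
            print(t.line, t.col, t.text)
```

A real tool would attach these tokens to AST nodes rather than keep a flat list, but the same bookkeeping applies.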
> Ideally I'd like to be able to use a tool like PIPS to do
> so-called round trip parsing where the original source code
> can be rebuilt exactly from the intermediate
> representation.
It is difficult to achieve this with tools that have been conceived to
do source-to-source transformation and have a compact canonical internal
representation. Often some information is lost in translation.
For example, in PIPS, which has a common internal representation for C
and Fortran, there is some loss of information. You can parse Fortran
and prettyprint C, for example. It sounds crazy, but it is useful to
generate CUDA or OpenCL...
Often there are many different ways to express the same thing (for
example in Fortran declarations) so, in analysis tools, it is useless to
keep this and you can have a canonical internal representation. But
users can be disappointed with a simple prettyprint not producing the
same text as the input.
There was also a recent discussion on the ROSE compiler (another tool
you could look at) mailing list on this issue.
If you want to take preprocessing into account as well as program
transformations, getting back a sensible source is undecidable in the
general case...
A long time ago, I saw a tool for Y2K refactoring with two internal
representations: a canonical one for the abstract interpretation and a
concrete one for the syntactic details to be served back to the user.
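The two-representation idea can be sketched in a few lines of Python. This is only an illustration under strong assumptions, not a description of that Y2K tool or of PIPS: the names PAIR_RE, parse_concrete, to_canonical, unparse_concrete and prettyprint are hypothetical, and the toy lexer knows nothing about C beyond words, punctuation, whitespace and comments. The concrete form keeps each token with the trivia in front of it and can be unparsed exactly; the canonical form keeps only the tokens, so prettyprinting it gives normalized text instead of the original.

```python
import re

# Hypothetical toy lexer: group 1 is leading trivia (whitespace and
# comments), group 2 is one significant token (absent at end of input).
PAIR_RE = re.compile(r"((?:\s+|/\*.*?\*/|//[^\n]*)*)(\w+|[^\w\s])?", re.DOTALL)


def parse_concrete(source):
    """Concrete form: (leading_trivia, token) pairs; nothing is dropped."""
    pairs = []
    pos = 0
    while pos < len(source):
        m = PAIR_RE.match(source, pos)
        pairs.append((m.group(1), m.group(2) or ""))
        pos = m.end()
    return pairs


def to_canonical(pairs):
    """Canonical form: significant tokens only, as an analysis would use."""
    return [tok for _, tok in pairs if tok]


def unparse_concrete(pairs):
    """Serve the original text back to the user, byte for byte."""
    return "".join(trivia + tok for trivia, tok in pairs)


def prettyprint(canonical):
    """Normalized output: the original spacing and comments are gone."""
    return " ".join(canonical)


if __name__ == "__main__":
    source = "int  x=1;   /* counter */\n"
    pairs = parse_concrete(source)
    assert unparse_concrete(pairs) == source
    print(prettyprint(to_canonical(pairs)))
```

The hard part, which this sketch ignores, is keeping the two forms in sync once a transformation has modified the canonical one.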
But anyway, if it is just for parsing, the Clang parser should be able
to keep the information you want if you add some hooks...