AST access

Hi Umar,

Right now the GCC "TREE" stuff is a bit ugly in that llvm-gcc uses its
own C-based interface to LLVM. While it's *possible* to download the
source tarball (or CVS) for llvm-gcc and hack away, I doubt you'd find
that a particularly rewarding/fun/easy exercise. So, short term, I'd
have to say the answer is "not unless you're a masochist".

Long term (next year), however, I expect to be augmenting LLVM with a
front end compiler writer's toolkit. The toolkit will have the following
characteristics:
      * be a "toolkit" not a "solution" - the goal is to provide
        reusable components from which a front end language translator
        could be constructed - the goal is NOT to provide an
        end-all-be-all "union" of everything needed by all languages
      * completely ignore lexical scanning and basic parsing (there
        might be some portions of the toolkit that assist with writing
        production rules, especially at the edge of the AST)
      * source language agnostic - not even tied to a particular style
        of programming - it should be able to provide some support for
        diverse languages such as Haskell, Prolog, ML, LISP, Scheme, C,
        C++, Pascal, Smalltalk, Objective-C, FORTRAN, COBOL, etc.
      * written in C++ (possibly with a C interface)
      * well integrated with the LLVM C++ interface - i.e. make it simple
        to produce an LLVM IR (Module) from an AST and deal with that
        Module in terms of the optimization passes, code generation, etc.
      * provides portions of a flexible abstract syntax tree, i.e. it's a
        basic multi-way tree that makes very few assumptions about which
        nodes go where. The idea is to deal with the tree fundamentals
        and provide a toolkit, not a total solution, so that source
        language writers can build their language's AST trivially
      * provide some of the more common and redundant pieces such as
        arithmetic expressions for "C" type arithmetic
      * perhaps, one day, provide a common object model for
        object-oriented languages (an intersection (not union!) of
        what's needed for Java, C++, C#, Smalltalk, Objective-C)
      * handle some common runtime objects like "string" (Pascal, C++,
        C, and Java all do this differently but perhaps there's a way to
        provide a fundamental string runtime that they can use and
      * make the generation of debug information into the code trivial
      * possibly lots of other things too.
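
To make the "basic multi-way tree" idea a bit more concrete, here is a
minimal sketch of what such a node might look like. Nothing like this
exists in LLVM today; Node, makeNode, countNodes and buildExample are
hypothetical names chosen only for illustration, and the string "kind"
tag is an assumption about how a front end might label its nodes.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical generic AST node: a kind tag, optional text, and any
// number of children. The tree itself assumes nothing about which
// kinds may appear under which parents; that is left to the front end.
struct Node {
    std::string kind;  // e.g. "add" or "int"; vocabulary chosen by the language
    std::string text;  // literal or identifier spelling, empty otherwise
    std::vector<std::unique_ptr<Node>> children;

    explicit Node(std::string k, std::string t = "")
        : kind(std::move(k)), text(std::move(t)) {}

    // Append a child and return a reference to it so callers can keep building.
    Node& add(std::unique_ptr<Node> child) {
        children.push_back(std::move(child));
        return *children.back();
    }
};

// Convenience factory (hypothetical helper).
std::unique_ptr<Node> makeNode(std::string kind, std::string text = "") {
    return std::make_unique<Node>(std::move(kind), std::move(text));
}

// Count every node in the subtree rooted at n, including n itself.
std::size_t countNodes(const Node& n) {
    std::size_t total = 1;
    for (const auto& child : n.children) total += countNodes(*child);
    return total;
}

// Build the tree for the expression 1 + 2 * 3 from generic nodes only.
std::unique_ptr<Node> buildExample() {
    auto root = makeNode("add");
    root->add(makeNode("int", "1"));
    Node& mul = root->add(makeNode("mul"));
    mul.add(makeNode("int", "2"));
    mul.add(makeNode("int", "3"));
    return root;
}
```

A language front end would layer its own rules (which kinds are legal
where, what each kind means) on top of a structure like this rather
than having them baked into the tree.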

If you have ideas in this area, I'd love to hear them. I'm still in the
phase of requirements gathering on this project so it's definitely not
too late to make suggestions :)

As to your second question about regenerating source code from the AST,
this is non-trivial in most situations. You can often generate
*equivalent* source code, but not often generate the *exact* source, at
least not without lots of support for it built into the AST. Why do you
need this? Reverse engineering?
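
A small sketch of why only *equivalent* source usually comes back: a
typical expression AST keeps operators and operands but not the
original spacing, comments, or redundant parentheses, so the printer
has to pick its own layout. Expr, leaf, binary and unparse are
hypothetical names for illustration, not part of any existing tool.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical expression node. Note what is NOT stored: whitespace,
// comments, and whether the user wrote explicit parentheses.
struct Expr {
    char op = 0;               // '+', '*', ... or 0 for a leaf
    std::string name;          // identifier spelling for leaves
    std::unique_ptr<Expr> lhs;
    std::unique_ptr<Expr> rhs;
};

std::unique_ptr<Expr> leaf(std::string name) {
    auto e = std::make_unique<Expr>();
    e->name = std::move(name);
    return e;
}

std::unique_ptr<Expr> binary(char op, std::unique_ptr<Expr> l,
                             std::unique_ptr<Expr> r) {
    auto e = std::make_unique<Expr>();
    e->op = op;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Regenerate source text. Every binary node is fully parenthesized,
// which preserves meaning but not the author's original formatting.
std::string unparse(const Expr& e) {
    if (e.op == 0) return e.name;
    return "(" + unparse(*e.lhs) + " " + std::string(1, e.op) + " " +
           unparse(*e.rhs) + ")";
}
```

Both "a+b*c" and "a + (b * c)  /* scaled */" would parse to the same
tree here, and unparse returns "(a + (b * c))" for either one. Getting
the exact text back would mean storing source positions, comments and
parenthesization in the nodes.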