C AST transformations / questionable use of AST serialization

Hi,

We've been building a tool (eventually to be released BSD) for allowing
programmers to write custom program properties (complexity, semantic,
architectural, etc.) in a high level DSL (embedded in Haskell at the moment).
We switched from a decent but ad hoc C99 parser to using the Clang front end and
are very happy customers. We are using the libclang C interface via FFI.

However, we lost one extremely useful capability in this transition. We had
some really nice one-liners in our pre-clang days, e.g.,

  property1 = noUnreachableCode . removeDecls (hasPrefix "test_")

    // Remove all the declarations for test code from the project then
    // test to see if there is no unreachable code

  property2 = noUnreachableCode
            . removeDecls (hasPrefix "test_")
            . removeMembersFromStructs (hasPrefix "test_")

    // Ditto, but we also remove structure members that are only there
    // for testing purposes.

If you don't grok Haskell:
  - The '.' above is function composition (like '|' in Unix)
  - removeDecls, removeMembersFromStructs, hasPrefix are higher order functions.
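
For readers who want something concrete, here is a minimal sketch of how one-liners like these can be built. The types (Stmt, Decl, Program), the declName helper and the deliberately naive noUnreachableCode check are hypothetical stand-ins invented for this illustration. They are not our tool's actual representation, which is not shown in this post.

```haskell
import Data.List (isPrefixOf)

-- Toy stand-ins for a C AST (assumption: invented for this sketch only).
data Stmt = Return | Call String
  deriving (Eq, Show)

data Decl
  = Function String [Stmt]    -- function name and body
  | Struct   String [String]  -- struct name and member names
  deriving (Eq, Show)

type Program = [Decl]

declName :: Decl -> String
declName (Function n _) = n
declName (Struct   n _) = n

-- A predicate on names: does the name start with the given prefix?
hasPrefix :: String -> String -> Bool
hasPrefix = isPrefixOf

-- Drop every top-level declaration whose name satisfies the predicate.
removeDecls :: (String -> Bool) -> Program -> Program
removeDecls p = filter (not . p . declName)

-- Drop struct members whose name satisfies the predicate.
removeMembersFromStructs :: (String -> Bool) -> Program -> Program
removeMembersFromStructs p = map strip
  where
    strip (Struct n ms) = Struct n (filter (not . p) ms)
    strip d             = d

-- Deliberately naive: code is unreachable if anything follows a Return.
noUnreachableCode :: Program -> Bool
noUnreachableCode = all ok
  where
    ok (Function _ body) = length (dropWhile (/= Return) body) <= 1
    ok _                 = True

property1, property2 :: Program -> Bool
property1 = noUnreachableCode . removeDecls (hasPrefix "test_")
property2 = noUnreachableCode
          . removeDecls (hasPrefix "test_")
          . removeMembersFromStructs (hasPrefix "test_")

example :: Program
example =
  [ Function "main"        [Call "run", Return]
  , Function "test_helper" [Return, Call "never"]
  , Struct   "config"      ["size", "test_hook"]
  ]
```

In this toy setting, noUnreachableCode fails on example because of test_helper, while property1 and property2 hold once the test_ declarations are filtered out. Each transformation is just a Program -> Program function, which is what makes them compose.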

With our switch to clang, we have lost the ability to do quick and easy
wholesale project transformations like the above removeDecls function. We also
have the need to do transformations that add to the code (e.g., inserting
attributes). The output of these transformations (code slicing, mutations,
extensions) may be only for intermediate use and is not necessarily emitted for
the sake of code refactoring.

I'd really like to regain the ability to achieve such transformations. As we explore
ways to do this, these are some of my thoughts:

  - These modules

       Refactoring.h - Framework for clang refactoring tools
       Rewriter.h - Code rewriting interface

    seem to be designed for applying changes to the source and cannot
    be readily used to modify the AST (nor the serialized form of the AST).

    Correct?

  - One approach I'm considering is to write a custom encoder/decoder for the
    serialized AST for our Haskell code. I.e., porting the clang::serialization
    stuff to Haskell so that we can read and write .ast files.

    I saw some long past post to this list that discouraged this.
    But my question is not so much whether you think (as C++ coders) this is the *preferable* way,
    but

      IF someone is really keen for a 3rd party (non C++) tool to transform the AST

       - Is the above replace-serialization approach even feasible?
       - Any warnings/suggestions if we did try this?
       - Are there alternative ways to do this that don't involve applying
         rewrites to the source and re-parsing?
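
For comparison, the source-level route (the one we would like to avoid) can be sketched roughly as follows. It assumes we already have start/end offsets for each declaration to drop, for example derived from libclang cursor extents, and that the source is plain ASCII so byte offsets and character indices coincide. The Range type and removeRanges function are hypothetical names for this illustration.

```haskell
import Data.List (sortOn)

-- Half-open (start, end) offsets into the source text.
-- Assumption: ASCII source, so byte offsets equal character indices.
type Range = (Int, Int)

-- Splice the given ranges out of the source. Ranges may be given in any
-- order and may overlap; the result would then be handed back to the parser.
removeRanges :: [Range] -> String -> String
removeRanges ranges = go 0 (sortOn fst ranges)
  where
    -- 'pos' is the offset in the original text at which 'rest' begins.
    go _   []              rest = rest
    go pos ((s, e) : rs)   rest
      | e <= pos  = go pos rs rest   -- fully covered by an earlier range
      | otherwise =
          let s'            = max s pos
              (keep, after) = splitAt (s' - pos) rest
          in  keep ++ go e rs (drop (e - s') after)
```

For instance, removeRanges [(6, 18)] "int a; int test_b; int c;" drops the middle declaration and leaves "int a; int c;". The cost is that every transformation in a pipeline forces a full re-parse, which is why an AST-level alternative is attractive.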

Sorry for the long post. Any insights or guidance would be very helpful!

- Mark Tullsen

  - These modules

       Refactoring.h - Framework for clang refactoring tools
       Rewriter.h - Code rewriting interface

    seem to be designed for applying changes to the source and cannot
    be readily used to modify the AST (nor the serialized form of the AST).

    Correct?

Yes.

  - One approach I'm considering is to write a custom encoder/decoder for the
    serialized AST for our Haskell code. I.e., porting the clang::serialization
    stuff to Haskell so that we can read and write .ast files.

    I saw some long past post to this list that discouraged this.
    But my question is not so much whether you think (as C++ coders) this is the *preferable* way,
    but

      IF someone is really keen for a 3rd party (non C++) tool to transform the AST

       - Is the above replace-serialization approach even feasible?

I think it's feasible, but see below ;-)

       - Any warnings/suggestions if we did try this?

- the AST is huge and changes somewhat frequently (not so much in itself,
but new AST nodes are introduced, etc); this might not be a big problem for
you if you only care about C, but it might lead to non-trivial maintenance
effort for the tool
- the AST invariants are hard to get right

In the end it's of course a cost-benefit trade-off. My best guess is that
it's usually not worth trying to maintain an adapted out-of-tree
serialization framework for clang's AST, but YMMV.

       - Are there alternative ways to do this that don't involve applying
         rewrites to the source and re-parsing?

Not that I'm aware of.

Hi Mark,