about AST of clang

Hi Clang list,

I noticed that Clang has an AST file format used for a higher-level
representation of C code.

What I want to do is build parse trees of C programs and then edit
them, e.g., insert new nodes, modify existing nodes, etc. Finally, I
want to compile the modified trees into executables.

Is it possible? Moreover, is it practical? I'd like to know where I can
find examples or supporting features in Clang.

Thanks for your patience and your thoughts,

-- Zhe

I have the same question.

The Clang AST is not designed to be edited. It can be done, but ensuring that the resulting AST is well-formed is a non-trivial task.

  - Doug

And to propose an alternative solution (which might of course not
apply to the situation at hand): we've had great success with
identifying nodes in the AST that we want to change, mapping it to
source code, doing text transformations on the source code, and then
compiling the resulting source code back into an AST / an executable.

Cheers,
/Manuel
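
The text-transformation half of that approach can be sketched roughly as follows. This is only an illustration under stated assumptions, not Clang's API: the `Replacement` struct and `applyReplacements` function are hypothetical names, and the byte offsets stand in for whatever the AST-to-source mapping would give you for the nodes you want to change.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for "an AST node mapped to source code":
// a half-open byte range [begin, end) in the original text, plus the new text.
struct Replacement {
    std::size_t begin;
    std::size_t end;
    std::string text;
};

// Apply non-overlapping replacements to a copy of `source`.
// Edits are applied from the highest offset to the lowest, so the offsets
// of the remaining edits stay valid even when the text changes length.
std::string applyReplacements(std::string source, std::vector<Replacement> edits) {
    std::sort(edits.begin(), edits.end(),
              [](const Replacement &a, const Replacement &b) {
                  return a.begin > b.begin;
              });
    for (const Replacement &e : edits) {
        assert(e.begin <= e.end && e.end <= source.size());
        source.replace(e.begin, e.end - e.begin, e.text);
    }
    return source;
}
```

For instance, replacing the ranges [8, 11) and [17, 20) of `int x = foo(1) + foo(2);` with a longer name rewrites both calls, and the result can then be handed back to the compiler as ordinary source.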

Because of a lack of resources, we did this more than 15 years ago in
another compiler we work on, and it is still there. Now I really
understand this concept in real life:
Technical debt - Wikipedia
;-)

So, yes it works on some simple cases... But it is a nightmare to have
to apply patches on a kludge when you have to improve the parser or
change the internal representation only a little bit or to deal with
more general cases.

That is why we are thinking of adding support to Clang for a
cleaner way to do source-to-source transformations that preserve
comments and so on.

>>> The Clang AST is not designed to be edited. It can be done, but
>>> ensuring that the resulting AST is well-formed is a non-trivial
>>> task.

>> And to propose an alternative solution (which might of
>> course not apply to the situation at hand): we've had great
>> success with identifying nodes in the AST that we want to
>> change, mapping it to source code, doing text
>> transformations on the source code, and then compiling the
>> resulting source code back into an AST / an executable.

> Because of a lack of resources, we did this more than 15 years ago in
> another compiler we work on, and it is still there. Now I really
> understand this concept in real life:
> Technical debt - Wikipedia
> ;-)

Well, I disagree that it's a poor architecture, which would not make
it technical debt ;-) Instead, I suggest it's a hard problem and it
will take a decent amount of maintenance effort no matter what you do.

> So, yes it works on some simple cases... But it is a nightmare to have
> to apply patches on a kludge when you have to improve the parser or
> change the internal representation only a little bit or to deal with
> more general cases.

As I said before, I might not understand what you're trying to do.
That said, I think that changing the code is a superset of changing
the AST, so I don't understand why it's harder to do certain things
that way. It of course requires a very precise mapping of AST nodes to
source code, which clang luckily enough has. It also requires that you
still want to do C++ (if you don't, that would be a case where what I
say clearly does not apply).

As for changes of internal representation getting in your way: how's
that better when you work directly on the AST? On the contrary, I
expect subtle changes to the invariants of the AST to make it much
harder to still produce correct ASTs by shoving around AST nodes
than by making textual changes.

And third, if you ever want the changes to go back to the programmer
in code form, you suddenly need to care about formatting etc, and
minimally disruptive changes to the text.

> That is why we are thinking of adding support to Clang for a
> cleaner way to do source-to-source transformations that preserve
> comments and so on.

I don't understand yet why you think the direct tree transformation is
per se cleaner.

Cheers,
/Manuel

If I want to instrument C source code, i.e. insert
some statements into the code, I think the best way may be the
following: analyze the generated AST, find appropriate insertion
points, and then directly modify the source code using
source-to-source transformations that preserve comments and so on,
since there is a perfect mapping between AST nodes and the source
code.

And, according to the above discussions, directly modifying AST nodes
is not practical, due to the consistency problem of the modified AST.

Is this conclusion correct? Any other opinion?

Yes, this is a good way to go.

An alternative, if you're just doing instrumentation, is to extend Clang's IR generation and simply introduce the instrumentation calls into the IR directly. It makes it easier to compile code with instrumentation (just by using your modified compiler) but takes away the ability to compile the instrumented code with some other compiler.

  - Doug
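
As a rough sketch of the source-level variant (insert statements at points found by analyzing the AST), the bookkeeping might look like the code below. Everything here is an assumption made for illustration: `Insertion` and `instrument` are hypothetical names, not Clang API, and the offsets are assumed to come from mapping AST nodes to positions in the original file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical insertion point: an offset into the ORIGINAL text (as would be
// derived from an AST node's source location) plus the statement to insert.
struct Insertion {
    std::size_t offset;
    std::string text;
};

// Insert every snippet, interpreting all offsets relative to the original
// text. Points are processed in ascending order while `shift` tracks how
// many bytes earlier insertions have already added.
std::string instrument(const std::string &original, std::vector<Insertion> points) {
    std::stable_sort(points.begin(), points.end(),
                     [](const Insertion &a, const Insertion &b) {
                         return a.offset < b.offset;
                     });
    std::string out = original;
    std::size_t shift = 0;
    for (const Insertion &p : points) {
        assert(p.offset <= original.size());
        out.insert(p.offset + shift, p.text);
        shift += p.text.size();
    }
    return out;
}
```

Because nothing but text is added, comments and formatting in the rest of the file are left untouched, and the instrumented file can still be built with any compiler.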

Thanks!
Concerning the source-to-source transformation, is the "Rewriter" class
the best way to implement it?
I read the header file, but I found that the class is not
associated with any file name string.
I mean, how can I modify the source code using the Rewriter class?
Is there an example?

See [cfe-commits] [PATCH] X-TU Refactoring support
for a patch that's currently under review that contains an example on
how to implement source-to-source translations crossing TU boundaries
using the rewriter.

Cheers,
/Manuel

    > As I said before, I might not understand what you're trying
    > to do. That said, I think that changing the code is a
    > superset of changing the AST, so I don't understand why it's
    > harder to do certain things that way. It of course requires
    > a very precise mapping of AST nodes to source code, which
    > clang luckily enough has. It also requires that you still
    > want to do C++ (if you don't, that would be a case where
    > what I say clearly does not apply).

My argument is that if the Clang front-end is used more and more as
a generic source-to-source translator, at some point we will need some
features to help programmers do so.

For example, think of generating code for some kind of
heterogeneous accelerator, such as a GPU. You need to outline some
pieces of code into new functions, and it is quite complicated to do so in
the general case at the source level. Even at the AST level, with more
suitable abstractions, it is already difficult, as we can see in other
source-to-source compilers (such as the ROSE Compiler and PIPS, which I know).

    > As for changes of internal representation getting in your
    > way - how's that better when you directly work on the AST -
    > on the contrary, I expect subtle changes to the invariants
    > of the AST to make it much harder to still produce correct
    > ASTs by shoving around AST nodes, instead of making textual
    > changes.

That is true. But for quite complex transformation work, I'm not sure
there is another way...

    > And third, if you ever want the changes to go back to the
    > programmer in code form, you suddenly need to care about
    > formatting etc, and minimally disruptive changes to the
    > text.

Of course, I assume that the "higher level" support we need keeps
information on all the formatting stuff. :-)

But anyway, it is an intractable issue per se (transformations on code
involving macro expansion, the exact semantics the programmer intended
for a comment at some position in the code...), but we can
provide some support for simple cases.

    > I don't understand yet why you think the direct tree
    > transformation is per se cleaner.

Because at some point of transformation complexity, it is cleaner to
invest in some common high-end source-to-source transformation support
and rely on it to develop all the complex tools, rather than trying to
do overly complex things with string transformations.

But it depends on what we expect to do with the tool, and it may not
apply to you. I understand it may not be the mainstream use for
Clang. :-) For some of our use cases (not using Clang), have a look at
par4all.org for example.