Future of AST transformation

Hi @clang,

I'm working at my source-to-source transformator based on clang for quite some time now. However, my last clang update constantly causes me some serious headaches. Mainly it is the new expr::EvaluateAsRValue framework, which asserts like hell. Of course rightly since I cannot preserve all - often hidden - AST invariants during transformations.
Some assertions were fixable, but for a lot of others I have no clue.
An example: transforming a+=b to a=a+b works like this in my framework:

CompoundAssignOperator* Node;
// compute binOpc
Expr* firstRhs = Clone_(Node->getLHS());
firstRhs->setValueKind(VK_RValue);
BinaryOperator* TC = Assign_(
   Node->getLHS(),
   BinaryOp_(firstRhs, Node->getRHS(), binOpc));

I had to figure out that I had to set the value kind. But in the end that was still an easy fix. But recently a new problem popped up:

struct A { float x; };
A a;
a.x += y; // assertion: APValue.h:306: isStruct() && "Invalid accessor"

This comes from the fact that "a" in "a.x" itself is treated as an lvalue. And this is only a small example of all the assertion hell I've got with the expr::EvaluateAsRValue framework.
So my question is whether there is or there will be a AST transformation framework capable of transforming expressions like above? Or whether clang/AST transformation itself is a dead end in the long run?

Best regards
Olaf Krzikalla

I think the prevailing opinion is that the Clang AST is not meant to
be changed once created. I may be wrong though.

Off the top of my head, some parts of Clang that you might want to
look at are ASTImporter, the AST serialization/deserialization code,
and lib/Sema/TreeTransform.h.

--Sean Silva

Hi @clang,

I’m working at my source-to-source transformator based on clang for
quite some time now. However, my last clang update constantly causes me
some serious headaches. Mainly it is the new expr::EvaluateAsRValue
framework, which asserts like hell. Of course rightly since I cannot
preserve all - often hidden - AST invariants during transformations.

You should use Sema interfaces to build AST nodes, rather than building them yourself, in order to preserve the AST invariants.

Some assertions were fixable, but for a lot of others I have no clue.
An example: transforming a+=b to a=a+b works like this in my framework:

CompoundAssignOperator* Node;
// compute binOpc
Expr* firstRhs = Clone_(Node->getLHS());
firstRhs->setValueKind(VK_RValue);
BinaryOperator* TC = Assign_(
Node->getLHS(),
BinaryOp_(firstRhs, Node->getRHS(), binOpc));

I had to figure out that I had to set the value kind.

That’s not the right thing to do. You need to create an ImplicitCastExpr of kind IK_LValueToRValue (and potentially perform other parts of the usual arithmetic conversions). Sema::BuildBinOp will handle this for you.

So my question is whether there is or there will be a AST transformation
framework capable of transforming expressions like above?

Take a look at lib/Sema/TreeTransform.h. Using that, it is certainly possible to perform AST to AST transformations in a way which is convenient and maintains the AST invariants.

Or whether
clang/AST transformation itself is a dead end in the long run?

Since you said you’re performing source to source transformations, AST transformation is unlikely to be a good path to follow. A source to source transformation tool should usually make very targeted modifications to the code, and it’s not really practical to deduce what those changes should have been if all you have are ASTs from before and after (think about preserving whitespace, comments, macros, templates, …).

The usual approach for clang-based source-to-source transformation tools is to use the AST to determine what changes should be made, then produce a list of modifications to be made to the original source file. See clang::Rewriter, clang::tooling::Replacement, and clang::tooling::RefactoringTool for some components which make it easier to build such tools.

Since you said you're performing source to source

    > transformations, AST transformation is unlikely to be a
    > good path to follow. A source to source transformation tool
    > should usually make very targeted modifications to the
    > code, and it's not really practical to deduce what those
    > changes should have been if all you have are ASTs from
    > before and after (think about preserving whitespace,
    > comments, macros, templates, ...).

It depends on what you want to do with source-to-source transformation.

If for example I use source-to-source transformations to generate from a
sequential C/C++ program a parallel program with multiple processes
communicating with MPI, with OpenMP #pragma on each SMP node with some
calls to SIMD intrinsics and some parts in CUDA or OpenCL to address
heterogeneous computing, a simple textual transformation is just NOT a
good path to follow.

Another typical use case is high-level hardware synthesis from C/C++.

    > The usual approach for clang-based source-to-source
    > transformation tools is to use the AST to determine what
    > changes should be made, then produce a list of
    > modifications to be made to the original source file. See
    > clang::Rewriter, clang::tooling::Replacement, and
    > clang::tooling::RefactoringTool for some components which
    > make it easier to build such tools.

Yes, but some people are interested by a less "usual approach". :slight_smile:
For serious things, AST transformations are the only way to go.

Keeping track of comments in the AST is a first step, even if
source-to-source transformations are intractable in the general case
because you need to keep the semantics of... comments. I do not even
talk about macros...

Just imagine how to translate
/* ... */
/* ... */ a/* */

Since you said you’re performing source to source
transformations, AST transformation is unlikely to be a
good path to follow. A source to source transformation tool
should usually make very targeted modifications to the
code, and it’s not really practical to deduce what those
changes should have been if all you have are ASTs from
before and after (think about preserving whitespace,
comments, macros, templates, …).

It depends on what you want to do with source-to-source transformation.

If for example I use source-to-source transformations to generate from a
sequential C/C++ program a parallel program with multiple processes
communicating with MPI, with OpenMP #pragma on each SMP node with some
calls to SIMD intrinsics and some parts in CUDA or OpenCL to address
heterogeneous computing, a simple textual transformation is just NOT a
good path to follow.

Another typical use case is high-level hardware synthesis from C/C++.

Right, if you’re generating input to another tool, emitting output from a parsed AST is completely reasonable. My message was mostly concerning tools whose output is supposed to be maintained by a human; apologies for not making that clear.

The usual approach for clang-based source-to-source
transformation tools is to use the AST to determine what
changes should be made, then produce a list of
modifications to be made to the original source file. See
clang::Rewriter, clang::tooling::Replacement, and
clang::tooling::RefactoringTool for some components which
make it easier to build such tools.

Yes, but some people are interested by a less “usual approach”. :slight_smile:
For serious things, AST transformations are the only way to go.

Keeping track of comments in the AST is a first step, even if
source-to-source transformations are intractable in the general case
because you need to keep the semantics of… comments. I do not even
talk about macros…

Just imagine how to translate
/* … /
/
/ a/ /
+=
/
/b / … */
; //…

to
a = a + b;

Where do you want to keep/duplicate the comments ? You have to
“understand” them to resynthesize their new layout and content. :frowning:

Even with simpler cases it is already a nightmare…

Exactly; this is one reason why AST to AST transformations for such cases are problematic. If your output is a list of source transformations, though, it’s easy to produce

/* … /
/
/ a/ /
= a +
/
/b / … */
; //…

or

/* … /
/
/ a/ */

Take a look at lib/Sema/TreeTransform.h. Using that, it is certainly
possible to perform AST to AST transformations in a way which is
convenient and maintains the AST invariants.

I'm aware of that means. However I still haven't figuered out how TreeTransform can be used for a task like "a+=b" -> "a=a+b". Maybe someone has an example. In clang TreeTransform is used for different tasks.

The usual approach for clang-based source-to-source transformation tools
is to use the AST to determine what changes should be made, then produce
a list of modifications to be made to the original source file. See
clang::Rewriter, clang::tooling::Replacement, and
clang::tooling::RefactoringTool for some components which make it easier
to build such tools.

What Ronan said: I use the transformation for SIMD vectorization of loops. The resulting code is still readable but not any longer in a way suitable for maintenance. Mostly it is sent to a compiler straightforward.

I still think that an universal AST transformation framework would offer some great opportunities. I do it for years now and most of the time all went OK, esp. on statement level. All the hassle starts on expression level (and mostly due to ImplicitCastExpr).

Best regards
Olaf Krzikalla

I'm aware of that means. However I still haven't figuered out how
TreeTransform can be used for a task like "a+=b" -> "a=a+b". Maybe
someone has an example. In clang TreeTransform is used for different tasks.

Check out the block comment right above the class declaration, it
contains a quite thorough discussion. TreeTransform is just a standard
visitor pattern which recursively accumulates the results of
subinvocations. If you're familiar with Pythons `ast.NodeTransformer`,
TreeTransform is basically the same idea.

--Sean Silva

There are at least two reasons against TreeTransform: firstly, it is not even exposed as an API but just a part of the sema lib. Secondly, even if it would be available, I won't be able to construct an actual object, since I don't have a proper sema object. For various reasons I have to construct the complete AST before I start transforming it. And then I don't have an idea, how I get sema back to the appropriate state.

For the moment I'm going to implement my own simple EvaluateAsRValue function. Of course I'm very unhappy with this solution right now.

Best regards
Olaf Krzikalla

Take a look at lib/Sema/TreeTransform.h. Using that, it is certainly
possible to perform AST to AST transformations in a way which is
convenient and maintains the AST invariants.

I'm aware of that means. However I still haven't figuered out how
TreeTransform can be used for a task like "a+=b" -> "a=a+b". Maybe
someone has an example. In clang TreeTransform is used for different tasks.

The usual approach for clang-based source-to-source transformation tools
is to use the AST to determine what changes should be made, then produce
a list of modifications to be made to the original source file. See
clang::Rewriter, clang::tooling::Replacement, and
clang::tooling::RefactoringTool for some components which make it easier
to build such tools.

What Ronan said: I use the transformation for SIMD vectorization of
loops. The resulting code is still readable but not any longer in a way
suitable for maintenance. Mostly it is sent to a compiler straightforward.

I'm probably missing something, but my gut feeling would be that this
should be rather done as an llvm pass... Any specific reasons that
speak against this?

Cheers,
/Manuel

What Ronan said: I use the transformation for SIMD vectorization of
loops. The resulting code is still readable but not any longer in a way
suitable for maintenance. Mostly it is sent to a compiler straightforward.

I'm probably missing something, but my gut feeling would be that this
should be rather done as an llvm pass... Any specific reasons that
speak against this?

It needs to be source-to-source. The output is sent to other compilers. The llvm-to-c code is not really suitable for this kind of tasks.

Best regards
Olaf Krzikalla

Readable output code. For example something you can work later on, with
a tool or a... human being. :slight_smile:

You can easily get the Sema from the CompilerInstance with
CompilerInstance::getSema().

--Sean Silva

Check out the block comment right above the class declaration, it
contains a quite thorough discussion. TreeTransform is just a standard
visitor pattern which recursively accumulates the results of
subinvocations. If you’re familiar with Pythons ast.NodeTransformer,
TreeTransform is basically the same idea.

There are at least two reasons against TreeTransform: firstly, it is not
even exposed as an API but just a part of the sema lib.

Yes, that’s unfortunate for your use case, but you should at least be able to call the relevant Sema::Build* methods directly.

Secondly, even
if it would be available, I won’t be able to construct an actual object,
since I don’t have a proper sema object. For various reasons I have to
construct the complete AST before I start transforming it. And then I
don’t have an idea, how I get sema back to the appropriate state.

The way to deal with this is to perform your transformation from the ASTConsumer::HandleTranslationUnit callback. At that point, the entire AST has been formed, but the Sema object is still alive and ready to do stuff.

For the moment I’m going to implement my own simple EvaluateAsRValue
function. Of course I’m very unhappy with this solution right now.

If you want to evaluate ASTs which do not meet the invariants, you will have to do this. There are also no guarantees that AST printing, or any other AST functionality (including creating the AST nodes in the first place), will work on invariant-violating ASTs.

– Richard

Hi,

> The way to deal with this is to perform your transformation from the
>ASTConsumer::HandleTranslationUnit callback. At that point, the entire
>AST has been formed, but the Sema object is still alive and ready to >do stuff.

I am jumping in here, as I am trying to do this exactly. I would now like to delete a global variable on the AST(to which I have a VarDecl) so it is gone when printing the AST later on. I understand it is not really a transformation, but any easy way to do this?

Best,
Max

Hi @clang,

> Yes, that's unfortunate for your use case, but you should at least be
> able to call the relevant Sema::Build* methods directly.
Below is some apparently working code:

// a += b -> a = a + b
struct CompoundAssignTransform : TreeTransform<CompoundAssignTransform>
{
   CompoundAssignTransform (Sema& s) :
     TreeTransform<CompoundAssignTransform>(s) {}

   bool AlwaysRebuild() { return true; } // needed for cloning the lhs

   ExprResult TransformCompoundAssignOperator(CompoundAssignOperator *E)
   {
     BinaryOperator::Opcode binOpc = computeOpc(E); // a switch-case

     ExprResult lhsClone = TransformExpr(E->getLHS());
     if (lhsClone.isInvalid()) return lhsClone;

     ExprResult rhs = RebuildBinaryOperator(
       E->getOperatorLoc(), binOpc, lhsClone.get(), E->getRHS());
     if (rhs.isInvalid()) return rhs;

     return RebuildBinaryOperator(
       E->getOperatorLoc(), BO_Assign, E->getLHS(), rhs.get());
   }
};

Is this the way to go? Esp. the part cloning the lhs via TransformExpr seems a littly bit sketchy to me. However it works with AlwaysRebuild == true. And the cloning stuff is the only used portion of TreeTransform. The remains are redirected calls to Sema. So I could go without TreeTransform, if I have another cloning means. But maybe this is indeed how TreeTransform is intended to use. In this case I strongly vote for a move of TreeTransform.h to the include directory :slight_smile:

Best regards
Olaf Krzikalla

Hi,

any idea on this one?

Thx,
Max