AST Transformation

Hi,

I want to implement a C obfuscation tool that performs AST transformations.
As output I want either C source or an executable; it doesn't matter which. I'm new
to Clang and I don't understand the complete architecture yet, but I have a
clear idea.

I want to use Clang to build the AST. Then I will make AST
transformations with a plugin. Finally, I will use Clang/LLVM to build the
executable or rewrite the source code. I have seen the plugin example
"PrintFunctionNames" and I like the approach.

I just want to know whether those of you who are Clang/LLVM experts think
it's possible, and whether it's the right or the wrong way to do AST
transformation.

I really want to start this project, and I welcome all your comments and
suggestions!

Thank you in advance for your help!

Greg

There's no point in obfuscating C code if you're going to output an executable. The optimizers are effectively obfuscators, and they make your code run faster while they're at it, too.

Sebastian

Thank you for your reply.

I want to make AST transformations. My goal is to change the program's control flow. One example is control flow flattening: http://www.inf.u-szeged.hu/~akiss/pub/pdf/laszlo_obfuscating_journal.pdf. But there are many other techniques.
Being able to build an executable directly from the modified AST would be interesting, and regenerating the source code also interests me. I want to make reverse engineering harder.
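As a rough, hand-written illustration of control flow flattening (not output from any tool, and not necessarily the exact scheme in the linked paper), the loop below is rewritten so that each basic block becomes one case of a single `switch`, with a `state` variable choosing the next block to run. The function names are hypothetical.

```cpp
// Structured version: sum of 1..n, skipping multiples of 3.
int sum_original(int n) {
    int s = 0;
    for (int i = 1; i <= n; ++i) {
        if (i % 3 != 0)
            s += i;
    }
    return s;
}

// Flattened version: every basic block of the loop above becomes one case
// of a single switch, and `state` selects which block runs next.
int sum_flattened(int n) {
    int s = 0;
    int i = 0;
    int state = 0;
    for (;;) {
        switch (state) {
        case 0:  // entry block: initialize the counter
            i = 1;
            state = 1;
            break;
        case 1:  // loop header: test the loop condition
            state = (i <= n) ? 2 : 5;
            break;
        case 2:  // body condition: is i a multiple of 3?
            state = (i % 3 != 0) ? 3 : 4;
            break;
        case 3:  // accumulate
            s += i;
            state = 4;
            break;
        case 4:  // latch: increment and jump back to the header
            ++i;
            state = 1;
            break;
        default: // exit block (state 5)
            return s;
        }
    }
}
```

Both functions return the same value for every `n` (for example, 37 for `n = 10`). A real obfuscator might additionally encode or shuffle the state values so that the original block order is harder to recover.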

Is anyone else convinced by the idea, or not?

But the question is more technical and about Clang. Can we just implement an AST transformation plugin and let Clang's normal compilation build the executable? I've read that pretty-printing does not guarantee reparsable code, so I thought it would be better to build the executable directly.

If you don't particularly care about getting source code back, I would suggest doing this as a transformation on LLVM IR rather than on Clang's ASTs.

John.

I'd like to join in on this part of the question (though not the obfuscation aspect).

AFAIU, plugins like PrintFunctionNames can only replace Clang's AST
processing, not add a step to the compilation process.

More specifically, I'm trying to create a C++ frontend that explicitly supports
features of the Qt Framework. The goal is to provide additional compile-time
diagnostics and optimizations. To achieve this, I'd like to interact with the AST
while it is being constructed (via a custom ASTConsumer, PPCallbacks,
or both), then perform some transformations on this AST and hand it back
to the compilation process.

It would be great if this could be done in a plugin, because otherwise I'm afraid
that I'll need to fork the Driver and make substantial modifications to its code.

Any suggestions?

You can use -add-plugin to run plugins in addition to codegen, but the
plugins will run _after_ codegen, because the AST is currently
designed to be immutable.

That's currently not possible.

Did you mean "it's impossible to transform AST" or just "it's impossible to hook in
before CodeGen"?

31.03.2011, 11:24, "Ruch Grégory" <gregory.ruch@heig-vd.ch>:

The AST is designed to be immutable after construction, so "it's
impossible to transform AST".

Thanks for your reply.

Do you have an example that explains how to use -add-plugin? I've been trying
to use the PrintFunctionNames example, without success.

This thread and others show that there is interest in AST
transformation. Could it become a future feature?

Currently -add-plugin runs plugins after CodeGen. Would it be
possible to add an option to run them before CodeGen, and let the programmer
take responsibility for making correct transformations?

Best regards
Greg

Note that this doesn't mean that you can't walk the AST and then construct a new AST that is based on the old one but in some way different.

A good project for someone interested in AST transforms would be a subclass of RecursiveASTVisitor that generated a copy of the AST by walking each node in turn and adding an equivalent node to a new AST. Users could then subclass this and use it to generate a new AST that differed from the original in only the specific ways that they desired.

This would have the same effect as modifying an AST (albeit with a higher memory overhead), but would not impact any of the AST consumers that expect the AST to be immutable (i.e. all of the ones that currently exist, including CodeGen).

David
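The copy-based approach described above can be sketched with a toy, self-contained tree. All names here (`Expr`, `TreeCopier`, `RenameVar`, `print`) are hypothetical and are not Clang's classes; the sketch only shows the pattern: a base class whose default hooks rebuild every node unchanged, and a subclass that overrides just the node kind it wants to alter. The original tree is never modified.

```cpp
#include <memory>
#include <string>
#include <utility>

// Toy immutable AST (hypothetical, NOT Clang's classes).
struct Expr {
    enum Kind { Num, Var, Add, Mul };
    Kind kind;
    int value;                            // used by Num
    std::string name;                     // used by Var
    std::shared_ptr<const Expr> lhs, rhs; // used by Add/Mul
};
using ExprPtr = std::shared_ptr<const Expr>;

ExprPtr makeNum(int v) { return std::make_shared<Expr>(Expr{Expr::Num, v, "", nullptr, nullptr}); }
ExprPtr makeVar(const std::string& n) { return std::make_shared<Expr>(Expr{Expr::Var, 0, n, nullptr, nullptr}); }
ExprPtr makeBin(Expr::Kind k, ExprPtr l, ExprPtr r) {
    return std::make_shared<Expr>(Expr{k, 0, "", std::move(l), std::move(r)});
}

// Walks the tree and builds a fresh copy node by node; by default the copy
// is identical. Subclasses override only the hooks they want to change.
class TreeCopier {
public:
    virtual ~TreeCopier() = default;
    ExprPtr transform(const ExprPtr& e) {
        switch (e->kind) {
        case Expr::Num: return transformNum(*e);
        case Expr::Var: return transformVar(*e);
        default:        return transformBinary(*e);
        }
    }
protected:
    virtual ExprPtr transformNum(const Expr& e) { return makeNum(e.value); }
    virtual ExprPtr transformVar(const Expr& e) { return makeVar(e.name); }
    virtual ExprPtr transformBinary(const Expr& e) {
        return makeBin(e.kind, transform(e.lhs), transform(e.rhs));
    }
};

// Example subclass: renames one variable, copies everything else unchanged.
class RenameVar : public TreeCopier {
public:
    RenameVar(std::string from, std::string to) : from_(std::move(from)), to_(std::move(to)) {}
protected:
    ExprPtr transformVar(const Expr& e) override {
        return makeVar(e.name == from_ ? to_ : e.name);
    }
private:
    std::string from_, to_;
};

// Renders a tree as a fully parenthesized string.
std::string print(const ExprPtr& e) {
    switch (e->kind) {
    case Expr::Num: return std::to_string(e->value);
    case Expr::Var: return e->name;
    case Expr::Add: return "(" + print(e->lhs) + " + " + print(e->rhs) + ")";
    case Expr::Mul: return "(" + print(e->lhs) + " * " + print(e->rhs) + ")";
    }
    return "";
}
```

Applying `RenameVar("x", "a")` to `(x + (2 * y))` yields a new tree that prints as `(a + (2 * y))`, while the original still prints as `(x + (2 * y))`. Every node is duplicated, which is the higher memory overhead mentioned above. Clang's TreeTransform follows a broadly similar rebuild-by-default, override-what-you-need idea, but its actual interface is different.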

I think this is basically the functionality of TreeTransform; IMO, any effort should be directed at improving that instead.

Thank you for the explanation. However, isn't it possible to chain ASTConsumers?

E.g. we do all transformations in our ASTConsumer, then walk through the tree again and
call HandleTopLevelDecl, HandleTagDeclDefinition, HandleTranslationUnit, etc. of the other
ASTConsumers, just as if the AST were being created at that moment.

Also, if I understand correctly, there's another way to effectively "transform" the AST: inherit from
Sema and change the way nodes are created (although there may not be enough information at
this stage).
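The consumer-chaining idea can be sketched with hypothetical stand-ins for `Decl` and `ASTConsumer` (this is not Clang's API; the names and signatures are invented for illustration). A front consumer buffers top-level declarations, builds transformed copies once the translation unit is complete, and replays them to a downstream consumer as if they were just being parsed. The "transformation" here is only a name prefix standing in for real rewriting.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for Clang's Decl and ASTConsumer (NOT the real API).
struct Decl {
    std::string name;
};

class Consumer {
public:
    virtual ~Consumer() = default;
    virtual void handleTopLevelDecl(const Decl& d) = 0;
    virtual void handleTranslationUnit() = 0;
};

// Buffers every top-level decl; at the end of the translation unit it builds
// transformed copies and replays them, in order, to the next consumer.
class TransformingConsumer : public Consumer {
public:
    explicit TransformingConsumer(std::unique_ptr<Consumer> next) : next_(std::move(next)) {}
    void handleTopLevelDecl(const Decl& d) override { pending_.push_back(d); }
    void handleTranslationUnit() override {
        for (const Decl& d : pending_)
            next_->handleTopLevelDecl(transform(d));
        next_->handleTranslationUnit();
    }
private:
    // Hypothetical transformation: prefixes each name (placeholder for real rewriting).
    static Decl transform(const Decl& d) { return Decl{"obf_" + d.name}; }
    std::unique_ptr<Consumer> next_;
    std::vector<Decl> pending_;
};

// Downstream consumer that records what it receives, in order.
class RecordingConsumer : public Consumer {
public:
    explicit RecordingConsumer(std::vector<std::string>* out) : out_(out) {}
    void handleTopLevelDecl(const Decl& d) override { out_->push_back(d.name); }
    void handleTranslationUnit() override { out_->push_back("<end>"); }
private:
    std::vector<std::string>* out_;
};
```

In a real compiler the hard part is producing fully valid new nodes for the downstream consumer (for example CodeGen), since it expects an AST that was never mutated; this toy sidesteps that problem entirely.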

I'd not noticed TreeTransform. Is there a reason why its header is not public?

David

When it was introduced, Sema.h was not public. That is no longer the case, so I don't think there is any reason not to make it public anymore.

I think it's because it was developed as an implementation detail of Sema, for the use of template instantiation and some other Sema-internal stuff, and has never been reviewed for suitability for public consumption.

Sebastian

Hi,
There was a discussion about that a couple of weeks ago; you can check the "AST transformations" thread. The discussion was almost the same. There were some project proposals, and I still haven't had time to come up with something...
Cheers,
Vassil