How to use Clang to translate from C++ to another language

I’ve recently compiled Clang and LLVM on Windows. My goal is to use it to translate from C++ to another language (specifically ActionScript), but I’m not sure of the best way to go about this. For example, if I invoke clang with the -ast-print “pretty print” option, it looks like Clang can produce a faithful representation of the original code from its internal AST. Do I need to somehow mimic that code so I get a pretty print in my new language? Or should I walk the AST? Thanks for any help!

Have a look at the ObjC Rewriter (in lib/Rewrite). It converts Obj-C to C, so it is close to what you're looking for.

-- Jean-Daniel

Yes, that's likely to be the best approach - it's what I did to translate Objective-C to JavaScript. You will probably want to create a clang plugin that subclasses RecursiveASTVisitor. This should walk the AST and emit code for your target language.
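To make the "walk the AST and emit code" idea concrete, here is a minimal sketch. It does not use Clang's API at all: the Expr, Stmt and FunctionDecl types are a hypothetical toy AST, written in TypeScript only so the shape of the traversal is easy to see. A real plugin would do the equivalent from C++ in a RecursiveASTVisitor subclass. The sketch also assumes every value is an int, which real C++ obviously does not allow.

```typescript
// Hypothetical toy AST. Clang's real AST is far richer and is walked from C++.
type Expr =
  | { kind: "int"; value: number }
  | { kind: "var"; name: string }
  | { kind: "binary"; op: "+" | "-" | "*"; lhs: Expr; rhs: Expr };

type Stmt =
  | { kind: "decl"; name: string; init: Expr }
  | { kind: "return"; value: Expr };

interface FunctionDecl {
  name: string;
  params: string[];
  body: Stmt[];
}

// Emit an expression, parenthesizing every binary node so that
// precedence in the target language never matters.
function emitExpr(e: Expr): string {
  switch (e.kind) {
    case "int":
      return String(e.value);
    case "var":
      return e.name;
    case "binary":
      return `(${emitExpr(e.lhs)} ${e.op} ${emitExpr(e.rhs)})`;
  }
}

// Emit one statement in ActionScript 3 syntax (assumption: all types are int).
function emitStmt(s: Stmt): string {
  switch (s.kind) {
    case "decl":
      return `var ${s.name}:int = ${emitExpr(s.init)};`;
    case "return":
      return `return ${emitExpr(s.value)};`;
  }
}

// Emit a whole function by visiting its parameters and then its body.
function emitFunction(f: FunctionDecl): string {
  const params = f.params.map((p) => `${p}:int`).join(", ");
  const body = f.body.map((s) => "    " + emitStmt(s)).join("\n");
  return `function ${f.name}(${params}):int {\n${body}\n}`;
}
```

The point of the design is that the output is built from the tree, not from the original source text, so constructs with no direct equivalent in the target language can be expanded into whatever helper logic they need.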

You could use the rewriter, but I would strongly recommend against it. For example, when translating a class with multiple inheritance into ActionScript, you will need to create a class that composes the base classes and then turn pointer casts into some fairly complex logic for accessing the relevant one. Upcasts, in particular, are going to be especially horrible, and I have no idea how you plan on implementing them. Trying to do simple rewriting is therefore going to involve a lot more pain than simply writing out a new program from the AST.


Right. The rewriter is good when you're just changing from one dialect to another within the same base language (Objective-C to C, C99 to C90, etc.) and want to preserve comments and code structure for all of the common parts. If the two languages are very different (C++ to ActionScript), you should either be transforming IR or walking the AST.

  - Doug

Thanks everyone for the responses. It sounds like my best bet is to subclass RecursiveASTVisitor. I’m curious: it sounds like Clang worked well for translating Objective-C to JavaScript, and since JavaScript is similar to ActionScript, are there any gotchas to look out for? Yes, I agree multiple inheritance would be a nightmare; I’m going to cheat by only converting C++ code that doesn’t use it.

A few, yes.

- JavaScript numbers are really horrible to work with. To support correct truncation / overflow behaviour you need to jump through a lot of hoops. I still haven't done 64-bit arithmetic.

- Pointers. Object pointers were easy, but generic pointers are really hard. I am using an ArrayBuffer for all allocations (heap or stack). Storing a pointer in memory writes a number into the buffer and also stores the JavaScript reference as a property on the array. This lets you read back the value as either a pointer or a number.

- Pointer arithmetic requires wrapping in something that does address calculation. I just emit these as a call to a pointerAdd() function which returns a new reference object. This contains a pointer and an offset. When you try to dereference it, you get an error if it's out of range, but this lets you do things like &a + b - b for arbitrary values of b without breaking things.

- Weak references are impossible in JavaScript. This sucks. It's probably even worse in C++ if you use smart pointers, because you can easily create something that's cyclic and never freed, even though you only have owning pointers going in one direction.
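As an illustration of the first point, the hoops for 32-bit integers might look roughly like the sketch below. The helper names are hypothetical, and this is not necessarily what the translator described above emits. It assumes an engine that provides Math.imul, and it wraps on overflow in two's-complement fashion, which is what most C code expects in practice even though signed overflow is formally undefined in C.

```typescript
// JavaScript numbers are doubles, so each int operation must be
// truncated back to 32 bits explicitly. All names here are illustrative.

// `| 0` applies ToInt32: truncate toward zero, then wrap modulo 2^32.
function toInt32(x: number): number {
  return x | 0;
}

// `>>> 0` applies ToUint32, giving the unsigned view of the same bits.
function toUint32(x: number): number {
  return x >>> 0;
}

function addInt32(a: number, b: number): number {
  return (a + b) | 0;
}

// A plain `(a * b) | 0` is wrong for large operands: the exact product
// can need more than 53 bits, so the double loses its low bits before
// truncation. Math.imul performs a true 32-bit multiply.
function mulInt32(a: number, b: number): number {
  return Math.imul(a, b);
}

// C integer division truncates toward zero. Division by zero is
// undefined in C; this version silently yields 0 in that case.
function divInt32(a: number, b: number): number {
  return (a / b) | 0;
}
```

For example, addInt32(2147483647, 1) wraps around to -2147483648, and divInt32(-7, 2) gives -3 where plain JavaScript division would give -3.5.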

On the plus side, you can do stuff like:

int *thisIsSoWrong(void) {
   int a = 42;
   return &a;
}

and have it actually work: in my current implementation, dereferencing the returned pointer does not crash. I could probably get better performance by reusing a large ArrayBuffer for the whole stack, rather than a separate one for each variable, but I don't really care about performance in this - if you care about performance then JavaScript is the wrong tool for the job.
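A rough sketch of why this works, under stated assumptions: the Pointer shape, allocInts, loadInt and storeInt below are hypothetical names, and only pointerAdd() comes from the description above. Each variable gets its own ArrayBuffer, a pointer is a buffer plus an offset, and range checking happens only on dereference. Storing pointers inside a buffer (the number plus a property on the array) is not modelled here.

```typescript
// Hypothetical reference object: a backing allocation plus a byte offset.
interface Pointer {
  readonly buffer: ArrayBuffer;
  readonly offset: number;
}

const INT_SIZE = 4; // assumption: 32-bit int

// One fresh buffer per variable or array. The garbage collector keeps it
// alive for as long as any Pointer still refers to it.
function allocInts(count: number): Pointer {
  return { buffer: new ArrayBuffer(count * INT_SIZE), offset: 0 };
}

// Pointer arithmetic never fails; it just produces a new reference object.
function pointerAdd(p: Pointer, elements: number): Pointer {
  return { buffer: p.buffer, offset: p.offset + elements * INT_SIZE };
}

// The range check is deferred until the pointer is actually dereferenced.
function checkRange(p: Pointer): void {
  if (p.offset < 0 || p.offset + INT_SIZE > p.buffer.byteLength) {
    throw new RangeError(`pointer offset ${p.offset} is out of range`);
  }
}

function loadInt(p: Pointer): number {
  checkRange(p);
  return new DataView(p.buffer).getInt32(p.offset, true);
}

function storeInt(p: Pointer, value: number): void {
  checkRange(p);
  new DataView(p.buffer).setInt32(p.offset, value, true);
}

// What the C function above might translate to: the "stack" variable is
// an ordinary heap object, so returning its address is harmless.
function thisIsSoWrong(): Pointer {
  const a = allocInts(1); // int a
  storeInt(a, 42);        // a = 42
  return a;               // return &a
}
```

With this model, loadInt(thisIsSoWrong()) reads back the stored value, and pointerAdd(pointerAdd(p, b), -b) refers to the same slot as p for any b, while dereferencing the intermediate out-of-range pointer throws a RangeError.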

Have fun!