C++ to Python Project Proposal

I would like to use Clang to take many C++ classes and wrap them with Boost.Python for use from Python code.

Does anyone have any advice?
Has this been done already?

Thanks,
Jeff Kunkel

I think this is a good idea. However, I expect many users would want to fine-tune the exact mapping (use Python properties instead of set/get pairs, etc.), to make the resulting API more "Pythonic".

Thus, I think the ideal environment for this would be scripted, or otherwise allow users to provide the means to declare how the mapping should be done.
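To make the fine-tuning point concrete, here is a minimal sketch of one such mapping choice: turning get/set pairs into Python properties. It only produces binding text from a list of method names. The function `emitClassBinding` and its `useProperties` switch are hypothetical names invented for this illustration; only `class_`, `def` and `add_property` in the emitted text are Boost.Python names. A real tool would take the method list from the Clang AST and the switch from the user's mapping script.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: turns the method names of one class into Boost.Python
// binding text. When useProperties is set, a getX/setX pair becomes a single
// property "x"; every other method becomes a plain .def().
std::string emitClassBinding(const std::string &cls,
                             const std::vector<std::string> &methods,
                             bool useProperties) {
    assert(!cls.empty());
    const std::set<std::string> names(methods.begin(), methods.end());
    std::ostringstream out;
    out << "class_<" << cls << ">(\"" << cls << "\")";
    for (const std::string &m : methods) {
        const bool isGet = m.size() > 3 && m.compare(0, 3, "get") == 0;
        const bool isSet = m.size() > 3 && m.compare(0, 3, "set") == 0;
        const std::string stem = (isGet || isSet) ? m.substr(3) : "";
        const bool paired = useProperties && !stem.empty() &&
                            names.count("get" + stem) > 0 &&
                            names.count("set" + stem) > 0;
        if (paired) {
            if (isSet)
                continue;  // the pair is emitted once, when the getter is seen
            std::string prop = stem;
            prop[0] = static_cast<char>(
                std::tolower(static_cast<unsigned char>(prop[0])));
            out << "\n    .add_property(\"" << prop << "\", &" << cls
                << "::get" << stem << ", &" << cls << "::set" << stem << ")";
        } else {
            out << "\n    .def(\"" << m << "\", &" << cls << "::" << m << ")";
        }
    }
    out << ";\n";
    return out.str();
}
```

Calling `emitClassBinding("World", {"getName", "setName", "greet"}, true)` gives a `class_<World>` expression with one `name` property and one `def` for `greet`; with `false`, each of the three methods gets its own `def`.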

(Note: I'm currently attempting to bind Clang to Synopsis (http://synopsis.fresco.org), which I expect to provide such a scriptable layer for source-to-source mappings, among other use cases.)

     Stefan

I agree that removing the set/get pairs would be more Pythonic, but for the first iteration I will try not to do any extra parsing.

Your link http://synopsis.fresco.org is broken.

Thanks,
Jeff Kunkel

I agree that removing the set/get pairs would be more Pythonic, but for the first iteration I will try not to do any extra parsing.

OK. My point was that I expect source-to-source translation to be better handled in a scripting environment, so trying to do that directly in C++ seems suboptimal, at least if it's meant to be useful for the wider public.

Your link http://synopsis.fresco.org is broken.

It seems the machine is temporarily down for an upgrade right now. Sorry about that.

     Stefan

Stefan,

Aye, I agree. But, until that environment exists, I cannot use it.

Oh, any advice on a starting point? I am floundering around in the clangCodeGen library, but I am not sure it is the correct place to insert my translation.

Thanks,
Jeff Kunkel

I would like to use Clang to take many C++ classes and wrap them with Boost.Python for use from Python code.

My two pennies: the project would be much more interesting if you planned to output CPython C API code directly.

Indeed, Boost.Python is a fantastic tool plagued by a fatal flaw: the compiled extensions it eventually produces are humongous. For example, I have a library that weighs ~500K, compared with 2.2 MB for the corresponding Boost.Python binding. There are several projects out there that try to address that issue (pybindgen, Shiboken), but they are all limited by the tool they use to parse C++ headers: pygccxml for pybindgen, which makes it a no-go on Windows, and hand-rolled parsers for Shiboken, which will never reach the quality and completeness of Clang.

Clang opens immense opportunities here, not only to solve that problem but also to revolutionise the subject. A compiler could be built that would, on the one hand, compile the C++ code as usual and, on the other hand, read some special #pragmas or special instructions in comments, build the Python binding code in memory, compile it on the fly, and link it into a shared library.

Luc Bourhis

While I agree with the overall message here, I think requiring the original source code itself to be annotated (either in comments or pragmas or something else) is the wrong way to approach this. (OpenC++ went down that road...)
IMO it's best to keep the mapping instructions separate from the code that is to be mapped. (And it is these instructions that I was referring to as ideally being scripted.)

FWIW,
         Stefan

I am no expert, but I do not think it is the right place. The simplest thing would probably be to build the AST (i.e. a standard compilation which stops before codegen) and then walk the list of definitions/declarations and emit your binding for each. You may want to look at the various AST consumers as examples (I don't know whether it is better to do this as an AST consumer or to walk the AST yourself).
Not a lot of information here, but it may answer your initial question better than the other (interesting) discussions.

regards,
Cédric

Aye, you said it stops before CodeGen. I was going to replace the LLVM code gen with one that outputs a near-source translation, with boost::python components added in as needed. I am just having a hard time figuring out where to hook into the code.

Basically, Clang completes the AST. Then Clang informs Clang-CodeGen that it is complete and that it may generate code for this module. I would like to hook into this code gen step and replace it. Unfortunately, I cannot see how it is hooked in. I cannot see the joint which connects CodeGen and the AST generation.

Second, from the comments so far, I think this is a worthy project. Unfortunately, the time it would take to reinvent what boost::python does with CPython might be a bear in its own right. However, the library will be made with the hope that the boost::python calls may later be replaced with straight CPython.
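One way to keep that door open, sketched here under my own assumptions, is to route every emitted line through a small back-end interface so the generator never writes boost::python text directly. The names `Backend`, `BoostPythonBackend`, `CPythonBackend` and the `py_` wrapper prefix are hypothetical. The CPython variant only emits a `PyMethodDef` table entry; the wrapper function it refers to would still have to be generated.

```cpp
#include <cassert>
#include <string>

// Hypothetical back-end interface: the generator asks it for binding text and
// does not care which binding technology produces that text.
struct Backend {
    virtual ~Backend() = default;
    virtual std::string function(const std::string &name) const = 0;
};

// Emits a Boost.Python def() call for a free function.
struct BoostPythonBackend : Backend {
    std::string function(const std::string &name) const override {
        assert(!name.empty());
        return "def(\"" + name + "\", &" + name + ");";
    }
};

// Emits one PyMethodDef table entry. The wrapper py_<name> (argument parsing,
// result conversion) is assumed to be generated elsewhere.
struct CPythonBackend : Backend {
    std::string function(const std::string &name) const override {
        assert(!name.empty());
        return "{\"" + name + "\", py_" + name + ", METH_VARARGS, nullptr},";
    }
};
```

With this split, swapping `BoostPythonBackend` for `CPythonBackend` changes the emitted text for `greet` from a `def("greet", &greet);` call to a method-table entry, while the code that walks the declarations stays the same.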

Thanks,
Jeff Kunkel

I think I found it. In “ModuleBuilder.h” in the codegen section there is the ‘class CodeGenerator’, which inherits from ASTConsumer. Since I do not want to interface with ‘CreateLLVMCodeGen’, I do not want to use the ‘CodeGenerator’ class. However, I should be able to build my generator on ‘ASTConsumer’ instead.

Thanks,
Jeff Kunkel

The RecursiveASTVisitor
(http://clang.llvm.org/doxygen/RecursiveASTVisitor_8h_source.html)
might also be worth looking into.

Cheers,
/Manuel

Does SWIG (swig.org) not do what you want?

Rolf

Has anyone tried to build and extend LLVM and Clang with SWIG?

Thanks,
Jeff Kunkel

Leaving real code aside, here is what I wish to generate and how.

Start: { Push( Module( arg_input_name ) ) } Global { Pop() }

Global:

    Namespace { Push( new Module(Top, $1.name) ) } Global { Pop(); }
    Object { Push( new Object(Top, $1.name) ) } ObjectElements { Pop(); }
    Function { Top.GenerateFunction( $1 ) }
    Enum Enumerations { for_each( e in $2 ) Top.GenerateVariable( e ) }
    Variable { Top.GenerateVariable( $1 ) }

ObjectElements:

    Methods { Top.GenerateMethod( $1 ) }
    Variable { Top.GenerateVariable( $1 ) }
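The Push/Pop scheme in the rules above can be tried without Clang. The following is a minimal sketch on a toy declaration tree: `Kind`, `Node`, `generate` and `generateModule` are hypothetical stand-ins for illustration, and a real tool would walk `clang::Decl` nodes and emit binding code instead of the plain description lines used here.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for Clang declarations.
enum class Kind { Namespace, Object, Function, Variable };

struct Node {
    Kind kind;
    std::string name;
    std::vector<Node> children;
};

// Walks the tree the way the rules describe: a Namespace or Object pushes a
// new scope, its children are generated into that scope, and the scope is
// popped once the node is complete.
void generate(const Node &node, std::vector<std::string> &scopes,
              std::ostringstream &out) {
    assert(!scopes.empty());  // the top-level module must already be pushed
    const std::string indent(2 * (scopes.size() - 1), ' ');
    switch (node.kind) {
    case Kind::Namespace:
    case Kind::Object:
        out << indent << (node.kind == Kind::Namespace ? "module " : "object ")
            << scopes.back() << "." << node.name << "\n";
        scopes.push_back(scopes.back() + "." + node.name);  // Push
        for (const Node &child : node.children)
            generate(child, scopes, out);
        scopes.pop_back();                                  // Pop
        break;
    case Kind::Function:
        out << indent << "function " << scopes.back() << "." << node.name << "\n";
        break;
    case Kind::Variable:
        out << indent << "variable " << scopes.back() << "." << node.name << "\n";
        break;
    }
}

// Start rule: push the top-level module, generate every global declaration.
std::string generateModule(const std::string &moduleName,
                           const std::vector<Node> &decls) {
    std::vector<std::string> scopes{moduleName};
    std::ostringstream out;
    for (const Node &decl : decls)
        generate(decl, scopes, out);
    return out.str();
}
```

Keeping the scope stack as an explicit argument rather than a global makes the traversal easy to test on its own; the same shape should carry over if the recursion is later driven by an ASTConsumer or a RecursiveASTVisitor.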

// Here is my start to an implementation

#include <deque>
#include <ostream>
#include <sstream>
#include <stack>
#include <stdexcept>
#include <string>

#include "clang/AST/Decl.h"     // clang::FunctionDecl, clang::VarDecl
#include "clang/AST/DeclCXX.h"  // clang::CXXRecordDecl

struct Base {
    // A deque keeps references to existing streams valid when new ones are appended.
    static std::deque<std::ostringstream> modules;
    static std::ostringstream & createNewStream() {
        modules.emplace_back();
        std::ostringstream & out = modules.back();
        out << HEADERS;  // HEADERS (the common preamble) still has to be defined.
        return out;
    }

    virtual ~Base() {}
    virtual std::ostream & getStream() = 0;
    virtual void GenerateFunction( clang::FunctionDecl * ) { throw std::runtime_error("Function is not implemented."); }
    virtual void GenerateObject( clang::CXXRecordDecl * ) { throw std::runtime_error("Function is not implemented."); }
    virtual void GenerateVariable( clang::VarDecl * ) { throw std::runtime_error("Function is not implemented."); }
    virtual void GenerateMethod( clang::FunctionDecl * ) { throw std::runtime_error("Function is not implemented."); }
    virtual void finish() {}
};
std::deque<std::ostringstream> Base::modules;

static std::stack<Base*> STACK;
void Push( Base * base ) { STACK.push(base); }
void Pop() { STACK.top()->finish(); STACK.pop(); }

struct Module : Base {

    std::string name;
    std::ostringstream & out;
    std::ostream & getStream() override { return out; }

    // createdFrom is null for the top-level module.
    Module( const Module * createdFrom, const std::string & NamespaceName ) : out( createNewStream() ) {
        if( createdFrom )
            name = createdFrom->name + ".";
        name += NamespaceName;

        out << "#include <boost/python.hpp>" << std::endl;
        // FIXME: BOOST_PYTHON_MODULE takes a plain identifier, so a dotted name needs more work.
        out << "BOOST_PYTHON_MODULE(" << name << "){" << std::endl;
        out << "using namespace boost::python;" << std::endl;
    }
    void GenerateFunction( clang::FunctionDecl * fn ) override {
        if( basic(fn) )  // basic() is not defined yet.
            out << "def(\"" << fn->getNameAsString() << "\"," << fn->getNameAsString() << ");" << std::endl;
        // Add descriptors from the function.
        // see http://www.boost.org/doc/libs/1_44_0/libs/python/doc/v2/reference.html#models_of_call_policies
    }
    void GenerateVariable( clang::VarDecl * decl ) override {
        if( basic(decl) )  // def() only binds callables, so expose the value as a module attribute.
            out << "scope().attr(\"" << decl->getNameAsString() << "\") = " << decl->getNameAsString() << ";" << std::endl;
        // FIXME: Add more variable properties.
    }
    void GenerateObject( clang::CXXRecordDecl * obj ) override {
        out << ";" << std::endl;
    }
    void finish() override { out << "}" << std::endl; }  // Close the BOOST_PYTHON_MODULE block.

    // TODO: operator new / operator delete through std::allocator (I forget how exactly to do this).
};

struct Object : Base {

    std::ostringstream & out;
    std::ostream & getStream() override { return out; }
    // Since we cannot rely on all the constructors being seen until the whole object has been finalized.

    Object( Object * obj ) : out( createNewStream() ) {
        // Output the object header definition here.
        // Something like:
        // class_<World>("World", /* add constructors as they are seen. */ init<std::string>())
    }

    void GenerateMethod( clang::FunctionDecl * fn ) override;
    void GenerateVariable( clang::VarDecl * var ) override;
    void GenerateObject( clang::CXXRecordDecl * obj ) override;
    bool haveBody() const;  // Not defined yet.
    void finish() override {
        if( ! haveBody() )
            out.str("");  // Discard the output; clear() would only reset the error flags.
    }

    // TODO: operator new / operator delete through std::allocator (I forget how to use the std::allocator again).
};