Potential Summer of Code Project: Ruby bindings for libclang

Hello all,

My name is Brian Gesiak. I’m a research student at the University of Tokyo, and I’m very excited for Google Summer of Code this year (see http://www.google-melange.com/gsoc/homepage/google/gsoc2014 for details)! I’d like to propose a potential project and see if anyone on this list has any feedback.

I’d like to work on libclang bindings. I think the bindings are a common starting point for developers who wish to get started with Clang. I’m hoping that improving them could have a big impact.

Specifically, I’d like to work on the following three milestones:

  1. Ruby bindings for libclang, added to the Clang repository and packaged as a RubyGem
  2. Refactoring the Python bindings; removing FIXMEs, breaking cindex.py into multiple files for easier reading
  3. Improving documentation on libclang; adding READMEs to the Python and Ruby bindings, as well as sample applications

If I can achieve the above three milestones with some time to spare, I’d like to work on expanding the libclang API in general.

Before becoming a research student, I used to work as an iOS developer. In recent years many in the Objective-C community have used Ruby to make development tools. I’m hoping that adding Ruby bindings and improving documentation for libclang will have a large impact on the community.

However, I’m not sure how big of a priority libclang is for this project. It’s not mentioned on the list of open projects (see http://clang.llvm.org/OpenProjects.html), so please let me know if I’m barking up the wrong tree. LLVM is my top choice as a mentoring organization for Google Summer of Code this year, so if libclang isn’t a viable project I’ll try to propose something else.

Thanks for your time! Any and all feedback is greatly appreciated.

- Brian Gesiak

+Argyrios: if you have time to write an email and you have in mind any
interesting development directions for libclang that you never had
time to implement, please share.

Dmitri

Brian,

I appreciate your interest, but realistically I think a Ruby bindings project is less likely to gather enough interest than other project ideas.

You mentioned “I’d like to work on expanding the libclang API in general”; if you were to expand this into a project proposal, it could be more promising.

Argyrios, Dmitri,

Thanks so much for the feedback! I’m trying to flesh out the expansion idea as best I can; I’ll reply once I have something presentable.

- Brian Gesiak

I'd really like to see a libclang API to perform codegen. It's been
discussed on this list before. I think we reached a tentative agreement
on what the API would look like. It likely does not resemble the 2+ year
old WIP at [1] :)

[1] https://github.com/indygreg/clang/compare/master...libclang_compiler

Hi Brian,

Here's a blue-sky idea:

A lightweight C interface suited to dynamic language bindings that exposes the AST and its properties as a simple object hierarchy with few statically typed objects or enumerations.

The C interface would comprise just a handful of functions to get/set(?) properties and iterate through child nodes.

With this design, additions or removals to clang AST internals wouldn't require the interface itself or bindings to be updated, solving one of the limitations with current libclang that tends to miss parts of the AST until someone gets around to exposing them.

Language bindings built on top would feel more native and potentially be more performant without the need to wrap and bind different C types, so the Ruby binding would then become an exercise in Ruby coding with more flexibility on the final representation rather than wrapping a wide C API.

(The functions could live alongside libclang and share all the bookkeeping, so it doesn't need to be seen as a libclang2.)

I don't know what Argyrios will make of this. It's probably a love/hate kind of idea, but it should be feasible ;)

Alp.
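
To make the shape of this idea concrete, here is a minimal Python sketch of a property-based tree interface. It is only an illustration under stated assumptions: `Node`, `node_kind`, `node_get`, `node_children`, and `walk` are hypothetical names that do not exist in libclang, and the real proposal would be a C interface. The point is that a client only needs a kind string, a by-name property lookup, and child iteration.

```python
# Hypothetical sketch: none of these names exist in libclang.
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Optional, Tuple


@dataclass
class Node:
    """A generic AST node: a kind string, a property bag, and children."""
    kind: str
    props: Dict[str, Any] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


def node_kind(node: Node) -> str:
    """Return the node kind as a plain string instead of an enum value."""
    return node.kind


def node_get(node: Node, name: str) -> Optional[Any]:
    """Look up a property by name; None means it is not exposed on this node."""
    return node.props.get(name)


def node_children(node: Node) -> Iterator[Node]:
    """Iterate over the direct children of a node."""
    return iter(node.children)


def walk(node: Node, depth: int = 0) -> Iterator[Tuple[int, Node]]:
    """Pre-order traversal built only from the three functions above."""
    yield depth, node
    for child in node_children(node):
        yield from walk(child, depth + 1)


# A hand-built tree standing in for what a parser would produce.
tree = Node("FunctionDecl", {"name": "add"}, [
    Node("ParmVarDecl", {"name": "a", "type": "int"}),
    Node("ParmVarDecl", {"name": "b", "type": "int"}),
])

for depth, n in walk(tree):
    print("  " * depth + f"{node_kind(n)} {node_get(n, 'name')}")
```

Because kinds and property names are plain strings, a new node kind or property would show up in the data without any change to these three functions, which is the trade-off debated in the replies.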

This would provide ABI and source API stability, but libclang would not actually be “stable” as it is now.
Currently clients can use a new libclang version and no changes are required on their end. If you tie them to the clang internals then they would need changes to accommodate the new state of the clang internals.

This would provide ABI and source API stability, but libclang would not actually be “stable” as it is now.

All true, but a C tree binding like this can also simplify or solve plenty of other problems, such as XML/JSON AST dumps and AST tree viewers, by encoding the logic internally.

Presenting the whole AST through a succinct API will be more useful for consumers that want to dig through everything.
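
As a rough illustration of why a generic tree makes dumps cheap, the sketch below serializes a tree to JSON with one short recursive function. The node shape, a `(kind, properties, children)` tuple, and the name `to_dict` are assumptions made up for this example and are not part of libclang.

```python
import json

# Hypothetical node shape: (kind, properties, children). Not a libclang type.


def to_dict(node):
    """Convert a (kind, props, children) tuple into a JSON-friendly dict."""
    kind, props, children = node
    return {
        "kind": kind,
        "props": dict(props),
        "children": [to_dict(child) for child in children],
    }


tree = ("FunctionDecl", {"name": "add"}, [
    ("ParmVarDecl", {"name": "a"}, []),
    ("ParmVarDecl", {"name": "b"}, []),
])

# The dumper never mentions a specific node kind or property name.
dumped = json.dumps(to_dict(tree), indent=2)
print(dumped)
```

The same traversal could feed an XML writer or a tree viewer, since none of it depends on which kinds or properties exist.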

Currently clients can use a new libclang version and no changes are required on their end. If you tie them to the clang internals then they would need changes to accommodate the new state of the clang internals.

Right again, but I suspect that dynamic languages are better set up than C to deal with minor changes to the AST, so it shouldn't be a problem in practice -- nothing that can't be solved with a JS prototype and feature check at runtime.

On the other hand libclang stability comes at a cost -- it can't keep up with fixes made to the AST in ToT (e.g. parameters are still called 'arguments' in libclang). The public API isn't really living up to its potential because of this.
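
The kind of runtime feature check mentioned above might look like the following Python sketch. It assumes a hypothetical property bag in which an older version exposes a function's parameters under `"arguments"` and a newer one under `"parameters"`; both property names and the helpers `first_available` and `parameter_names` are invented for illustration.

```python
def first_available(props, *names, default=None):
    """Return the value of the first property name present in props."""
    for name in names:
        if name in props:
            return props[name]
    return default


def parameter_names(props):
    """Prefer the newer spelling, fall back to the older one."""
    return first_available(props, "parameters", "arguments", default=[])


# Hypothetical property bags from two versions of the tree API.
old_node = {"arguments": ["a", "b"]}
new_node = {"parameters": ["a", "b"]}

print(parameter_names(old_node))
print(parameter_names(new_node))
```

A client written this way keeps working across a rename, at the cost of carrying the fallback itself rather than relying on the library to stay fixed.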

So, if there's interest in exposing this kind of tree API, dynamic ASTMatchers is probably the place to start because it already has naming, iterators and visitation for AST fields that could be exposed in a C API. Seems worth doing if light work can be made of it.

This isn't a criticism of libclang (as you know, I'm helping out there); rather, it's experience from trying to work with it for non-IDE use cases that leads me in this direction :)

Alp.

Thanks, everyone!

I tried looking deeper into libclang, but I think it'd be too
ambitious to learn about it and try and work on it for this year's
Google Summer of Code. I'll try submitting a different application for
LLVM/Clang next year, after getting more acclimated with the project
in general.

Thanks for all the feedback!

- Brian Gesiak