Python bindings in tree

There was some talk on IRC last week about a desire for Python bindings to
LLVM's Object.h C interface. So I coded some up, and you can now find
them in trunk at bindings/python. Currently, the
interfaces for Object.h and Disassembler.h are implemented.

I'd like to stress that things are still rough around the edges, so use
at your own risk. I intend to smooth things over in the next week or so.

I'd really like to fill out the implementation to cover the entirety of
the C interface. Since this will require a lot of work (Core.h is
*massive*), I wanted to run things by the community before I invest too
much time and create something people don't want (I already had to back
out the Python binding to EnhancedDisassembly because I didn't realize
it was deprecated - oops).

Are people interested in more expansive in-tree Python bindings?
Specifically, do we want a Python API for the IR primitives like type
and value that sit lower than the module APIs?

I know there are other Python bindings floating around and from the
perspective of the project, one option is to just tell people to go use
them. But llvm-py seems to have fallen by the wayside (although I did
read a blog post last week where somebody forked it on GitHub and ported
it to work with current SVN HEAD). Having in-tree bindings would
certainly help prevent bit rot (especially if Python test regressions
can mark builds as failed).

Finally, I checked in the new bindings with no review (I was given the
OK over IRC). If someone would be so kind as to review them, I'd really
appreciate the feedback. Also, if I am to commit new features to the
Python bindings, does anyone have a problem with continuing to hold off
on the review [of new code] until after check-in? I think this would
help lower the "time to market" and get more eyes and early testers
using the bindings. From my experience with Clang, people aren't exactly
lining up to review Python patches, so I fear that new Python features
would be sitting in patch purgatory instead of being tested by early
adopters.

Gregory Szorc
gregory.szorc@gmail.com

Hello,

There was some talk on IRC last week about a desire for Python bindings to
LLVM's Object.h C interface. So I coded some up, and you can now find
them in trunk at bindings/python. Currently, the
interfaces for Object.h and Disassembler.h are implemented.

FYI:

I recently started working on Python3 bindings for LLVM 3 as all bindings I
could find were for LLVM 2.x and up to Python 2.6.
I used Cython for easier coding and already ported a big part of Core.h
including all Type and Value classes.

https://www.gitorious.org/python-llvm3

[...]

Gregory Szorc
gregory.szorc@gmail.com

Christoph Grenz

FYI:

I've also been working on new python bindings.

My bindings are written using ctypes (just like the in-tree
clang/cindex bindings). Most of Core.h is bound, and stuff from
ExecutionEngine.h, Analysis, BitReader, BitWriter. They have fairly
good test coverage (using nosetests). The ctypes definitions are
generated from the header files using the clang python bindings.

My local copy also contains a few patches to llvm-c.

Everything can be found here:
http://people.0x63.nu/~andersg/llvm-python-bindings/

* 0001-Fix-class-hierarchy-indentation-in-LLVM_FOR_EACH_VAL.patch
* 0029-Trivial-copy-paste-error-in-LangRef.patch
  These are just cosmetic fixes for things I stumbled upon.

* 0004-Add-LLVMPrintModule-to-llvm-c.patch
  Adds a new LLVMPrintModule function which is similar to
  LLVMDumpModule but dumps to a string instead of stdout.

* 0005-Add-LLVMCreateMemoryBufferFromData-to-llvm-c.patch
  Adds LLVMCreateMemoryBufferFromData function.

* 0015-LLVMMessageRef.patch
  Adds a "typedef char *LLVMMessageRef;", which may seem useless, but
  it acts as documentation. All functions that return a string that
  should be freed with LLVMDisposeMessage are changed to use this type
  instead.

* bindings-python.tar.gz
  The bindings/python/ directory.
  There are some hardcoded paths and hacks here and there.

Hi Anders,

FYI:

I've also been working on new python bindings.

My bindings are written using ctypes (just like the in-tree
clang/cindex bindings). Most of Core.h is bound, and stuff from
ExecutionEngine.h, Analysis, BitReader, BitWriter. They have fairly
good test coverage (using nosetests). The ctypes definitions are
generated from the header files using the clang python bindings.

The automatic generation of the Python ctypes interfaces using the Clang
Python bindings is pretty friggin cool!

My local copy also contains a few patches to llvm-c.

Everything can be found here:
Index of /~wanders/llvm-python-bindings/

* 0004-Add-LLVMPrintModule-to-llvm-c.patch
  Adds a new LLVMPrintModule function which is similar to
  LLVMDumpModule but dumps to a string instead of stdout.

* 0005-Add-LLVMCreateMemoryBufferFromData-to-llvm-c.patch
  Adds LLVMCreateMemoryBufferFromData function.

These are desperately needed by the C API. Can you please submit them?

FWIW, all my work is at
https://github.com/indygreg/llvm/tree/python_bindings/bindings/python.
Parts of Core.h still need love (especially the Value system). I'm doing
some dynamic type creation at run-time using the Value hierarchy.
Somewhat scary stuff, but it does seem to work. I really need a
LLVMGetValueID() API to fetch llvm::Value::getValueID() to enable more
efficient value casting. From some discussion on LLVM Project, I think people
are receptive to this. The main concern would be that the C API would be
tied to a specific version of the shared library because the value ID
enumeration values aren't guaranteed to stay the same for all time. But, that
contract is already broken, so I don't think it's a big deal: just something
that needs to be documented. Of course, Python is a dynamic language, so if
there were a C API that exposed the llvm::Value class hierarchy, we
could always have Python dynamically create types at run-time :)

I've also implemented some missing C APIs (such as IR parsing and more
ObjectFile APIs) and have patches awaiting review on the mailing list.

Greg

The automatic generation of the Python ctypes interfaces using the Clang
Python bindings is pretty friggin cool!

A nice side effect is that everything is added to the interface. So it
is easy to add a small proxy over the lib that shows which parts of
the llvm-c API are exercised by the tests (I have that in my
bindings).
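
A minimal sketch of such a proxy, in plain Python. The CallRecorder name and the fake_lib stand-in are assumptions made for illustration; in real bindings the wrapped object would be the ctypes library handle, and the two lambdas below only return dummy values.

```python
import types


class CallRecorder:
    """Forward attribute access to a library object and note each call."""

    def __init__(self, lib):
        self._lib = lib
        self.called = set()

    def __getattr__(self, name):
        # Only reached for names not found on the recorder itself.
        target = getattr(self._lib, name)
        if not callable(target):
            return target

        def wrapper(*args, **kwargs):
            self.called.add(name)
            return target(*args, **kwargs)

        return wrapper


# Stand-in for the loaded shared library (hypothetical; no LLVM needed).
fake_lib = types.SimpleNamespace(
    LLVMModuleCreateWithName=lambda name: 1,
    LLVMDisposeModule=lambda module: None,
)

lib = CallRecorder(fake_lib)
lib.LLVMModuleCreateWithName(b"foo")

# Functions that exist in the library but were never called.
untested = set(vars(fake_lib)) - lib.called
```

After a test run, the difference between the full set of registered functions and `called` gives the part of the API that no test touched.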

> * 0004-Add-LLVMPrintModule-to-llvm-c.patch
> Adds a new LLVMPrintModule function which is similar to
> LLVMDumpModule but dumps to a string instead of stdout.
>
> * 0005-Add-LLVMCreateMemoryBufferFromData-to-llvm-c.patch
> Adds LLVMCreateMemoryBufferFromData function.

These are desperately needed by the C API. Can you please submit them?

Will do!

FWIW, all my work is at
https://github.com/indygreg/llvm/tree/python_bindings/bindings/python.

Excellent! I'll try to see if I can adapt my bindings to yours to fill
in the gaps.

There does indeed seem to be much overlap in our bindings. But there are
a few things where the design differs. If we should try to combine our
work I guess it would be a good idea to discuss these differences, to
make sure we work towards a common goal.

I think the main differences between our bindings are:

* Auto generated vs manual ctypes declarations.

  From your comment above I assume you would prefer auto generated too.
  
* Types inheriting from c_void_p vs having a ptr attribute.

  In my bindings, for example, Module (indirectly) inherits from
  c_void_p; that way no "from_param" methods are needed, and there is
  no extra attribute for the actual pointer.

  I'm not sure this is better. I might have gone with a separate pointer
  attribute, as you have, if I were starting from scratch today.

* Use of constructor vs "new" static methods.

  When using the bindings one never initializes the class manually.
  Instead a "factory" method is used:
  
  mymod = Module.from_file(...)
  mymod = Module.from_data(...)
  mymod = Module.new("foo")
  ity = Type.int(32)
  
  instead of
  
  mymod = Module(file=...)
  mymod = Module(data=...)
  mymod = Module(name="foo")
  ity = IntType(32)

  I prefer this, especially in the cases where there are many
  different ways to construct an item. Also many objects are not
  really created standalone, e.g. a function is added to a module:
  
  f = Module.add_function(FTy, "foo")

  and the Function constructor is never used. That way having the
  policy "never use constructor" to create objects makes it
  consistent.

  Also this makes it consistent with the old defunct llvm-py bindings.
  
  (partially this is also a consequence of the fact that my bindings
   inherit from c_void_p, making it a bit messier)

* Directory layout

  Just a minor thing.

  My bindings have python/bindings/lib/llvm
                                  /tests
                                  /tools

  I do like having the tests outside the dir.
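
To make the c_void_p versus ptr-attribute difference concrete, here is a small sketch that runs without LLVM. It assumes CPython: PyLong_FromVoidPtr from ctypes.pythonapi stands in for an LLVM-C function taking an opaque pointer, and the class names ModuleRef and Module are hypothetical.

```python
import ctypes


def pointer_to_int():
    """Return a fresh handle to CPython's PyLong_FromVoidPtr.

    It stands in for an LLVM-C function that takes an opaque pointer.
    Indexing the library (instead of attribute access) gives a new
    function object each time, so each can get its own argtypes.
    """
    func = ctypes.pythonapi["PyLong_FromVoidPtr"]
    func.restype = ctypes.py_object
    return func


# Style 1: the wrapper *is* a c_void_p, so ctypes accepts it as-is.
class ModuleRef(ctypes.c_void_p):
    pass


# Style 2: the wrapper holds the pointer and converts via from_param.
class Module:
    def __init__(self, address):
        self.ptr = ctypes.c_void_p(address)

    @classmethod
    def from_param(cls, obj):
        if not isinstance(obj, cls):
            raise TypeError("expected a Module")
        return obj.ptr


takes_ref = pointer_to_int()
takes_ref.argtypes = [ModuleRef]

takes_module = pointer_to_int()
takes_module.argtypes = [Module]

via_inheritance = takes_ref(ModuleRef(0x1234))
via_attribute = takes_module(Module(0x1234))
```

Both calls hand the same address to the C side; the difference is only where the conversion lives (inherited from c_void_p, or written once per wrapper class in from_param).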

Parts of Core.h still need love (especially the Value system). I'm doing
some dynamic type creation at run-time using the Value hierarchy.
Somewhat scary stuff, but it does seem to work. I really need a
LLVMGetValueID() API to fetch llvm::Value::getValueID() to enable more
efficient value casting.

I'm doing the very same thing in my bindings, and yes it is a bit
inefficient, but seems to work fine and should work fine as long as
classes are not moved in the hierarchy.

I use the same hierarchy at the Python level. The Python code
recursively drills down into the correct subclass by calling LLVMIsA* for
the possible (direct) subclasses.
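
A rough sketch of that drill-down, with no LLVM involved. The tiny class tree and the is_a helper are stand-ins assumed for illustration: a real implementation would call the matching LLVMIsA* function on the raw value instead of looking up a name in a set.

```python
class Value:
    pass


class Constant(Value):
    pass


class ConstantInt(Constant):
    pass


class Instruction(Value):
    pass


def is_a(cls, raw):
    """Stand-in for calling the matching LLVMIsA* function on a raw value."""
    return cls.__name__ in raw


def most_derived(cls, raw):
    """Walk down from cls, trying only the direct subclasses at each level."""
    for sub in cls.__subclasses__():
        if is_a(sub, raw):
            return most_derived(sub, raw)
    return cls


# A fake raw value, described by the set of classes it belongs to.
fake_constant_int = {"Value", "Constant", "ConstantInt"}
wrapper_class = most_derived(Value, fake_constant_int)
```

The cost is one check per candidate subclass at each level, which is where the inefficiency mentioned above comes from.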

From some discussion on LLVM Project, I think people
are receptive to this. The main concern would be that the C API would be
tied to a specific version of the shared library because the value ID
enumeration values aren't guaranteed to stay the same for all time. But, that
contract is already broken, so I don't think it's a big deal: just something
that needs to be documented. Of course, Python is a dynamic language, so if
there were a C API that exposed the llvm::Value class hierarchy, we
could always have Python dynamically create types at run-time :)

I guess we could have a separate valueid enum and a mapping between
llvm-c<->c++ valueid. IIRC the clang python bindings do that for
something. That way there won't be any breakage if the c++ side is
changed.
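
The idea of a separate, stable enum plus a mapping can be sketched like this. Everything here is hypothetical: the enum members, the numeric IDs, and the "old"/"new" version labels are made up for illustration, and in practice the table would live on the C side next to llvm::Value::getValueID().

```python
import enum


class StableValueKind(enum.IntEnum):
    """IDs that the C API would promise never to renumber (hypothetical)."""
    ARGUMENT = 0
    BASIC_BLOCK = 1
    FUNCTION = 2


# Hypothetical C++-side IDs for two library versions. The second table
# pretends that two entries swapped places between releases.
CXX_TO_STABLE = {
    "old": {
        0: StableValueKind.ARGUMENT,
        1: StableValueKind.BASIC_BLOCK,
        2: StableValueKind.FUNCTION,
    },
    "new": {
        0: StableValueKind.ARGUMENT,
        1: StableValueKind.FUNCTION,
        2: StableValueKind.BASIC_BLOCK,
    },
}


def stable_kind(version, cxx_id):
    """Translate a version-specific C++ ID into the stable ID."""
    return CXX_TO_STABLE[version][cxx_id]
```

Callers only ever see StableValueKind, so a renumbering on the C++ side changes one table and nothing else.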

anders

  • Auto generated vs manual ctypes declarations.

This is purely a cosmetic difference, as we both take the same approach of registering functions on a global/shared ctypes library instance. I think automatic is the way to go (just as long as the automatically generated code is easy to diff when it is regenerated).

  • Use of constructor vs “new” static methods.

When using the bindings one never initializes the class manually.
Instead a “factory” method is used:

mymod = Module.from_file(...)
mymod = Module.from_data(...)
mymod = Module.new("foo")
ity = Type.int(32)

instead of

mymod = Module(file=...)
mymod = Module(data=...)
mymod = Module(name="foo")
ity = IntType(32)

Yeah, Module is an example where the number of named arguments is pretty overwhelming. I was probably going to create static methods for Module creation. Whether I was going to leave the named arguments on the constructor is an open issue. I don’t think it matters too much.

Also this makes it consistent with the old defunct llvm-py bindings.

I’m not too concerned about this consistency. llvm-py began many years ago. Assumptions may be different now. I think we should do what makes sense today. If that is the same great. If not, oh well.

(partially this is also a consequence of the fact that my bindings
inherit from c_void_p, making it a bit messier)

Yup :) Since I started without this inheritance restriction, I was able to go with what I consider a more “Pythonic” approach: initializers.

My bindings have python/bindings/lib/llvm
                                /tests
                                /tools

I do like having the tests outside the dir.

I have no major opinion on this.

In related news, I see you have received commit access, Anders. Congratulations! We just need to figure out this review/module owner situation…

Gregory

automatic is the way to go (just as long as the automatically generated
code is easy to diff when it is regenerated).

Yes it should be easy to diff. My generated code is in the same order
as in the llvm-c/*.h files.

Yeah, Module is an example where the number of named arguments is pretty
overwhelming. I was probably going to create static methods for Module
creation. Whether I was going to leave the named arguments on the
constructor is an open issue. I don't think it matters too much.

I don't think it matters that much either. BUT I think consistency
matters, so it should be decided upon before the API grows. Just
having static methods resonates well with the "only one" part of
"There should be one - and preferably only one - obvious way to do
it."; the question is how obvious it is.

(I think that the internals of the bindings can be simpler if all
initializers just take a pointer. But if the choice is between a
cleaner API or cleaner internals, the choice is easy.)
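
As an illustration of the "initializers just take a pointer" layout, here is a sketch with the C calls replaced by a counter. The helper _fake_create_module and the handle numbering are assumptions, and add_function is simplified to take only a name.

```python
import itertools

# Fake pointer values, handed out in order (no LLVM needed).
_fake_handles = itertools.count(1)


def _fake_create_module(name):
    """Stand-in for the C call that creates a module and returns a pointer."""
    return next(_fake_handles)


class Function:
    def __init__(self, ptr, name):
        self._ptr = ptr
        self.name = name


class Module:
    def __init__(self, ptr):
        # Internal: wraps an existing pointer and does nothing else.
        self._ptr = ptr

    @classmethod
    def new(cls, name):
        # Public entry point: one static method per way of creating a module.
        return cls(_fake_create_module(name))

    def add_function(self, name):
        # Functions are created through their module, never directly.
        return Function(next(_fake_handles), name)


mymod = Module.new("foo")
f = mymod.add_function("bar")
```

With this split, every wrapper class has the same trivial initializer, and all the variation lives in named factory methods.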

I'm not too concerned about this consistency. llvm-py began many years ago.
Assumptions may be different now. I think we should do what makes sense
today. If that is the same great. If not, oh well.

I totally agree.

> I do like having the tests outside the dir.
>

I have no major opinion on this.

Me neither. It was mostly an observation. As long as the tests don't
do relative imports, so they can easily be run against another version.

There was another (minor) difference. All testcases in my bindings
were written as functions (i.e no containing UnitTest class), as
nosetests picks them up just fine. I don't mind changing them to
UnitTest when/if I port them over.
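
For comparison, the same check written both ways. This assumes "UnitTest" refers to the standard library's unittest.TestCase; make_name is a hypothetical helper standing in for a call into the bindings.

```python
import unittest


def make_name(prefix, index):
    """Hypothetical helper under test; stands in for a bindings call."""
    return "%s%d" % (prefix, index)


# Style 1: a bare function, which nose-style collectors pick up by name.
def test_make_name():
    assert make_name("bb", 1) == "bb1"


# Style 2: the same check inside a unittest.TestCase.
class TestMakeName(unittest.TestCase):
    def test_make_name(self):
        self.assertEqual(make_name("bb", 1), "bb1")


# Run the TestCase version programmatically and keep the result.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMakeName)
result = unittest.TestResult()
suite.run(result)
```

Porting from the first style to the second is mostly mechanical: wrap the functions in a class and swap bare asserts for the assert* methods.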

In related news, I see you have received commit access, Anders.
Congratulations!

Yay! I even managed to commit some stuff!

We just need to figure out this review/module owner
situation...

Yeah. I'm not even sure where we stand today.

It is quite messy to keep track of everything with the official stuff
in the svn repo, your pythonbindings git repo and my local stuff. I'm
not sure what to do. I guess trying to make sure there is as little
code floating around outside of the official svn repo is a good goal.

anders