llvmcpy: yet another Python binding for LLVM

Alessandro_Di_Federi · January 12, 2017, 7:44pm

Hi, I wrote yet another [1,2] Python binding for LLVM! I'm doing this
because llvmlite has some serious limitations: 1) it cannot parse an
existing IR, only create new modules [3], 2) it keeps its own
representation of the IR (which is less memory efficient than the LLVM
one), and 3) each llvmlite version supports a single LLVM version.

Considering that I need to load modules of hundreds of MiB, this
was kind of a problem.
So I've come up with a "Python API generator". Basically it uses CFFI
[4] to parse the LLVM-C API headers and automatically generate (using
some heuristics) a Pythonic API, with classes, properties and the like.
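
To give an idea of what "Pythonic" could mean for names, here is a minimal sketch of a CamelCase to snake_case conversion. The helper `pythonic_name` is hypothetical and written only for illustration; the exact rules llvmcpy applies may differ.

```python
import re


def pythonic_name(c_name, prefix="LLVM"):
    """Turn a C API name such as LLVMGetFirstFunction into get_first_function.

    Hypothetical helper: a sketch of the kind of renaming a generator could
    apply, not the actual llvmcpy code.
    """
    if c_name.startswith(prefix):
        c_name = c_name[len(prefix):]
    # Underscore between a lowercase letter (or digit) and the capital after it
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", c_name)
    # Underscore between an acronym and the capitalized word that follows it
    name = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", "_", name)
    return name.lower()


for c_name in ("LLVMGetFirstFunction", "LLVMParseIRInContext"):
    print(c_name, "->", pythonic_name(c_name))
```

The second substitution is there so that acronyms such as `IR` stay together as one word instead of being split letter by letter.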

I've quickly tested it with LLVM 3.4, 3.8 and 3.9 and, given how
simple it is, it does a good job. It also supports multiple LLVM
installations (it uses the one belonging to the first llvm-config in
PATH).
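
Picking up "the first llvm-config in the path" can be sketched roughly as follows, using only the standard library. The function name `find_llvm` is made up for this example; `--version`, `--includedir` and `--libdir` are regular llvm-config options, but how llvmcpy itself queries the tool is an assumption here.

```python
import shutil
import subprocess


def find_llvm(tool="llvm-config"):
    """Describe the LLVM installation of the first `tool` found in PATH.

    Returns None when the tool is not available. Hypothetical helper,
    not part of llvmcpy.
    """
    path = shutil.which(tool)  # first match in PATH, like the shell would pick
    if path is None:
        return None

    def query(flag):
        return subprocess.check_output([path, flag], text=True).strip()

    return {
        "version": query("--version"),
        "includedir": query("--includedir"),
        "libdir": query("--libdir"),
    }
```

Switching to another LLVM installation then only requires changing the order of the directories in PATH before calling `find_llvm()`.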

I'd be happy to have some feedback; give it a look:

https://rev.ng/llvmcpy

preames · January 13, 2017, 1:30am

Using something like CFFI to autogenerate bindings is definitely a good approach to this problem. It'll produce bindings which aren't entirely idiomatic for Python, but they'll at least be reasonably likely to remain in sync. This also has the nice property that new additions to the C API get picked up without manual work; this should serve to incentivize contribution in this area.

You mention in your readme that you had to slightly modify the LLVM C headers to get this approach to work. Can you point out a couple of example changes? Maybe these are things we should consider taking upstream.

I'm not familiar with the details of CFFI. Are the bindings it generates for a particular set of headers specific to the machine it's generated on? Or could the resulting bindings be published and reused directly? If so, hosting a set of bindings for previous releases would be a useful service.

Philip

Alessandro_Di_Federi · January 13, 2017, 9:03am

> You mention in your readme that you had to slightly modify the LLVM C
> headers to get this approach to work. Can you point out a couple of
> example changes? Maybe these are things we should consider taking
> upstream.

Take a look at the `clean_include_file` function:

https://github.com/revng/llvmcpy/blob/master/llvmcpy/llvm.py#L342

            pointee = return_type.item

            # Are we returning an LLVM object? Wrap it in the appropriate class
            if (pointee.kind == "struct"
                and is_llvm_type(pointee.cname)):

                return_type_name = remove_llvm_prefix(pointee.cname)
                result += "        return {}({})".format(return_type_name,
                                                         call())

            elif pointee.kind == "primitive" and pointee.cname == "char":
                # Returning a char **, wrap it as a Python string
                result += "        return ffi.string({})".format(call())
            else:
                # All the rest
                result += "        return " + call()
        else:
            # All the rest
            result += "        return " + call()

    # Generate pythonic way to iterate over list of objects (e.g., functions in

Basically, CFFI doesn't handle enum entries whose value is computed
through an expression. In the LLVM-C API we sometimes have 1 << 8.
Also, static inline functions are not handled either (CFFI only handles
function prototypes), so I have to strip them away.
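
As a rough illustration of the two clean-ups just described, here is a minimal sketch that folds `N << M` constants and drops `static inline` definitions. It is not the actual `clean_include_file`; the function names and the sample header (including `LLVMFooAttribute`, `LLVMBarAttribute` and `LLVMHelper`) are made up for the example, and braces inside strings or comments are not handled.

```python
import re

SHIFT = re.compile(r"\b(\d+)\s*<<\s*(\d+)\b")
STATIC_INLINE = re.compile(r"\bstatic\s+inline\b")


def fold_shifts(text):
    """Replace constant expressions such as `1 << 8` with their value."""
    return SHIFT.sub(lambda m: str(int(m.group(1)) << int(m.group(2))), text)


def strip_static_inline(text):
    """Remove `static inline` definitions, matching braces to find the end."""
    pieces, pos = [], 0
    for match in STATIC_INLINE.finditer(text):
        if match.start() < pos:
            continue  # already inside a removed function body
        start = text.find("{", match.end())
        if start == -1:
            break
        depth, i = 0, start
        while i < len(text):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:
                    break
            i += 1
        pieces.append(text[pos:match.start()])
        pos = i + 1
    pieces.append(text[pos:])
    return "".join(pieces)


def clean_header(text):
    return fold_shifts(strip_static_inline(text))


# Made-up header fragment, only to exercise the two steps
HEADER = """
typedef enum {
  LLVMFooAttribute = 1 << 8,
  LLVMBarAttribute = 1 << 9
} LLVMAttribute;

static inline int LLVMHelper(int x) {
  if (x) { return 1; }
  return 0;
}

int LLVMDoSomething(int x);
"""

print(clean_header(HEADER))
```

The cleaned text keeps the enum (now with plain integer values) and the prototype, which is the kind of input a declaration parser like CFFI's can digest.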

I'm not sure it'll ever be possible to handle the LLVM-C API headers
with no modifications, and given that one explicit aim is to support
older versions of LLVM, I'd have to keep that code anyway.

It would be nice, however, if that code didn't have to grow in the
future (e.g., because of more sophisticated expressions as enum values).

A thing I like about the C API is the consistency in function naming,
like having LLVMGetSomething/LLVMSetSomething pairs,
LLVMCountSomethings/LLVMGetSomethings pairs and
LLVMGetFirstSomething/LLVMGetNextSomething pairs.
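
To show why this consistency matters for a generator, here is a small sketch that spots the three patterns purely from function names. The function `classify` and the category labels are hypothetical; the names in the example are real LLVM-C functions, but the way llvmcpy actually maps them to properties and iterators may differ.

```python
import re
from collections import defaultdict


def classify(names):
    """Group LLVM-C function names by the naming pattern they follow.

    Hypothetical sketch: 'property' for Get/Set pairs, 'list' for
    Count/Get pairs, 'iterator' for GetFirst/GetNext pairs.
    """
    names = set(names)
    found = defaultdict(list)
    for name in sorted(names):
        match = re.fullmatch(r"LLVMGet(\w+)", name)
        if match is None:
            continue
        thing = match.group(1)
        if thing.startswith("First") and "LLVMGetNext" + thing[5:] in names:
            found["iterator"].append(thing[5:])
        elif "LLVMSet" + thing in names:
            found["property"].append(thing)
        elif "LLVMCount" + thing in names:
            found["list"].append(thing)
    return dict(found)


patterns = classify([
    "LLVMGetFirstFunction", "LLVMGetNextFunction",
    "LLVMGetLinkage", "LLVMSetLinkage",
    "LLVMCountParams", "LLVMGetParams",
    "LLVMDumpModule",
])
print(patterns)
```

Each group could then be exposed differently on the Python side, for instance a Get/Set pair as a property and a GetFirst/GetNext pair as a generator.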

What I'd need is the names of the arguments, which CFFI doesn't
provide. That would allow me to set up slightly more robust
heuristics. For instance, I'm now treating a pointer argument followed
by an integer as a pointer to an array plus its size; that's fine in
current versions of LLVM, but it's not very robust. The same goes for
error messages: having the argument name would help. But this is more
of a CFFI issue.
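
The pointer-plus-integer heuristic can be sketched on plain type strings as below. This is only an illustration under my own assumptions: `group_arguments` and `INTEGER_TYPES` are made-up names, and real code would inspect CFFI type objects rather than strings. The first example uses the argument types of `LLVMFunctionType`; the second is a made-up prototype showing how the heuristic can misfire without argument names.

```python
# Integer types that may carry an element count (an assumption for this sketch)
INTEGER_TYPES = {"unsigned", "unsigned int", "int", "size_t", "uint32_t", "uint64_t"}


def group_arguments(arg_types):
    """Pair each pointer argument with an integer that immediately follows it.

    Returns ('array', pointer_type, size_type) for such pairs and
    ('plain', type) for everything else.
    """
    grouped = []
    i = 0
    while i < len(arg_types):
        current = arg_types[i]
        following = arg_types[i + 1] if i + 1 < len(arg_types) else None
        if current.endswith("*") and following in INTEGER_TYPES:
            grouped.append(("array", current, following))
            i += 2
        else:
            grouped.append(("plain", current))
            i += 1
    return grouped


# LLVMFunctionType(ReturnType, ParamTypes, ParamCount, IsVarArg)
print(group_arguments(["LLVMTypeRef", "LLVMTypeRef *", "unsigned", "LLVMBool"]))

# Made-up prototype: an output pointer followed by unrelated flags is
# wrongly treated as an array plus its size.
print(group_arguments(["int *", "unsigned"]))
```

With argument names available, a check such as "the integer is called something like Count or Length" could rule out the second case.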

> I'm not familiar with the details of CFFI. Are the bindings it
> generates for a particular set of headers specific to the machine
> it's generated on? Or could the resulting bindings be published and
> reused directly? If so, hosting a set of bindings for previous
> releases would be a useful service.

I'm not entirely sure they're portable across OS/architectures. What
would be the use case? It takes a moment to generate the bindings but
it's something the module will lazily do for you only once.
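
The "generate lazily, only once" behaviour amounts to a simple on-disk cache, sketched below. Everything here is an assumption for illustration: the helper `load_generated`, the cache key format and the temporary directory are not taken from llvmcpy, which may store its generated code elsewhere and key it differently.

```python
import os
import tempfile


def load_generated(cache_dir, key, generate):
    """Return the generated source for `key`, calling `generate` only on a miss."""
    path = os.path.join(cache_dir, key + ".py")
    if not os.path.exists(path):
        os.makedirs(cache_dir, exist_ok=True)
        with open(path, "w") as output:
            output.write(generate())
    with open(path) as cached:
        return cached.read()


calls = []


def fake_generate():
    # Stand-in for the expensive header parsing and code generation step
    calls.append(1)
    return "# generated bindings would go here\n"


with tempfile.TemporaryDirectory() as cache:
    first = load_generated(cache, "llvm-3.9", fake_generate)
    second = load_generated(cache, "llvm-3.9", fake_generate)

print("generator ran", len(calls), "time(s)")
```

Keying the cache on the LLVM version (and possibly on the library path) would let several installations coexist without regenerating anything.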
