Introductions to everyone and a call for Python-LLVM enthusiasts

Hi all,

First, I just want to say thank you for the excellent LLVM project. I have been playing with LLVM for the first part of this year and have been quite impressed with what I’ve seen and what is possible. I’ve been coding for a long time, but haven’t had this much fun since I first learned Python. The work you have done has opened the door for a tremendous amount of innovation that I think we are just starting to see. Congratulations to all of the developers.

I started a little project back in January of this year called Numba: https://github.com/numba/numba which uses LLVM to translate from Python byte-code to machine code, with the purpose of letting people write faster Python kernels that work on NumPy arrays. Many people currently use a tool like Cython or hand-write an extension to Python in C or C++ when they want faster code. Numba will let them compile their Python code without the extra hassle — because of your work on LLVM. For (a little) more information, here’s a link to the video of the lightning talk I gave at PyCon this year describing Numba and a little bit about how LLVM and Python can work more closely together: http://pyvideo.org/video/657/saturday-morning-lightning-talks
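As a concrete illustration of the starting point for such a translator (this is my own sketch, not Numba code), the standard-library `dis` module shows the CPython byte-code that a Python-to-LLVM compiler would consume:

```python
import dis

def axpy(a, x, y):
    # A typical scalar "kernel" of the kind a byte-code translator targets.
    return a * x + y

# Print the byte-code stream; each instruction here is what a tool like
# Numba would map to LLVM IR (exact opcode names vary by Python version).
dis.dis(axpy)
```

Each of those few instructions hides dynamic dispatch that a compiler can specialize away once it knows the argument types.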

LLVM is still very relevant to Python because of projects like Numba — but you should know that PyPy is no longer using LLVM and Unladen Swallow has not been worked on for several years. I think the future of LLVM and Python is very bright — especially for the scientific and data-analysis user base.

In order to get Numba working, I first ported the work of Mahadevan R., who wrote rather complete Python bindings to LLVM called llvm-py. Unfortunately, those bindings had not been updated since LLVM 2.8. They have now been updated for LLVM 3.1 and are sitting at https://github.com/ContinuumIO/llvm-py. We plan to continue to maintain these bindings. Perhaps we could discuss how we might help with the official Python bindings as well as maintain these. As a testament to the power and functionality of the llvm-py bindings: a few weeks ago, Dave Beazley (a Python guru) taught a compiler “Master” course in Chicago where the students basically all wrote a compiler in Python using ply and llvm-py. It was amazing and very empowering, and several people in that course caught the LLVM-and-Python “bug”. We had to fix a few buglets in llvm-py around static global string initialization, but those changes were pretty simple.

We have ambitious goals for Numba which are achievable primarily because, all along the way, Numba can be used to speed up real code in the Python ecosystem — especially for people who write math, science, and engineering software in Python. Numba already works for several cases, and we’ve managed to get a few people working on the project, so we are now in the process of improving the type system, creating a code-gen path that uses the AST, and performing high-level analysis on the expression graph to generate fast, vectorized code from NumPy array operations written in Python. LLVM is making this all possible.
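To make "the expression graph" concrete, here is a tiny sketch (my own illustration, not Numba's actual pipeline) using the stdlib `ast` module, which an AST-based code-gen path would walk before emitting LLVM IR:

```python
import ast

# Parse a NumPy-style expression; the tree is the "expression graph" that
# high-level analysis can inspect and fuse before lowering to LLVM IR.
tree = ast.parse("a * x + y", mode="eval")
expr = tree.body

# The root is the '+' whose left child is the '*': exactly the nested
# shape a fusion pass would collapse into one vectorized loop instead of
# materializing the intermediate 'a * x' array.
print(ast.dump(expr))
```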

Mainly my purpose is to introduce myself to this list, communicate my appreciation for your work, and help coordinate with like-minded people who are interested in the power that LLVM brings for fast code generation to high-level languages like Python. We can assist in improving the official Python bindings (though the updated llvm-py bindings do work well in the meantime).

If you are interested in Numba, check out the code and subscribe to the mailing list by sending an email to numba@librelist.org. The first actual release of Numba will be soon, but until then, if you are adventurous, feel free to check it out from GitHub.

All the best,

Travis Oliphant
travis@continuum.io

Hi Travis,

...

> LLVM is still very relevant to Python because of projects like Numba --- but you
> should know that PyPy is no longer using LLVM and Unladen Swallow has not been
> worked on for several years. The future of LLVM and Python I think is very
> bright --- especially for the scientific and data-analysis user-base.

thanks for your interesting email. Do you understand why PyPy is no longer
using LLVM, and why Unladen Swallow died? Does LLVM need to be improved in
some way?

Ciao, Duncan.

Hello Duncan,

> thanks for your interesting email. Do you understand why PyPy is no longer
> using LLVM, and why Unladen Swallow died? Does LLVM need to be improved in
> some way?

The answers to all these questions are linked: LLVM is not fast enough
(for a JIT). Of course this is not the whole story, but it is the
LLVM-relevant part.

Let's have a look at some random performance numbers from one of my pet
projects:

Generate-time: 0.000377893
Compile-time: 0.00987911
1) 0.012272357940673828
2) 0.0018310546875
3) 0.0037310123443603516

Generate-time is the time it takes for my code to generate the LLVM IR.
Compile-time is LLVM opts + codegen (MCJIT). And this is for a really
small function.

1) is the total time for JITting + running
2) is the time for running the compiled code
3) is the time in the native interpreter

While 2) is entirely in the domain of the person using LLVM, the other
times give us some serious points for consideration: one needs to be
really, really fast to offset the cost of compiling something. And we
have to be really sure about what we compile (dynamic feedback), because
recompilation is expensive too.
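Plugging the numbers above into a quick back-of-the-envelope calculation (my own arithmetic, not from the original mail) shows how many calls it takes before JIT compilation pays for itself:

```python
import math

# Timings quoted earlier in this mail (seconds):
generate = 0.000377893              # emit the LLVM IR
compile_ = 0.00987911               # LLVM opts + codegen (MCJIT)
run_compiled = 0.0018310546875      # one run of the compiled code   (2)
run_interp = 0.0037310123443603516  # one run in the native interpreter (3)

one_off_cost = generate + compile_
saving_per_call = run_interp - run_compiled
break_even_calls = math.ceil(one_off_cost / saving_per_call)
print(break_even_calls)  # -> 6
```

So even this tiny function must be called about six times before JITting it beats just interpreting it; for larger functions with longer compile times, the break-even point moves further out.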

This is less of a problem for long-running processes, but think about a
JavaScript JIT.
LLVM is also really memory hungry: Lua + LuaJIT uses just 200 KB, while
LLVM alone is much larger. Again, this is less of a problem for server
processes, as dynamic libs are shared.

Full disclosure: I was an independent contributor to Unladen Swallow.

Unladen Swallow failed on 2). It was clearly targeted at long-running
processes, but it failed to provide a reasonable performance boost in
compiled code. It had to fight some LLVM bugs (this was quite some time
ago already), and by the time the JIT worked, too much time had already
passed (for Google). It also had a top-down development model: start
with a general compiled function (just removing the dispatch overhead)
and add optimizations later. Trying to get the low-hanging fruit first
may have been a better idea.

PyPy changed because of the above, plus some missing features. One has
to admit that the garbage-collection interface was (is?) in some pretty
bad shape, and PyPy relies heavily on its GC.
They also need to be able to patch jumps to add new compiled traces to
failed guards.
They could write their new asm in (R)Python ... so they did.

What LLVM needs (imho): LLVM was built as a compiler, not a JIT. That
makes a big difference in the assumptions about runtime and code quality.
In a JIT you do not care whether the register allocation is optimal, as
long as you get your compiled code fast. If you want to compete with GCC,
every stack spill counts!

tl;dr: much faster code gen (even if it costs code quality), lower memory
use, more information about the generated machine code.

Regards,

Joerg Blank

If you didn't catch it, there has been a recent post to the mailing
list that seems like it might be relevant to your interests:
<http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-June/051298.html>

Direct link to the project page: <Google Code Archive - Long-term storage for Google Code Project Hosting>

--Sean Silva

> If you didn't catch it, there has been a recent post to the mailing
> list that seems like it might be relevant to your interests:
> <http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-June/051298.html>
>
> Direct link to the project page: <Google Code Archive - Long-term storage for Google Code Project Hosting>

Thanks, I did catch it :-) That's one of the reasons I posted here, since we've been working for several months on Numba. I've also been in touch with the author and we are merging our efforts.

Thanks very much for the reference though.

-Travis

Joerg Blank wrote:

> The answers to all these questions are linked: LLVM is not fast enough
> (for a JIT). [...]
> tl;dr much faster code gen (even if it costs code quality), lower memory
> use, more information about the generated machine code.
I have a little bit to add to this story. One of the things to remember is that LLVM only takes care of low-level issues; it's important to perform high-level optimizations before producing LLVM IR. The IR produced by Unladen Swallow was enormous. If I recall correctly, the simple

   def add(x, y):
     return x + y

expanded to close to 100 basic blocks due to all the implicit method calls and fallback paths to the interpreter. This size expansion was the driving cause of the slow LLVM compile times. More high-level optimizations should help reduce this.
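To see where those basic blocks come from, here is a heavily simplified, hand-written model of what a single BINARY_ADD opcode must handle at runtime (my sketch; real CPython adds subclass-precedence rules, concat fast paths, and more, each of which becomes extra branches in the lowered IR):

```python
def binary_add(x, y):
    # Simplified semantics of CPython's BINARY_ADD: try the left type's
    # __add__, fall back to the right type's __radd__, else fail. Every
    # branch below becomes one or more basic blocks once lowered to IR.
    result = NotImplemented
    left_add = getattr(type(x), "__add__", None)
    if left_add is not None:
        result = left_add(x, y)
    if result is NotImplemented:
        right_radd = getattr(type(y), "__radd__", None)
        if right_radd is not None:
            result = right_radd(y, x)
    if result is NotImplemented:
        raise TypeError("unsupported operand type(s) for +")
    return result

print(binary_add(1, 2))    # -> 3
print(binary_add(1, 2.5))  # -> 3.5, reached via float.__radd__
```

Multiply this by every opcode in a function, plus the deoptimization paths back into the interpreter, and a two-line `add` ballooning to ~100 basic blocks becomes plausible.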

It was hard to find good high-level optimizations to add to Unladen Swallow because of a design decision to make the LLVM path optional and to permit changes to the non-LLVM path by developers with no knowledge of the LLVM side. Consider type inference, for instance: Unladen Swallow couldn't do type inference in the LLVM lowering path, because that would have required hard-coding some knowledge of the type system into the LLVM-specific code path.

Nick