Another LLVM JIT extension to Python

Dear LLVM,

I am a young developer who has just uploaded my first open-source project based on LLVM. I would like to know what professionals think of my project.

I have started a JIT extension to Python called Pymothoa (http://code.google.com/p/pymothoa/). Unlike other similar projects, I did not modify the interpreter. Pymothoa uses Python decorators to mark functions for JIT compilation. It works from the AST generated by Python itself, so it uses the same syntax as Python but behaves like a low-level programming language (like C). The goal is to get speedups in compute-intensive code without writing C extensions.
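
To illustrate the idea (this is a rough sketch, not the real Pymothoa API; the `jit` decorator and `_ast_nodes` attribute are invented here): a decorator can recover a function's AST without touching the interpreter at all.

```python
import ast
import inspect

def jit(func):
    # Hypothetical stand-in for a Pymothoa-style decorator: at decoration
    # time, recover the function's source and re-parse it with the same
    # `ast` module CPython uses. A real JIT would walk this tree and emit
    # LLVM IR; here we only count the nodes we would have to translate.
    try:
        tree = ast.parse(inspect.getsource(func))
        func._ast_nodes = sum(1 for _ in ast.walk(tree.body[0]))
    except (OSError, TypeError):  # source unavailable (e.g. in a REPL)
        func._ast_nodes = 0
    return func

@jit
def axpy(a, x, y):
    return a * x + y

print(axpy(2, 3, 4))  # 10 -- the Python semantics are untouched
```

Because the decorator only inspects the function, the interpreter keeps running the original bytecode until a compiled version is substituted.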

If you are interested, there are two demo applications in the source tree: matrix-matrix multiplication and reduce-sum. I would appreciate any comment.

Siu Kwan Lam

Hi Siu Kwan Lam,

That looks very interesting! It is very nice to see how easy it is to install and how easy it is to add proper function annotations. Also, the generated source code seems to be a good start. It would be interesting to try it with Polly [1]; I believe this could give great speedups for the naive matrix-multiply implementation.

Is there a way I can dump the content of the entire LLVM-IR module generated in the demo/matrixmul/matrixmul.py example?

Cheers
Tobi

[1] http://polly.llvm.org

Hi Tobi,

Thank you for your feedback. I will be looking at Polly for better locality optimization. Can I simply include Polly as optimization passes? If so, pymothoa/llvm_backend/default_passes.py can easily be edited to add new passes. I am still trying to figure out which optimization passes to include for the best results.

> Is there a way I can dump the content of the entire LLVM-IR module generated in the demo/matrixmul/matrixmul.py example?

You can do so by printing the default_module:

print default_module

You may want to do so before optimizing with "default_module.optimize()" to see what my codegen is doing.

I will be adding more documentation to the project wiki.

Thanks,
Siu Kwan Lam

This is awesome! The noninvasive approach that you took is really cool.

I love the Python `ast` module because it allows things like this to
happen! I really like the syntax that you chose for declaring
variables and types within Python2's native syntax, too.

If you ever port to Python 3, you'll be able to use function
annotations <http://www.python.org/dev/peps/pep-3107/> to make it even
slicker.
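
Sean's suggestion might look roughly like this in Python 3 (the function and types here are just an illustration, not Pymothoa code):

```python
# PEP 3107 lets type declarations live in the signature itself, so a
# JIT decorator could read them instead of needing in-body declarations.
def mul_add(a: float, x: float, y: float) -> float:
    return a * x + y

# Annotations end up in a plain dict on the function object, available
# at decoration time for choosing the corresponding LLVM types.
print(mul_add.__annotations__['return'])  # <class 'float'>
print(mul_add(2.0, 3.0, 4.0))             # 10.0
```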

Also, for whatever it's worth, the code looks pretty "professional" to me.

--Sean Silva

> Hi Tobi,
>
> Thank you for your feedback. I will be looking at Polly for better
> locality optimization. Can I simply include Polly as optimization
> passes? If so, pymothoa/llvm_backend/default_passes.py can easily be
> edited to add new passes. I am still trying to figure out which
> optimization passes to include for the best results.

You need to load the Polly.so object file. After the file is loaded, all Polly passes are automatically available. To load them you have two options:

1) Add them to the pass list

This is a rather long list of additional passes. The passes we add can be seen in lib/RegisterPasses.cpp (you also need the preparatory transformations).

2) Use the pass manager builder

Look at llvm/Transforms/IPO/PassManagerBuilder.h and PassManagerBuilder::populateFunctionPassManager(). At -O3, and with the -polly command line option enabled (no idea how that would work from your setup), the Polly passes become part of the normal -O3 pipeline.

>> Is there a way I can dump the content of the entire LLVM-IR module
>> generated in the demo/matrixmul/matrixmul.py example?
>
> You can do so by printing the default_module:
>
>     print default_module

Perfect, that's what I was looking for.

> You may want to do so before optimizing with "default_module.optimize()"
> to see what my codegen is doing.
>
> I will be adding more documentation to the project wiki.

Great.

I just looked at the generated code. Polly cannot directly optimize it, but I don't see any fundamental problems. In fact the code looks really nice. The main issues I have seen are:

1. The array references could alias

The arguments of the function matrixmul_naive can alias, which not only blocks Polly right now, but also makes other LLVM transformations less effective. If you can guarantee that the arguments do not alias, the best fix is to add the 'noalias' parameter attribute [1] to those parameters.
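
The aliasing problem can be seen even in plain Python (a sketch, not Pymothoa code): unless the compiler can prove two arguments refer to different buffers, it must assume every store may change what a later load reads, and so cannot hoist or vectorize.

```python
def scale_into(dst, src, k):
    # Writes dst[i] = k * src[0] for every i. If dst and src never
    # alias, src[0] is loop-invariant and could be hoisted out.
    for i in range(len(dst)):
        dst[i] = k * src[0]

src = [1, 1, 1, 1]
dst_separate = [0, 0, 0, 0]
scale_into(dst_separate, src, 2)
print(dst_separate)              # [2, 2, 2, 2]

aliased = [1, 1, 1, 1]
scale_into(aliased, aliased, 2)  # dst IS src: the first store changes src[0]
print(aliased)                   # [2, 4, 4, 4] -- hoisting src[0] would be wrong
```

The 'noalias' attribute is the caller's promise that the second case cannot happen, which is what licenses Polly and other passes to reorder the memory accesses.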

2. No target data set

The LLVM-IR module you are generating does not have a target data string set. When trying my optimizations I manually set something like:

target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-unknown-linux-gnu"

This again is something that will both help generic optimizations, as well as Polly.

3. Variable size arrays

You are using variable size arrays in the code you generate. This is perfectly fine, but currently not supported by Polly. As a workaround, setting n = 1024 at the beginning of the code is enough to make Polly work. The right solution is obviously to add variable length array support to Polly.

4. Pass ordering issue

The pass order 'opt -O3 -polly' uses is not good enough to detect your code. Using 'opt -O3 | opt -O3 -polly' works. This means we probably need to schedule one or two additional canonicalization passes. One reason for this may be that you 'alloca' data elements in the body of the function. Many LLVM passes expect alloca instructions to be in the very first basic block. You may consider doing the same when generating code.

Again, thanks for this very nice tool. I am looking forward to playing more with it.

Cheers
Tobi

[1] LLVM Language Reference Manual, parameter attributes: http://llvm.org/docs/LangRef.html