Dear LLVM,
I am a young developer who has just uploaded my first open-source
project based on LLVM. I would like to know what professionals think of
my project.
I have started a JIT extension to Python called Pymothoa (hosted on
Google Code). Unlike other similar projects, I
did not modify the interpreter. Pymothoa uses Python decorators to mark
functions for JIT compiling. It works on the AST generated by Python;
as a result, it uses the same syntax as Python but behaves like a low-level
programming language (like C). The goal is to get speedup in
compute-intensive code without writing C-extensions.
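To give a flavor of the usage, a JIT-compiled kernel looks roughly like this (a simplified sketch, not verbatim from the demo sources; the exact decorator and type names are in the demos):

# Simplified sketch; the decorator and type names here may differ from the real API.
from pymothoa.jit import function        # import path is illustrative
from pymothoa.types import *

@function(ret=Void, args=[Slice(Float), Slice(Float), Slice(Float), Int])
def matrixmul_naive(A, B, C, n):
    for i in xrange(n):
        for j in xrange(n):
            for k in xrange(n):
                C[i * n + j] += A[i * n + k] * B[k * n + j]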
If you are interested, there are two demo applications in the source
tree: matrix-matrix multiplication and reduce-sum. I would appreciate
any comments.
Siu Kwan Lam
Hi Siu Kwan Lam,
That looks very interesting! It is very nice to see how easy it is to
install and to add proper function annotations. Also,
the generated source code seems to be a good start. It would be
interesting to try it with Polly [1]. I believe that this could give
great speedups for the naive matrix multiply implementation.
Is there a way I can dump the content of the entire LLVM-IR module
generated in the demo/matrixmul/matrixmul.py example?
Cheers
Tobi
[1] http://polly.llvm.org
Hi Tobi,
Thank you for your feedback. I will be looking at Polly for better
locality optimization. Can I simply include Polly as a set of optimization
passes? If so, pymothoa/llvm_backend/default_passes.py can easily be
edited to add new passes. I am still trying to figure out which passes to
include for the best result.
You need to load the Polly.so shared object. Once it is loaded, all Polly passes are automatically available. To enable them you have two options:
1) Add them to the pass list
This is a rather long list of additional passes. The passes we add can be seen in lib/RegisterPasses.cpp (you also need the preparing transformations).
2) Use the pass manager builder
Look at llvm/Transforms/IPO/PassManagerBuilder.h and PassManagerBuilder::populateFunctionPassManager(). At -O3, and with the -polly command line option enabled (no idea how that would work), the Polly passes are part of the normal -O3 pass pipeline.
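From Python, just loading the shared object into the running process may already make the statically registered Polly passes visible to your pass manager; a minimal sketch (the library path is an assumption, and I have not tried this from Python myself):

import ctypes
# Load Polly into the current process so its passes register themselves.
# The path to LLVMPolly.so is an assumption; point it at your Polly build.
ctypes.CDLL('/path/to/LLVMPolly.so', mode=ctypes.RTLD_GLOBAL)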
Is there a way I can dump the content of the entire LLVM-IR module
generated in the demo/matrixmul/matrixmul.py example?
You can do so by printing the default_module:
print default_module
Perfect, that's what I was looking for.
You may want to do so before optimizing with "default_module.optimize()"
to see what my codegen is doing.
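For example, once the decorated functions have been defined, something like this shows both versions:

print default_module            # LLVM-IR straight from the codegen
default_module.optimize()       # run the default pass list
print default_module            # LLVM-IR after optimization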
I will be adding more documentation to the project wiki.
Great.
I just looked at the generated code. Polly cannot directly optimize it, but I don't see any fundamental problems. In fact, the code looks really nice. The main issues I have seen are:
1. The array references could alias
The arguments of the function matrixmul_naive can alias, which not only blocks Polly from working right now, but will also make other LLVM transformations less effective. If you can guarantee that the arguments never alias, the best solution would be to add the 'noalias' parameter attribute [1] to those parameters.
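If your codegen sits on top of the llvm-py bindings, something along these lines might work; the attribute constant, the method names, and the attribute holding the underlying llvm-py module are all assumptions, not verified against Pymothoa:

import llvm.core as lc
# Hypothetical sketch: mark the array parameters of the generated function as
# noalias. 'jit_module' is a placeholder for however Pymothoa exposes its
# underlying llvm-py Module; the llvm-py names are assumptions.
llvm_module = default_module.jit_module
fn = llvm_module.get_function_named('matrixmul_naive')
for arg in fn.args:
    if arg.type.kind == lc.TYPE_POINTER:      # only the array (pointer) arguments
        arg.add_attribute(lc.ATTR_NO_ALIAS)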
2. No target data set
The LLVM-IR module you are generating does not have any target data string set. When trying my optimizations, I manually set something like:
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-unknown-linux-gnu"
This again is something that will help both generic optimizations and Polly.
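If you want to reproduce this quickly without touching your codegen, prepending the two lines to the dumped textual IR before handing it to opt works just as well; a sketch (the file name is arbitrary, the datalayout string is the one for my x86_64 Linux machine, and str(default_module) is assumed to give the same text that print shows):

# Write the dumped IR to a file with target information prepended.
header = (
    'target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-'
    'i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-'
    'f80:128:128-n8:16:32:64"\n'
    'target triple = "x86_64-unknown-linux-gnu"\n'
)
with open('matrixmul.ll', 'w') as f:
    f.write(header + str(default_module))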
3. Variable size arrays
You are using variable size arrays in the code you generate. This is perfectly fine, but currently not supported by Polly. As a workaround, setting n = 1024 at the beginning of the code is enough to make Polly work. The right solution is obviously to add variable length array support to Polly.
4. Pass ordering issue
The pass order used by 'opt -O3 -polly' is not good enough to detect your code. Using 'opt -O3 | opt -O3 -polly' works. This means we probably need to schedule one or two additional canonicalization passes. One reason for this may be that you 'alloca' data elements in the body of a function. Many LLVM passes expect the alloca instructions to be in the very first basic block. You may consider placing them there in your code generation as well.
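To reproduce the working pipeline from Python, something like the following should do; it assumes opt is in PATH and that Polly is available to it (either linked in or loaded via -load LLVMPolly.so), so adjust paths and flags for your setup:

import subprocess

def run_opt(ir_text, extra_flags=()):
    # Pipe textual IR through `opt -S -O3` plus any extra flags.
    cmd = ['opt', '-S', '-O3'] + list(extra_flags)
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    out, _ = proc.communicate(ir_text)
    return out

ir = open('matrixmul.ll').read()      # the dumped module from above
ir = run_opt(ir)                      # first -O3 run: canonicalization
ir = run_opt(ir, ['-polly'])          # second -O3 run with Polly enabled
open('matrixmul.polly.ll', 'w').write(ir)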
Again, thanks for this very nice tool. I am looking forward to playing more with it.
Cheers
Tobi
[1] LLVM Language Reference Manual, parameter attributes: http://llvm.org/docs/LangRef.html