Auto-Vectorization in LLVM

Hi,

I found out that Auto-Vectorization was implemented as a part of GSoC 2009.
Can someone point me to the code repository including any documentation available?
I would also like to know if there is any progress/future plans to include this
in the main trunk?

Best Regards,
Raj

Unfortunately, nothing came of this project AFAIK, maybe Devang knows more.

-Chris

I looked for it too and couldn't find anything. I found some
alias/dependency analysis inside loops, but nothing that actively tries
to merge instructions.

WRT progress/plans, there is the Polly project
(http://wiki.llvm.org/Polyhedral_optimization_framework), which is a
representation external to LLVM. It could make it much easier to map
dependencies and leave the road open for auto-vec, but again, nothing
in that direction has been done yet. Tobias and Ether should know
more about that.

cheers,
--renato

http://systemcall.org/


Unfortunately I did not get around to doing actual auto-vectorisation in
my 2009 GSoC project.

I've since continued to work on autovectorisation but switched to an
unrolling-based approach instead of the dependence-based approach
planned for the GSoC 2009 project. I have a basic unrolling-based
autovectoriser working, and plan to start getting this work merged into
mainline LLVM sometime in mid-July. As the vectorisation pass is pretty
stand-alone and only needs a few minor changes to the loop unroller, I
expect merging to be pretty straightforward.
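
To make the idea concrete, here is a minimal Python sketch of what an unrolling-based approach can look like. It assumes the loop is first unrolled by the vector width and the resulting independent scalar operations on adjacent elements are then merged into one vector operation. The function names and the width of 4 are made up for illustration and are not taken from the actual pass.

```python
VECTOR_WIDTH = 4  # assumed SIMD width, chosen only for illustration


def scalar_add(a, b):
    """Original loop: one element per iteration."""
    out = [0] * len(a)
    for i in range(len(a)):
        out[i] = a[i] + b[i]
    return out


def unrolled_add(a, b):
    """Same loop unrolled by VECTOR_WIDTH, with a scalar remainder loop."""
    n = len(a)
    out = [0] * n
    main_end = n - n % VECTOR_WIDTH
    for i in range(0, main_end, VECTOR_WIDTH):
        # Four independent scalar operations on adjacent elements.
        out[i + 0] = a[i + 0] + b[i + 0]
        out[i + 1] = a[i + 1] + b[i + 1]
        out[i + 2] = a[i + 2] + b[i + 2]
        out[i + 3] = a[i + 3] + b[i + 3]
    for i in range(main_end, n):
        out[i] = a[i] + b[i]
    return out


def vector_add(x, y):
    """Stand-in for a single SIMD add over VECTOR_WIDTH lanes."""
    return [p + q for p, q in zip(x, y)]


def vectorised_add(a, b):
    """Unrolled body with the adjacent scalar adds merged into one vector op."""
    n = len(a)
    out = [0] * n
    main_end = n - n % VECTOR_WIDTH
    for i in range(0, main_end, VECTOR_WIDTH):
        out[i:i + VECTOR_WIDTH] = vector_add(a[i:i + VECTOR_WIDTH],
                                             b[i:i + VECTOR_WIDTH])
    for i in range(main_end, n):
        out[i] = a[i] + b[i]
    return out
```

All three functions compute the same result; the only difference is how many elements each loop iteration handles and whether the work is expressed as separate scalar operations or as one operation over a slice.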

Pretty much. Andreas did contribute initial basic code for loop
dependence analysis and has promised to merge more improvements to
this area.

hi,

Yep, nothing has been done in that direction yet. At the moment we only translate LLVM IR to the polyhedral representation (Polly IR) and translate it back to LLVM IR.

After this stage, we will export the Polly IR with the openscop library (INRIA will publish it soon), do something like auto-vectorization on the exported Polly IR, then import it into LLVM and translate it back to LLVM IR.

But as I discussed with Tobi, we may not generate the “vectorized” or “parallelized” LLVM IR directly from Polly IR. Instead, we may annotate the LLVM IR with information such as “this loop is a parallel loop” or “the dependence distance of this loop is 8” in the form of metadata. Alternatively, we will provide an analysis pass that holds this information.

Then we can write passes that use this metadata (or the analysis pass) to transform LLVM IR for vectorization and parallelization targeting SIMD architectures, OpenMP, OpenCL and so on.
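
As an illustration of what an annotation like “dependence distance of this loop is 8” would let a later pass do, here is a small Python sketch. It assumes a loop of the form a[i] = a[i - 8] + b[i]; the loop, the function names and the chunking scheme are invented for this example and are not Polly code. Because each element only depends on the element 8 positions earlier, up to 8 consecutive iterations are independent of each other and can run as one vector operation.

```python
DISTANCE = 8  # assumed dependence distance carried by the annotation


def scalar_loop(a, b):
    """Reference loop: iteration i reads the value written by iteration i - DISTANCE."""
    a = list(a)
    for i in range(DISTANCE, len(a)):
        a[i] = a[i - DISTANCE] + b[i]
    return a


def chunked_loop(a, b, width):
    """Run `width` consecutive iterations at once, as a vector unit would.

    All reads of a chunk happen before any of its writes, so the result
    matches the scalar loop only when width <= DISTANCE.
    """
    a = list(a)
    for start in range(DISTANCE, len(a), width):
        stop = min(start + width, len(a))
        # "Vector load" of the elements DISTANCE positions back.
        loaded = [a[i - DISTANCE] for i in range(start, stop)]
        # "Vector add" with the matching elements of b.
        summed = [x + b[i] for x, i in zip(loaded, range(start, stop))]
        # "Vector store" of the whole chunk.
        a[start:stop] = summed
    return a
```

With a width of 8 or less the chunked version matches the scalar loop. With a width of 16 it reads values that the scalar loop would already have overwritten, so the results diverge; this is the kind of limit a distance annotation would communicate to a vectorization pass.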

Any comments or suggestions are welcome :)

best regards
ether

Yes, as ether said, we believe Polly will simplify auto-vectorization a lot. With the help of polyhedral transformations it will be possible to generate and annotate vector-parallel loops. These can afterwards be vectorized easily.

I put an example on the wiki:
http://wiki.llvm.org/Polyhedral_optimization_framework#2.3._Vectorization

Polly is starting to become useful, but will probably need this summer to mature. During the last few weeks the first very simple tests started to work.
At the moment we can detect matrix multiplication, create the polyhedral information and generate code from it again.
Exporting the test case, optimizing it, and importing it again will be done in the next few weeks.

As soon as this is done, we can show impressive results for matmult, and we can compile the llvm-testsuite without crashing, I will write a mail to the mailing list. Anybody who wants to try Polly earlier will probably trigger some unimplemented stuff. However, you could try anyway. ;) I will be glad to help you with it.

@Andreas:
Do you believe your vectorization would work on dependence-free loops? In that case, I would love to try your pass later. Scheduling it after Polly (which creates the vector-parallel loops) should produce vectorized loops easily.

Tobi