interesting preso

I enjoyed this presentation:
http://www.linux-kongress.org/2009/slides/compiler_survey_felix_von_leitner.pdf

Among other things, it shows how small code sequences are compiled by a small collection of compilers, including LLVM 2.6. It looks like there are several fairly obvious things we could do better...

-Chris

Wow, very comprehensive!

Is anyone working on vectorization? It is something that interests
me, and I might give it a try; I just need some pointers.

cheers,
--renato

Reclaim your digital rights, eliminate DRM, learn more at
http://www.defectivebydesign.org/what_is_drm

The first step is loop dependence analysis. This is required to determine loop reuse information and is the basis for a lot of vectorization and parallelization loop transformations. There is work in this area in mainline llvm, but I don't know the status of it.
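
As a minimal sketch of the distinction such an analysis has to draw, here are two hand-written C loops. The function names are made up for illustration, and the `restrict` qualifiers are an assumption that the arrays do not overlap. This is not LLVM code, just the kind of input the analysis would classify:

```c
#include <stddef.h>

/* No loop-carried dependence: iteration i touches only a[i] and b[i].
   With restrict (assumed: a and b do not overlap), the iterations could
   run in any order, or several at a time in a vector unit. */
void scale(int *restrict a, const int *restrict b, size_t n) {
    for (size_t i = 0; i < n; ++i)
        a[i] = 2 * b[i];
}

/* Loop-carried dependence of distance 1: iteration i reads a[i - 1],
   which iteration i - 1 has just written, so the iterations must run
   in order and the loop cannot be vectorized as written. */
void prefix_sum(int *a, size_t n) {
    for (size_t i = 1; i < n; ++i)
        a[i] += a[i - 1];
}
```

Only the first loop is a candidate for straightforward vectorization; telling the two apart automatically is what the dependence analysis is for.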

-Chris

I suppose all dependencies can be determined with function passes and
module-wide analysis.

LLVM does unroll small loops, but once the number of iterations is too
big, it does not even attempt to unroll a multiple of the iterations.

for(i=0;i<4;++i) unrolls to four flat calls but

for(i=0;i<400;++i) doesn't unroll to 100 iterations of four flat calls...
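
To make the second case concrete, here is a hand-written sketch of what "100 iterations of four flat calls" would look like in C. The `work` function and its counters are hypothetical stand-ins for the loop body; this shows the shape of the transformation, not what any particular compiler emits:

```c
/* Hypothetical loop body; it just records that it was called. */
static int calls = 0;
static long sum = 0;
static void work(int i) { ++calls; sum += i; }

/* Original form: 400 iterations, one call each. */
void rolled(void) {
    for (int i = 0; i < 400; ++i)
        work(i);
}

/* Partially unrolled by 4: 100 iterations of four flat calls.
   No remainder loop is needed because 400 is a multiple of 4. */
void unrolled_by_4(void) {
    for (int i = 0; i < 400; i += 4) {
        work(i);
        work(i + 1);
        work(i + 2);
        work(i + 3);
    }
}
```

Both versions make the same 400 calls in the same order; the unrolled one just executes a quarter of the compare-and-branch instructions.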

Are there any vector instructions in the IR? Or does that information
need to go as metadata for the codegen routines? Instead of unrolling
at the IR level, we could have some MISD/SIMD instructions covering the
whole range and let the codegen decide which low-level instructions to
use in each case. A processor without VFP would then unroll the loop,
while one with VFP could use the VFP instructions instead of unrolling.

Collapsing memset-like loop:
multistore i32 %value, [ 400 x i32 ]* %array

Collapsing memcpy-like loop:
multicopy [ 400 x i32 ]* %orig, [ 400 x i32 ]* %dest

Like MSVC, we could also detect pointer copy loops and turn them into
a memcpy call. If a loop is executed more than a few times, it might
be better (if space optimisations are not on) to create a region in
memory to copy from with memcpy. This is particularly useful for
repeated calls that reset an array for the next iteration of a
specific parallel computation.
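
A rough sketch of both ideas in plain C, assuming a fixed 400-element array as in the examples above. All function names and the `reset_image` buffer are invented for illustration; none of this is existing LLVM or MSVC behaviour:

```c
#include <string.h>

enum { N = 400 };

/* Copy loop as a programmer might write it. */
void copy_loop(int *dest, const int *orig) {
    for (int i = 0; i < N; ++i)
        dest[i] = orig[i];
}

/* What an idiom-recognizing compiler could emit instead. This is only
   valid if the arrays do not overlap, which restrict asserts here. */
void copy_idiom(int *restrict dest, const int *restrict orig) {
    memcpy(dest, orig, N * sizeof *dest);
}

/* Fill loop. memset cannot replace it in general, because memset
   writes a repeated byte, not a repeated int. */
void fill_loop(int *array, int value) {
    for (int i = 0; i < N; ++i)
        array[i] = value;
}

/* The "region in memory to copy from": build the image once, then
   every later reset is a single memcpy. Costs N ints of extra space. */
static int reset_image[N];

void init_reset_image(int value) {
    fill_loop(reset_image, value);
}

void reset_from_image(int *array) {
    memcpy(array, reset_image, sizeof reset_image);
}
```

The trade-off is the one mentioned above: the preinitialized region costs space, so it only pays off when the reset runs often enough.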

In that case, instead of creating new instructions, we could use those
functions, inline them as often as possible and optimise them to VFP
instructions later.

cheers,
--renato


Renato Golin wrote:

> LLVM does unroll small loops, but once the number of iterations is too
> big, it does not even attempts to unroll a multiple of the iterations.

Partial unrolling is supported by the unroller when using the
additional -unroll-allow-partial switch (false by default).
Works for me with LLVM 2.5.

I see, pardon my ignorance. Why isn't that part of -O3?

cheers,
--renato


The person should be lauded for saying that inline assembly isn't all it's cracked up to be, and should be avoided when possible. :)

-bw