Heroic LLVM optimizations

I am a professor at UC Santa Cruz, but I also do consulting at Huawei. Chris Lattner told me that I should post
this to llvm-dev.

  HiSilicon (Santa Clara office) is looking for a developer capable of implementing the "heroic optimizations" (http://llvm.org/devmtg/2015-10/slides/Gerolf-PerformanceImprovementsAndHeadroom.pdf) in LLVM. The focus is on SPEC2006, but we are also looking at the new SPEC2017.

  The goal is to match, or get closer to, the Intel compiler on SPEC2006. ICC has a significant advantage. As the talk shows, there is over a 10x difference in libquantum,
and other benchmarks also show significant differences between the latest gcc/llvm and ICC.

  Send me an email with your CV or questions if you want a full-time job working on this (open source) and helping with
other compiler optimizations for future ARMv8 servers. Something like 50% of the time on open source LLVM, 50% on new
compiler/JIT optimizations for future ARM servers.

Hi Jose,

We have work based on Polly which should get the loop fusion in
SPEC2017. The code is not yet ready to share, but I would be interested
to learn if this would be of use to you.


Hi Tobias-

Is the loop fusion you mention the one in libquantum/cpu2006, or something else in cpu2017?


Sorry, I meant libquantum/cpu2006.


I'd be interested in seeing the improvements. As a reference, this is what I get on an Intel 6700K when
I compare gcc 5.4 (-Ofast -flto) against the published Intel results: a 23x gap in libquantum, and over 40% in many benchmarks.

I think that most of the gap comes from AoS-to-SoA data layout changes and loop transformations.
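
For readers less familiar with the terms, here is a minimal C sketch of what an AoS-to-SoA layout change plus loop fusion looks like. The struct and function names are hypothetical and only loosely modeled on a quantum-register style workload; this is not code from libquantum and not a description of what ICC actually emits.

```c
#include <stddef.h>
#include <stdint.h>

/* AoS (array of structs): every node carries all fields, so a loop that
   only touches `state` still strides over the amplitude bytes. */
typedef struct {
    float amp_re, amp_im;
    uint64_t state;
} node_aos;

/* SoA (struct of arrays): each field is contiguous, so a loop over
   `state` reads dense memory and is easier to vectorize. */
typedef struct {
    float *amp_re;
    float *amp_im;
    uint64_t *state;
} reg_soa;

/* Flip one bit of every state, AoS layout. Assumes target < 64. */
void flip_bit_aos(node_aos *nodes, size_t n, unsigned target) {
    for (size_t i = 0; i < n; i++)
        nodes[i].state ^= (uint64_t)1 << target;
}

/* Same operation on the SoA layout. Assumes target < 64. */
void flip_bit_soa(reg_soa *reg, size_t n, unsigned target) {
    for (size_t i = 0; i < n; i++)
        reg->state[i] ^= (uint64_t)1 << target;
}

/* Loop fusion: two back-to-back bit flips become a single pass over the
   array instead of two. Assumes t1 != t2 and both are < 64. */
void flip_two_bits_fused_soa(reg_soa *reg, size_t n, unsigned t1, unsigned t2) {
    uint64_t mask = ((uint64_t)1 << t1) | ((uint64_t)1 << t2);
    for (size_t i = 0; i < n; i++)
        reg->state[i] ^= mask;
}
```

The fused version produces the same result as calling flip_bit_soa twice with t1 and t2, but walks the array once. Doing both transformations automatically requires the compiler to prove that changing the layout and merging the loops is safe across the whole program.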

Using SmartHeap and -m32 should close those gaps a bit.