Should we enable Partial unrolling and Runtime unrolling on AArch64?

Hi all,

Partial unrolling and runtime unrolling are enabled by default in GCC for AArch64, which helps performance. In LLVM, however, these two methods are enabled for only a few backends: X86, PowerPC and R600. I don’t know the history of these two kinds of unrolling or why they are not more widely used. I would also like to know whether they are intentionally disabled for the AArch64 backend.
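To make the two terms concrete, here is a small hand-written sketch of what unrolling by a factor of 4 looks like when the trip count is only known at run time. The function names and the factor are illustrative assumptions, not what either compiler actually emits. Partial unrolling replicates the loop body in the same way, but for a loop whose trip count is known at compile time and is too large to unroll completely. Runtime unrolling additionally needs a remainder loop, because the trip count may not be a multiple of the unroll factor.

```cpp
#include <cstddef>
#include <vector>

// Reference loop: one add plus one compare-and-branch per element.
long sum_rolled(const std::vector<int> &v) {
    long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        total += v[i];
    return total;
}

// Hand-unrolled by 4 (illustrative only). The main loop runs over the
// largest multiple of 4 that fits; a remainder loop handles the rest.
// Where a compiler places the remainder iterations is its own choice;
// this sketch simply puts them after the main loop.
long sum_unrolled_by_4(const std::vector<int> &v) {
    const std::size_t n = v.size();
    const std::size_t main_end = n - (n % 4);
    long total = 0;
    std::size_t i = 0;
    for (; i < main_end; i += 4) {
        total += v[i];
        total += v[i + 1];
        total += v[i + 2];
        total += v[i + 3];
    }
    for (; i < n; ++i)  // at most 3 leftover iterations
        total += v[i];
    return total;
}
```

The unrolled version executes fewer branches per element but contains the body four times plus the remainder loop, which is where the code-size cost comes from.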

I ran some experiments and saw that performance is indeed affected. Overall, partial unrolling brings a small benefit on most benchmark cases, and the regressions are major and small. Runtime unrolling brings a huge improvement on certain cases but also a huge regression on others. The proportion of improvements to regressions varies between benchmarks. Code size also increases with both methods.

I will share more data before proposing any change. For now I just want to learn more about the background of these two unrolling methods.

Correcting a typo: the word “major” in the second paragraph should be “minor”. Sorry about that.

From: "Kevin Qin" <kevinqindev@gmail.com>
To: "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Thursday, July 31, 2014 3:03:19 AM
Subject: [LLVMdev] Should we enable Partial unrolling and Runtime unrolling on AArch64?

These unrolling methods have been available in LLVM for several years, but the pass-manager setup and TTI hooks that allow backends to enable them in a target-specific way are relatively new. As you've noticed, per-target tuning is required. Patches are certainly welcome; if you have a modification for AArch64 that provides significant benefits and little downside, please send it to llvm-commits for review.
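As a rough illustration of the hook mechanism, the sketch below models a per-target override that switches both kinds of unrolling on. It is a self-contained stand-in, not the real LLVM API: the struct is a cut-down imitation of the unrolling preferences that TTI exposes, the class names are hypothetical, and the threshold value is a placeholder rather than a tuned number.

```cpp
// Cut-down stand-in for the unrolling preferences a target can adjust.
// Only the two switches discussed in this thread and one threshold are modeled.
struct UnrollingPreferences {
    unsigned PartialThreshold = 0;  // size budget for a partially unrolled loop
    bool Partial = false;           // allow partial unrolling
    bool Runtime = false;           // allow runtime unrolling
};

// Hypothetical base class: by default a target changes nothing, so both
// kinds of unrolling stay off.
struct GenericTargetInfo {
    virtual ~GenericTargetInfo() = default;
    virtual void getUnrollingPreferences(UnrollingPreferences &) const {}
};

// Hypothetical AArch64 override that opts in to both kinds of unrolling.
struct ExampleAArch64TargetInfo : GenericTargetInfo {
    void getUnrollingPreferences(UnrollingPreferences &UP) const override {
        UP.Partial = true;
        UP.Runtime = true;
        UP.PartialThreshold = 150;  // placeholder, needs per-target tuning
    }
};

// What an unroller would do: start from the defaults, then let the
// target adjust them.
UnrollingPreferences queryPreferences(const GenericTargetInfo &TTI) {
    UnrollingPreferences UP;
    TTI.getUnrollingPreferences(UP);
    return UP;
}
```

Keeping the defaults off and letting each target opt in is what makes the per-target tuning possible without affecting other backends.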

Thanks for looking at this.

-Hal

Hi Hal,

I wanted to check whether there was already a conclusion about these unrolling methods on the AArch64 target. It seems the answer is no, so it’s worth spending more time tuning the parameters before sending out a patch. Thanks for providing some background on this.

Regards,
Kevin