Hello all,
I have run some basic experiments on Polly's canonicalization passes, and I found that SCEV canonicalization has a significant impact on both compile-time and execution-time performance.
Detailed results for the SCEV and default canonicalization runs can be viewed at: http://188.40.87.11:8000/db_default/v4/nts/32 (or 33, 34)
*pNoGen with SCEV canonicalization (run 32): -O3 -Xclang -load -Xclang LLVMPolly.so -mllvm -polly -mllvm -polly-optimizer=none -mllvm -polly-code-generator=none -mllvm -polly-codegen-scev
*pNoGen with default canonicalization (run 33): -O3 -Xclang -load -Xclang LLVMPolly.so -mllvm -polly -mllvm -polly-optimizer=none -mllvm -polly-code-generator=none
*pBasic without any canonicalization (run 34): -O3 -Xclang -load -Xclang LLVMPolly.so
Impact of SCEV canonicalization:
http://188.40.87.11:8000/db_default/v4/nts/32?compare_to=34&baseline=34
Impact of default canonicalization:
http://188.40.87.11:8000/db_default/v4/nts/33?compare_to=34&baseline=34
Comparison of SCEV canonicalization with default canonicalization:
http://188.40.87.11:8000/db_default/v4/nts/32?compare_to=33&baseline=33
As expected, both SCEV canonicalization and default canonicalization slightly increase compile-time overhead (at most 30% extra compile time). They also lead to some execution-time regressions and improvements.
The only difference between SCEV canonicalization and default canonicalization is the “IndVarSimplify” pass, as shown in RegisterPasses.cpp:212:
if (!SCEVCodegen)
  PM.add(polly::createIndVarSimplifyPass());
However, I find it interesting to look into the comparison between SCEV canonicalization and default canonicalization (http://188.40.87.11:8000/db_default/v4/nts/32?compare_to=33&baseline=33):
First of all, we can expect SCEV canonicalization to have better compile-time performance, since it avoids the “IndVarSimplify” pass. Indeed, it gains more than 5% compile-time improvement on 32 benchmarks, especially the following:
MultiSource/Applications/lemon/lemon -11.02%
SingleSource/Benchmarks/Misc/oourafft -10.53%
SingleSource/Benchmarks/Linpack/linpack-pc -10.00%
MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan -8.31%
MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt -8.18%
Second, we find that SCEV canonicalization shows both regressions and improvements in execution performance compared with default canonicalization. There are many execution-time regressions, such as:
SingleSource/Benchmarks/Shootout/nestedloop +16363.64%
SingleSource/Benchmarks/Shootout-C++/nestedloop +16200.00%
SingleSource/UnitTests/Vectorizer/gcc-loops +107.35%
SingleSource/Benchmarks/Polybench/medley/reg_detect/reg_detect +75.00%
SingleSource/Benchmarks/Misc/flops-6 +40.03%
SingleSource/Benchmarks/Misc/flops-5 +40.00%
MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan +30.00%
as well as many execution-time improvements such as:
SingleSource/Benchmarks/Shootout/ary3 -28.98%
SingleSource/Benchmarks/Polybench/linear-algebra/solvers/dynprog/dynprog -26.97%
SingleSource/Benchmarks/CoyoteBench/lpbench -25.84%
MultiSource/Benchmarks/BitBench/drop3/drop3 -16.58%
MultiSource/Benchmarks/Ptrdist/yacr2/yacr2 -16.46%
MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt -14.96%
I think the execution-time regressions are mainly due to unexpected performance improvements from non-SCEV canonicalization, as shown in the following bug: http://llvm.org/bugs/show_bug.cgi?id=17153. As a next step, I will try to find out why “IndVarSimplify” can produce better code. If we can eliminate the “IndVarSimplify” canonicalization while still producing high-quality code, we can gain better compile-time performance without any execution-time loss.
Best,
Star Tan