[CodeGen] CodeSize - TailMerging and BlockPlacement

Hi everyone,

The code layout that TailMerging (inside BranchFolding) works on is not the final layout, which BlockPlacement later optimizes based on branch probability. In general, many new merging opportunities emerge after BlockPlacement. As an experiment, I added an extra BranchFolding and an extra BlockPlacement after the existing BlockPlacement (i.e., -block-placement -branch-folder -block-placement), targeting an AArch64 backend. Thousands of instructions can be removed from the SPEC2006 benchmarks, as shown below. I checked the binaries and did not find any increase in unwanted instructions. The change causes no noticeable regression in any benchmark and sometimes yields a small improvement (1%-3%).

473.astar -7
401.bzip2 -110
403.gcc -13,006
445.gobmk -1,716
464.h264ref -684
456.hmmer -391
462.libquantum -4
429.mcf -4
471.omnetpp -1,980
400.perlbench -4,176
458.sjeng -338
450.soplex -395
483.xalancbmk -4,183
447.dealII -186
433.milc -34
444.namd -104
453.povray -1,785
482.sphinx3 -112
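
To make concrete what kind of opportunity is being counted above, here is a minimal sketch of the idea behind tail merging: two blocks that end in the same instruction sequence can share one copy of that tail. This is not the BranchFolding implementation. `Block`, `commonTailLength`, and `instructionsSaved` are hypothetical names invented for illustration, instructions are modelled as plain strings, and the savings estimate assumes that merging always costs exactly one extra branch.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a machine basic block: a list of instruction strings.
// (Hypothetical type for illustration; not an LLVM data structure.)
using Block = std::vector<std::string>;

// Count how many trailing instructions A and B have in common by walking
// both blocks backwards until they differ or one of them runs out.
std::size_t commonTailLength(const Block &A, const Block &B) {
  std::size_t N = 0;
  auto IA = A.rbegin();
  auto IB = B.rbegin();
  while (IA != A.rend() && IB != B.rend() && *IA == *IB) {
    ++IA;
    ++IB;
    ++N;
  }
  return N;
}

// Rough estimate of instructions saved by keeping the shared tail once.
// Assumption: the block that loses its tail needs one branch to reach the
// shared copy, so a tail of length T saves T - 1 instructions.
long instructionsSaved(const Block &A, const Block &B) {
  std::size_t Tail = commonTailLength(A, B);
  if (Tail == 0)
    return 0;
  return static_cast<long>(Tail) - 1;
}
```

Under this toy model, which blocks are adjacent or share successors depends on the layout, which is why a changed layout can expose tails that were not candidates before.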

I propose factoring the relevant code out of BranchFolding into a utility and calling it from BlockPlacement whenever the layout changes. This is similar to D18226 and D18411, which factor tail duplication into a utility and call it from BlockPlacement. Any thoughts, advice, or comments?

Best,

Haicheng

From: "via llvm-dev" <llvm-dev@lists.llvm.org>
To: llvm-dev@lists.llvm.org
Sent: Tuesday, March 29, 2016 8:20:22 AM
Subject: [llvm-dev] [CodeGen] CodeSize - TailMerging and BlockPlacement

Has anyone provided you with feedback on this yet? It seems like a reasonable plan to me.

-Hal

Hi Hal,

Thank you for your interest. I haven't received any feedback, but I have already started the work. Some early cleanup has already been committed, and more will be posted for review soon.

Best,

Haicheng