[PROPOSAL] Improve uses of LEA on Atom

Hi,

Here is an update on our proposal to improve the uses of LEA on Atom processors.

  1. Disable current generation of LEAs

Due to a 3-cycle stall between the ALU and the AGU, any address generation done using a math instruction will cause a stall on loads and stores that fall within 3 cycles of the address generation. Consequently, the heuristics for using LEAs efficiently must know how many cycles pass between the address generation and its use. However, LEAs are currently inserted before this information is known (i.e., before register allocation). Part of the attached patch disables the current generation of LEAs.

  2. Identify loads and stores in an X86PassConfig::addPreEmitPass() pass

We will use an addPreEmitPass pass, similar to the VZeroUpper pass. For each load/store found, we will identify its base and index registers and examine preceding instructions to determine where those values are generated, looking for opportunities to use LEAs.

  3. Replace instructions with LEAs

Instructions such as add/{reg,imm}, add/{reg,imm}+shift/{reg,imm}, or sub/imm will be replaced with a single LEA. This will potentially reduce the number of registers in use; however, because this pass runs after register allocation, it will not affect instruction scheduling.

Attached is an incomplete patch with test cases that disables current LEA generation and includes an empty pre-emit pass that will contain the LEA selection heuristics.

Any feedback you may have on this updated plan is welcome.

Sincerely,

Tyler Nowicki

Intel

UpdatedProposalPatch-svn.patch (18.6 KB)

Was there any development on this? I noticed that clang still produces
a lea for the testcase in llvm.org/pr13320.

Thanks for the reminder!

The work we did on fixing up LEAs focused on converting instructions to LEAs after register allocation on Atom.

Given the way that the X86 code generator generates LEA instructions, the performance improvement requested by PR13320 might best be done as a peephole optimization after register allocation.

We have now added this issue to our backlog of work to do, but I cannot hazard a guess as to when the issue would be addressed.