RFC: Identification of LEA instructions with complex addressing mode

jbhateja · July 9, 2017, 6:20am

Current state:

1/ LEA with complex addressing mode are supported by Intel micro architectures after Nehalem.

2/ LEA detection is being done during SelectionDAG based instruction selection through
addressing mode based complex pattern matching.

3/ This does not identify LEA patterns beyond Scaling factor 1.
e.g.
T1 = A + B;
T2 = T1 + B;
T3 = T2 + B;
T4 = T3 + 10;
T5 = T4 + 20;
T6 = T5 + B

Above sequence can be folded to

LEA 30( A , 4 , B);

where BASE = A, SCALE = 4, INDEX = B and OFFSET = 30

4/ Control flow information is not present at SelectionDAG level, as SelectionDAG based selection
work over a single BasicBlock at a time. Which makes it difficult to avoid generation of
complex LEA with 3 operands (even with Scale=1) within Loops.

Proposal:

1/ To have a pre-RA pass to identify LEAs with dense folding. By dense folding I mean scale factor greater than 1.

2/ Since this pass will run over MachineInstrs so it will be usable for FastISel and Global ISel based flows also which
bypass SelectionDAG.

3/ At MI level we have control flow Analysis info in the form of MachineDominatorTree and MachineLoopInfo
to avoid formation of LEAs in loops.

4/ Perform CSE over dense LEAs (which have Scale factor > 1) to factor out overlapping computations.
When two LEAs share BASE , INDEX and OFFSET but have different SCALE we can

take out the common complex LEA and generate a simple LEA with legal Scale.
e.g.

LEA1 : RES1 = LEA 10( A , 4 , B)
LEA2 : RES2 = LEA 10( A , 8 , B)

can be converted to

LEA1 : RES1 = LEA 10( A , 4 , B)
LEA2 : RES2 = LEA (RES1 , 4 , B)

5/ Disintegration of complex LEAs with Scale 1 to simple LEA + ADD which is already being handled during
FixupLEAPass.

Kindly drop your suggestions/comments.

Thanks,
Jatin Bhateja

Saba_Lama · July 9, 2017, 7:07am

Is this an RFC for (an expansion of) the patch you already have https://reviews.llvm.org/D35014? Or are you planning on making a different design?

Thanks,

Lama

jbhateja · July 9, 2017, 7:57am

Hi Lama,

This is related to PR 32755 filed by you. But patch will not be able to fold beyond scaling factor 2.

So to take care of cases which fold to scaling factor > 2 we may have to move extraction logic LEAs to a new/existing pass before MI Scheduling as has been proposed.

Kindly provide your comments/suggestions.

Thanks,
Jatin

Topic		Replies	Views
X86 missed optimization (PR24052) LLVM Dev List Archives	0	51	January 6, 2017
Question about instruction selection LLVM Dev List Archives	1	104	January 19, 2019
Instruction Selection LLVM Dev List Archives	1	81	September 21, 2011
Pattern matching for Instruction Beginners	0	949	December 1, 2020
Codegen performance issue: LEA vs. INC. LLVM Dev List Archives	5	71	October 5, 2013

RFC: Identification of LEA instructions with complex addressing mode

Related Topics