Current state:
1/ LEA with complex addressing mode are supported by Intel micro architectures after Nehalem.
2/ LEA detection is being done during SelectionDAG based instruction selection through
addressing mode based complex pattern matching.
3/ This does not identify LEA patterns beyond Scaling factor 1.
e.g.
T1 = A + B;
T2 = T1 + B;
T3 = T2 + B;
T4 = T3 + 10;
T5 = T4 + 20;
T6 = T5 + B
Above sequence can be folded to
LEA 30( A , 4 , B);
where BASE = A, SCALE = 4, INDEX = B and OFFSET = 30
4/ Control flow information is not present at SelectionDAG level, as SelectionDAG based selection
work over a single BasicBlock at a time. Which makes it difficult to avoid generation of
complex LEA with 3 operands (even with Scale=1) within Loops.
Proposal:
1/ To have a pre-RA pass to identify LEAs with dense folding. By dense folding I mean scale factor greater than 1.
2/ Since this pass will run over MachineInstrs so it will be usable for FastISel and Global ISel based flows also which
bypass SelectionDAG.
3/ At MI level we have control flow Analysis info in the form of MachineDominatorTree and MachineLoopInfo
to avoid formation of LEAs in loops.
4/ Perform CSE over dense LEAs (which have Scale factor > 1) to factor out overlapping computations.
When two LEAs share BASE , INDEX and OFFSET but have different SCALE we can
take out the common complex LEA and generate a simple LEA with legal Scale.
e.g.
LEA1 : RES1 = LEA 10( A , 4 , B)
LEA2 : RES2 = LEA 10( A , 8 , B)
can be converted to
LEA1 : RES1 = LEA 10( A , 4 , B)
LEA2 : RES2 = LEA (RES1 , 4 , B)
5/ Disintegration of complex LEAs with Scale 1 to simple LEA + ADD which is already being handled during
FixupLEAPass.
Kindly drop your suggestions/comments.
Thanks,
Jatin Bhateja