Load/store grouping and AntiDepBreaker


On my target, grouping loads and stores together almost always gives better performance, even if it increases register pressure.

Basically I want code like this:
load r1, addr1
instr r1, x, y
store addr2, r1

load r1, addr3
instr r1, x, y
store addr4, r1

to be rewritten as:
load r1, addr1
load r2, addr3

instr r1, x, y
instr r2, x, y

store addr2, r1
store addr4, r2
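
To make the rewrite above concrete, here is a toy Python model of it (not LLVM code, and not how AggressiveAntiDepBreaker is implemented). It gives every load a fresh register, which removes the anti-dependency between "store addr2, r1" and the following "load r1, addr3", and then groups loads, instrs and stores. The tuple representation and the function name rename_and_group are made up for illustration. It assumes that every register is defined by a load in the same block, that "instr" reads and rewrites its first operand, and that none of the addresses alias.

```python
def rename_and_group(block):
    """block: list of (opcode, operands) tuples in program order."""
    current = {}   # original register -> name of its current live range
    next_id = 1
    renamed = []
    for op, args in block:
        if op == "load":
            # load dst, addr: defines dst, so start a new live range
            dst, addr = args
            current[dst] = f"r{next_id}"
            next_id += 1
            renamed.append((op, (current[dst], addr)))
        elif op == "store":
            # store addr, src: only reads src
            addr, src = args
            renamed.append((op, (addr, current[src])))
        else:
            # instr reg, x, y: assumed to read and rewrite reg in place
            reg, *rest = args
            renamed.append((op, (current[reg], *rest)))
    # Stable sort: loads first, then everything else, then stores.
    # Only valid under the no-aliasing assumption stated above.
    order = {"load": 0, "store": 2}
    return sorted(renamed, key=lambda ins: order.get(ins[0], 1))


block = [
    ("load", ("r1", "addr1")),
    ("instr", ("r1", "x", "y")),
    ("store", ("addr2", "r1")),
    ("load", ("r1", "addr3")),
    ("instr", ("r1", "x", "y")),
    ("store", ("addr4", "r1")),
]
grouped = rename_and_group(block)
for op, args in grouped:
    print(op, ", ".join(args))
```

The price of the renaming is exactly the extra register pressure mentioned above: one live register per load instead of one reused register.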

I enabled the AggressiveAntiDepBreaker pass, and it works in a lot of cases, but not always.
What would be the best way to guarantee that loads and stores are grouped and executed together?


I wrote a pass that does exactly that a few days ago. One small difference may be that my pass kills all definitions with store instructions right before the terminator instruction.

Another problem with my pass is that some optimizations right before ISel can introduce a few instructions between the stores. I solved that with a machine function pass that reorders the instructions.
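
As a rough illustration of that reordering idea (a toy Python model, not the actual machine function pass and not LLVM API), the sketch below moves every store so that all stores sit together right before the terminator. The tuple representation and the name sink_stores are hypothetical. It assumes the last entry of the block is the terminator and that the first operand of a non-store instruction is the register it writes. It gives up and leaves the block unchanged if a later load might read a stored address or a later instruction rewrites a stored register.

```python
def sink_stores(block):
    """block: list of (opcode, operands) tuples; the last one is the terminator."""
    *body, terminator = block
    for i, (op, args) in enumerate(body):
        if op != "store":
            continue
        _, src = args  # store addr, src
        for later_op, later_args in body[i + 1:]:
            if later_op == "store":
                continue  # stores keep their relative order, so this is fine
            # A later load might read the stored address, and a later
            # instruction that rewrites src would change the stored value.
            if later_op == "load" or later_args[:1] == (src,):
                return list(block)  # unsafe: leave the block untouched
    others = [ins for ins in body if ins[0] != "store"]
    stores = [ins for ins in body if ins[0] == "store"]
    return others + stores + [terminator]


block = [
    ("load", ("r1", "addr1")),
    ("load", ("r2", "addr3")),
    ("instr", ("r1", "x", "y")),
    ("store", ("addr2", "r1")),
    ("instr", ("r2", "x", "y")),   # ended up between the two stores
    ("store", ("addr4", "r2")),
    ("ret", ()),
]
reordered = sink_stores(block)
```

A real pass would have to use proper dependence and alias information instead of these two conservative checks.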

This is the original thread in the llvm-dev list: http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-April/072024.html

Just out of curiosity, what is your target architecture?