I’ve noticed that llvm-mca ignores memory dependencies between consecutive iterations. If I store to (%rax) at the end of my code and load from it at the top, there is a store-to-load dependency between iterations. llvm-mca doesn’t respect that: the load in the next iteration doesn’t wait for the store from the previous iteration (this is also visible in the timeline view). Is this a known limitation? I ask because this is critical for performance.
If you store at the bottom of a loop and then reload the same value at the top,
why not just keep it in a register and avoid the load altogether?
This is essentially hoisting the LD out of the top of the loop.
for( i = 0; i < MAX; i++ ) b[i+1] = b[i] OP (some messy calculation);
gets converted into:
for( i=0, b0 = b[i]; i < MAX; i++ ) b0 = b[i+1] = b0 OP (some messy calculation);
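A minimal C sketch of the two forms above, for comparison. The names `messy`, `recurrence_reload`, and `recurrence_carried` are hypothetical, `+` stands in for OP, and `b` is assumed to hold at least n + 1 elements:

```c
#include <stddef.h>

/* Hypothetical stand-in for "some messy calculation". */
static int messy(size_t i) { return (int)(i * 3u + 1u); }

/* Original form: each iteration reloads b[i], which the previous
   iteration has just stored as b[i + 1]. */
void recurrence_reload(int *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        b[i + 1] = b[i] + messy(i);
}

/* Hoisted form: the running value is carried in b0, so the only load
   from b is the one before the loop; the stores remain. */
void recurrence_carried(int *b, size_t n) {
    int b0 = b[0];
    for (size_t i = 0; i < n; i++) {
        b0 = b0 + messy(i);
        b[i + 1] = b0;
    }
}
```

Both functions leave the same contents in b; the second one replaces the loop-carried dependency through memory with one through a register.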
I was referring to an assembly code sample. For example, if the code body consists of two instructions, a load followed by a store (to the address held in some register), then in back-to-back iterations the load should wait for the store from the previous iteration. llvm-mca doesn’t model this in its run, and the timeline view shows that the load does not wait.