I'd love to see example cases where the pair analysis is the
difficulty, rather than the access analysis of any single memory piece
being the difficulty.
I'm not completely sure what you mean, but is there really a difference
between doing "pair analysis" across multiple iterations of the same
instruction than doing it with any other instruction?
Say, a loop with memory operations Aw, Br, Cw (w=write, r=read).
This is unrolled just for the sake of the example. The same applies to
widening the instructions in a loop vectorizer, similar dependency
analysis is needed.
i1: Aw Br Cw
i2: Aw' Br' Cw'
To do a legal code motion across the two iterations in such a way that
you move e.g., Aw' before Br, in parallel with Aw (e.g. to pack them to a
vector instruction or statically schedule in a VLIW) requires you to know
that none of Aw, Br nor Cw access the same location of Aw'.
If we know these are parallel loop accesses we know that Aw' doesn't
alias with any of Aw, Br or Cw, thus we can perform the code motion
with a peaceful mind, and execute Aw and Aw' in parallel e.g. in a
If we get some non-parallel-loop-annotated mem instructions injected in
the loop, it's up to the analyzer again to prove the code motion is legal.
Yes, that's exactly my point.
You said "This is an interesting alternative! Do you mean that we
would still add
the llvm.mem.parallel_loop_access metadata, but only to such mem accesses
that are assumed to be "hard or impossible to analyze" (to prove to be no
alias cases)? Then we'd forget about the "parallel loop metadata" as is.
Then we would rely on the regular loop carried dependency analyzer by
default, but let those (mem) annotations just *help* in the "tricky cases".
The llvm.mem.parallel_loop_access metadata would only communicate "this
instruction does not alias with any other similarly annotated instruction
from any other iteration in this loop"."
IE we wouldn't need to annotate every memory access in the loop, only
I don't see how this is possible, given the suggested semantics.
Let me give a motivating example myself, to explain what I mean.
load %w <--- A
store ..., %x <---B
load %y <---C
store ..., %z, <---D
Assume that only %w and %z are actually complicated access functions
that we could not prove anything about without metadata.
In practice, we are going to ask about pairs (D, C), (D, B), (D, A)
If "llvm.mem.parallel_loop_access" means that, per above, things
annotated with it do not conflict, in the above loop, you stil need to
annotate all of A, B, C, D with it to get any impact. It is *not*
sufficient to just annotate D, or even D and A, because the problem is
*D* is hard to analyze and *A* is hard to analyze, not that D,
<something> is hard to analyze. Without annotating everything, we
are still going to fail to know anything about D, C, even if we know
something about D, A.
So i don't see how you can simply annotate *certain* memory accesses,
and get any value out of it, given the semantics suggested.
For example, the reg2mem case (where Sw is produced by it and not
marked with the llvm.mem.parallel_loop_access MD).
i1: Aw, Sw, Br, Cw
i2: Aw', Sw', Br', Cw'
Cannot do the same code motion here unless the analyzer can prove Aw' and
Sw do not alias.
Yes, I agree, which is why i'm trying to understand how you don't end
up having to mark *every* memory in the loop with this instruction in
practice to get anywhere.