parallel loop metadata question

I’m looking at pocl and the loop-based metadata llvm.loop and llvm.mem.parallel_loop_access, and am hoping someone familiar with them can help with my understanding.

First, I understand this to be true:

  • llvm.loop doesn’t by itself communicate anything. It is used by other metadata to anchor that metadata to the loop.

  • if all memory instructions in a loop have the llvm.mem.parallel_loop_access metadata referencing the loop’s llvm.loop metadata, then the loop can be considered ‘parallel’.

Here’s where I have a question:

  • if not all memory instructions in a loop have this metadata, can the compiler still infer that the instructions that do have it carry no loop-carried dependences on one another (for the loops that their metadata references in common)?

If so, it would be nice if we documented the actual meaning of llvm.mem.parallel_loop_access as such. And the conclusion that a loop is ‘parallel’ if all its memory instructions have this metadata would be a natural consequence of that meaning.

This will allow us to potentially vectorize and/or software pipeline large portions of loops even when some transformation introduces a memory operation whose dependence effects are unknown.

Thanks

Jon
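
The reading proposed above can be sketched as a small model. This is only an illustration under stated assumptions, not LLVM code: `MemInst`, `is_parallel`, `independent_pairs`, and the loop id "L0" are hypothetical names, and each instruction simply records which loops its llvm.mem.parallel_loop_access metadata refers to.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class MemInst:
    """Hypothetical stand-in for a load or store inside a loop body."""
    name: str
    # Loop ids referenced by this instruction's
    # llvm.mem.parallel_loop_access metadata (empty if it has none).
    parallel_loops: frozenset = frozenset()


def is_parallel(body, loop):
    """Existing rule: every memory instruction references the loop."""
    return all(loop in m.parallel_loops for m in body)


def independent_pairs(body, loop):
    """Proposed rule: two annotated instructions have no dependence
    carried by `loop`, whatever the other instructions look like."""
    return [
        (a.name, b.name)
        for a, b in combinations(body, 2)
        if loop in a.parallel_loops and loop in b.parallel_loops
    ]


body = [
    MemInst("load_a", frozenset({"L0"})),
    MemInst("store_b", frozenset({"L0"})),
    MemInst("spill", frozenset()),  # e.g. introduced by a later transformation
]

print(is_parallel(body, "L0"))        # False
print(independent_pairs(body, "L0"))  # [('load_a', 'store_b')]
```

In this model the unannotated access makes the loop as a whole non-parallel, yet the annotated pair is still known to be independent, which is the information the question asks whether the compiler may rely on.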

This would certainly be useful.

We discussed a similar issue when the original parallel loop MD was
sketched last year:

http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-February/059256.html

In this case the question is whether we can be sure the (new) accesses
inserted between the parallel annotated accesses cannot ruin
the parallel (independence) semantics between the annotated ones.

While I cannot give you a breaking example/optimization immediately,
I'm not confident there isn't one. The problem is that we need
to be extra careful to not break the metadata rules; if an
optimization does not understand a metadata, it should not be able
to accidentally break any additional semantics implied by it.

Thanks for the link. I understand your concern of caution with metadata. I cannot, though, imagine how the dependence relation (independence) of two memory references can be affected by a third memory reference. If two references are independent across loop iterations, then they are independent, and any other load or store cannot change that. Right?

Jon

Yes, it makes sense. I'm mostly concerned about accesses to the stack,
but even those at this point should remain independent. Otherwise even
the current semantics might produce broken code with parallel stack accesses.

However, as this is such a major change to the original semantics, I'd
like to hear more opinions on it. I suggest you create a (documentation)
patch where the new semantics is articulated and request comments for it on
the LLVM-commits list.

I agree with both. I think the extension is very reasonable and I also do not see a reason why this interpretation should cause trouble. However, to get it right it would be good to get this thoroughly reviewed.

Tobias

From: "Tobias Grosser" <tobias@grosser.es>
To: "Pekka Jääskeläinen" <pekka.jaaskelainen@tut.fi>, "Jonathan Humphreys" <j-humphreys@ti.com>, llvmdev@cs.uiuc.edu
Sent: Monday, May 5, 2014 3:36:07 AM
Subject: Re: [LLVMdev] parallel loop metadata question

I agree, I think this sounds reasonable. You'll certainly need to be careful, however, that the associated instruction has not been hoisted/sunk out of the associated loops. Even if the load is one that can be speculated, that does not mean that there was not a control dependence on the independence information itself.

-Hal
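
One conservative way to picture the caution above is to honor an annotation only for loops that still enclose the instruction. The sketch below is an assumption-laden illustration, not how any LLVM pass is implemented: `honored_loops` and the loop ids "outer" and "inner" are hypothetical.

```python
def honored_loops(annotated, enclosing):
    """Loops for which an instruction's parallel annotation still applies.

    `annotated` holds the loop ids named by the instruction's
    llvm.mem.parallel_loop_access metadata; `enclosing` holds the loops
    that currently contain the instruction. Both are hypothetical models.
    """
    return frozenset(annotated) & frozenset(enclosing)


# A load annotated for the loop "inner", which is nested inside "outer".
annotated = {"inner"}

before_hoist = honored_loops(annotated, {"outer", "inner"})
after_hoist = honored_loops(annotated, {"outer"})  # hoisted out of "inner"

print(sorted(before_hoist))  # ['inner']
print(sorted(after_hoist))   # []
```

Once the load has left "inner", nothing in its metadata says anything about "outer", so a consumer following this reading would draw no independence conclusion from it there.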

Will do. I will write something up.

Hal, your concern below isn't so much with the proposed semantics but rather with the use - that optimizations must respect the loop for which the metadata applies, correct?

Thanks
Jon

From: "Jonathan Humphreys" <j-humphreys@ti.com>
To: "Hal Finkel" <hfinkel@anl.gov>, "Tobias Grosser" <tobias@grosser.es>
Cc: "Pekka Jääskeläinen" <pekka.jaaskelainen@tut.fi>, llvmdev@cs.uiuc.edu
Sent: Monday, May 5, 2014 5:09:42 PM
Subject: RE: [LLVMdev] parallel loop metadata question

Yes, sounds right. Nevertheless, I would recommend putting such a cautionary note into the documentation itself just to make explicit an issue which might otherwise be overlooked.

-Hal

I propose that we change the first paragraph of http://llvm.org/docs/LangRef.html#llvm-mem-parallel-loop-access-metadata:

Hi,

This looks good to me except that the first sentence
could already include "that refer to the same loop" or
similar.

I could imagine that e.g. loop invariant code motion,
if applied to a parallel loop, could hoist code out of
inner loops to outer (parallel) loops. Then the outer
loop contains parallel_loop_access instructions referring to
the inner loop, making the outer loop non-trivially parallel.

But these are probably rare cases as, at least in pocl, basic
optimizations have already been executed before the work-group
function generation where the parallel work-item loops are created.

Hi,

No further comments on this one? Should I just go ahead
with my minor modification and commit?

OK,

I updated the text in LangRef in r209507 after some
editing.

Pekka, thanks for updating this.

A small edit - the sentence ending with:

“with L1 and L2 being the set of loops associated with that metadata, respectively, then there is no loop carried dependence between m1 and m2 for loops L1 or L2.”

Should read:

“with L1 and L2 being the set of loops associated with that metadata, respectively, then there is no loop carried dependence between m1 and m2 for loops in both L1 and L2.”

Jon
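
The difference between the two wordings can be shown with plain sets. This is a minimal sketch, assuming the loop sets L1 and L2 are modeled as sets of hypothetical loop ids ("outer", "inner"); `loops_without_carried_dep` is an illustrative name, not an LLVM API.

```python
def loops_without_carried_dep(l1, l2):
    """Loops for which m1 and m2 are known to have no loop-carried
    dependence under the corrected wording: those in both L1 and L2."""
    return frozenset(l1) & frozenset(l2)


m1_loops = {"outer", "inner"}  # L1: loops named by m1's metadata
m2_loops = {"inner"}           # L2: loops named by m2's metadata

common = loops_without_carried_dep(m1_loops, m2_loops)
either = frozenset(m1_loops) | frozenset(m2_loops)  # the "L1 or L2" reading

print(sorted(common))  # ['inner']
print(sorted(either))  # ['inner', 'outer']
```

Reading the sentence as "L1 or L2" would also claim independence for "outer", even though m2 carries no annotation for that loop; the intersection avoids that.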

Thanks, updated in r210327.