[PATCH] parallel loop metadata

I agree. I was puzzled by it myself until I ended up concluding that
if the set of required analysis is not defined it can be an empty set,
thus equivalent to "no dep checking at all needed":

http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-January/058727.html

Maybe the safe thing here is to rename it back to the honest
"llvm.loop.parallel" or similar and we can add a separate one for
the assumed_dep later on. This one would support the truly parallel
loops (at least OpenMP for and OpenCL WIloops) where no compiler
checking at all can be assumed by the programmer.

Any objections? Paul Redmond?

From: "Pekka Jääskeläinen" <pekka.jaaskelainen@tut.fi>
To: "Dan Gohman" <dan433584@gmail.com>
Cc: "CVS Commit Messages for LLVM repository" <llvm-commits@cs.uiuc.edu>, "LLVM Developers Mailing List"
<llvmdev@cs.uiuc.edu>
Sent: Tuesday, January 29, 2013 1:42:27 PM
Subject: Re: [PATCH] parallel loop metadata

> "Ignore assumed dependencies" is shaky semantics. I haven't seen
> anything
> explicitly spell out which dependencies a compiler is guaranteed to
> detect.
> How can users use such a directive safely in a loop which has
> dependencies?
> I understand that this is what icc's documentation says, but I'm
> wondering
> if there's a real motivation for this design other than blind
> compatibility.

I agree. I was puzzled by it myself until I ended up concluding that
if the set of required analysis is not defined it can be an empty
set,
thus equivalent to "no dep checking at all needed":

http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-January/058727.html

Maybe the safe thing here is to rename it back to the honest
"llvm.loop.parallel" or similar and we can add a separate one for
the assumed_dep later on. This one would support the truly parallel
loops (at least OpenMP for and OpenCL WIloops) where no compiler
checking at all can be assumed by the programmer.

Will parallel always be synonymous with no_interiteration_dependencies? I'm sightly worried that 'parallel' seems too much like a directive, and we may want it to mean something else in the future.

-Hal

I think the semantics of a "parallel loop" is:

   If my loop, I hereby state as "parallel", has loop-carried dependencies,
   I have made a programming mistake and you, the compiler, are free to
   punish me by producing undefined results.

We can add other metadata later on. E.g. "ivdep":

   My loop might have some dependencies I know you know about, and also
   some dependencies which are impossible (for you!) to analyze. The latter
   ones you can ignore as I know those are not real dependencies, but
   please do not ignore those I know you know about :slight_smile:

I think the actual "strong" parallel loop semantics is more interesting
as it gives the most freedom and is not even a bit vague.

From: "Pekka Jääskeläinen" <pekka.jaaskelainen@tut.fi>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>, "Dan Gohman" <dan433584@gmail.com>
Sent: Tuesday, January 29, 2013 2:37:44 PM
Subject: Re: [PATCH] parallel loop metadata

> Will parallel always be synonymous with
> no_interiteration_dependencies? I'm
> sightly worried that 'parallel' seems too much like a directive,
> and we may
> want it to mean something else in the future.

I think the semantics of a "parallel loop" is:

   If my loop, I hereby state as "parallel", has loop-carried
   dependencies,
   I have made a programming mistake and you, the compiler, are free
   to
   punish me by producing undefined results.

We can add other metadata later on. E.g. "ivdep":

   My loop might have some dependencies I know you know about, and
   also
   some dependencies which are impossible (for you!) to analyze. The
   latter
   ones you can ignore as I know those are not real dependencies, but
   please do not ignore those I know you know about :slight_smile:

Unless there is also some way to tell the compiler about those dependencies that I know about, this this is useless. Based on my experience, when a user says "ivdep" they are asserting no dependencies (known or unknown).

-Hal

Hi Pekka,

Maybe the safe thing here is to rename it back to the honest
"llvm.loop.parallel" or similar and we can add a separate one for
the assumed_dep later on. This one would support the truly parallel
loops (at least OpenMP for and OpenCL WIloops) where no compiler
checking at all can be assumed by the programmer.

Any objections? Paul Redmond?

I don't have any objections. I think the only requirement is that the semantics are clearly defined.

Personally I think this metadata should be used to guide the vectorizer only. I'm not sure how it will be used in the context of OpenMP or OpenCL. For OpenMP I assume you'd add this metadata to parallel_for loops. At what point do you insert the runtime calls? Does LLVM need to know how to target each runtime?

paul

From: "Paul Redmond" <paul.redmond@intel.com>
To: "Pekka Jääskeläinen" <pekka.jaaskelainen@tut.fi>
Cc: "CVS Commit Messages for LLVM repository" <llvm-commits@cs.uiuc.edu>, "LLVM Developers Mailing List"
<llvmdev@cs.uiuc.edu>
Sent: Tuesday, January 29, 2013 3:27:03 PM
Subject: Re: [PATCH] parallel loop metadata

Hi Pekka,

> Maybe the safe thing here is to rename it back to the honest
> "llvm.loop.parallel" or similar and we can add a separate one for
> the assumed_dep later on. This one would support the truly parallel
> loops (at least OpenMP for and OpenCL WIloops) where no compiler
> checking at all can be assumed by the programmer.
>
> Any objections? Paul Redmond?
>

I don't have any objections. I think the only requirement is that the
semantics are clearly defined.

Personally I think this metadata should be used to guide the
vectorizer only. I'm not sure how it will be used in the context of
OpenMP or OpenCL. For OpenMP I assume you'd add this metadata to
parallel_for loops. At what point do you insert the runtime calls?
Does LLVM need to know how to target each runtime?

I think that the general direction being pursued for OpenMP is to do frontend lowering (outlining). However, we'd still want iteration-independence metadata to attach to the outlined loop bodies.

-Hal

Great.

I'm all for finally getting something into LLVM.. just need to make sure the name and the semantics are sorted out before the next release :slight_smile: This is a fine start.

paul

Hi,

In a personal capacity I'm quite interested in the issues of producing from a high-level language some LLVM IR which is labelled with vectorization info (including potentially actually reordering data in memory).

I don't have any objections. I think the only requirement is that the semantics are clearly defined.

I think that's very important :slight_smile:

Personally I think this metadata should be used to guide the vectorizer only. I'm not sure how it will be used in the context of OpenMP or OpenCL. For OpenMP I assume you'd add this metadata to
parallel_for loops. At what point do you insert the runtime calls? Does LLVM need to know how to target each runtime?

I think there's two use cases: where a human programmer has written an annotation (in some source language variant) where it represents a "best guess" which the compiler might decide is suboptimal and choose a variant of its own, and when the metadata has been added by a front-end compiler -- either to empirically observer the effect or because it's part of some more extensive performance directed code-reorganisation done by the front-end -- where it really ought to be "obeyed unless impossible". In particular, the LLVM backend is designed to be usable in routine production, which means that certain kinds of very expensive performance optimization will be unfeasible to be integrated. So I'd like at least a way to say "this is what I _really_ want the vectorizer to assume" in such metadata.

Regards,
Dave

-- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

Thank you all for comments,

In a personal capacity I'm quite interested in the issues of producing from a
high-level language some LLVM IR which is labelled with vectorization info
(including potentially actually reordering data in memory).

> I don't have any objections. I think the only requirement is that the
semantics are clearly defined.

I think that's very important :slight_smile:

...

to be integrated. So I'd like at least a way to say "this is what I _really_
want the vectorizer to assume" in such metadata.

I agree. Thus, let's start with the 'parallel' semantics first, not
the vague ones.

A conclusion of some kind for me to update the patch:

1) Rename the loop branch metadata back to llvm.loop.parallel.

I'll add a definition of its semantics (as comments to Loop::isParallel()):

No loop-carried dependency checking at all can be assumed with this
type of loops. If the compiler does some analysis to improve the performance
or other reasons, it still can, but is not required to do so.

2) Keep the memory accessing instruction metadata as it is.

At least until something better appears.

The rationale is that when the parallel loop is marked this way (both a
branch metadata and memory instruction metadata required for isParallel()),
missing metadata in some of the memory instructions due to some
optimizations that do not propagate the data correctly does not cause
breakage (lead to assuming a now-sequential loop is still parallel).
It just potentially makes the parallel loop non-parallel which should be
safe.

In addition, the scheme of augmenting memory accesses with parallelism
info can be extended to retain the parallelism data over loop unrolling
too (useful for instruction scheduling). Also, e.g., retaining the restricted
pointer data across function inlining might be able to use a similar
mechanism.

We can improve the actual marking mechanism gradually because
the parallelism check will be wrapped inside Loop::isParallel().

Is this OK?

Hello Pekka,

I think it would be better to add a section to the LangRef:
http://llvm.org/docs/LangRef.html#metadata

A good comment would be helpful, too, but LangRef should be normative.

Dmitri

Dear all,

Here's an updated version of the parallel loop metadata patch.
It includes documentation for the new metadata types with
a semantics description.

parallel-loop-metadata.patch (12.7 KB)

Pekka Jääskeläinen wrote:

+'``llvm.loop``'
+^^^^^^^^^^^^^^^
+
+It is sometimes useful to attach information to loop constructs. Currently,
+loop metadata is implemented as metadata attached to the branch instruction
+in the loop latch block. Loop-level metadata is prefixed with ``llvm.loop``.
+
+'``llvm.loop.parallel``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This loop metadata can be used to communicate that a loop should be considered
+a parallel loop. The semantics of parallel loops in this case is the one
+with the strongest cross-iteration instruction ordering freedom: the
+iterations in the loop can be considered completely independent of each
+other (also known as embarrasingly parallel loops).
+
+This metadata can originate from a programming language with parallel loop
+constructs. In such a case it is completely the programmer's responsibility
+to ensure the instructions from the different iterations of the loop can be
+executed in an arbitrary order, in parallel, or intertwined. No loop-carried
+dependency checking at all must be expected from the compiler.
+
+In order to fulfil the LLVM requirement for metadata to be ignorable

I would avoid using "ignorable": just use "for metadata to be safely ignored".

+safely, it is important to ensure that a parallel loop is converted to
+a sequential loop in case an optimization (unknowingly of the parallel loop

"unknowingly" is an adverb, what you need here is an adjective qualifying "an
optimization", so I would use "agnostic".

+semantics) converts the loop back to such. This happens when new memory
+accesses that do not fulfil the requirement of free ordering across iterations
+are added to the loop. Therefore, this metadata is required, but not
+sufficient, to consider the loop at hand a parallel loop. In order to consider
+a loop a parallel loop, also all of its memory accessing instructions need to be

s/In order to consider a loop a parallel loop, also all of its memory accessing instructions/
For a loop to be parallel, all its memory accesses/

+marked with the ```llvm.mem.parallel_loop_access``` metadata.

Why do you use 3 backquotes instead of only 2? Here and below.

+
+'``llvm.mem``'
+^^^^^^^^^^^^^^^
+
+Metadata types used to annotate memory accesses with information helpful
+for optimizations are prefixed with ``llvm.mem``.
+
+'``llvm.mem.parallel_loop_access``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In order to consider a loop a parallel loop, in addition to using

s/In order to consider a loop a parallel loop/For a loop to be parallel/

+the ``llvm.loop.parallel`` metadata to mark the loop latch branch instruction,
+also all of the memory accessing instructions in the loop body need to be
+marked with the ``llvm.mem.parallel_loop_access`` metadata. If there
+is at least one memory accessing instruction not marked with the metadata,
+the loop, despite it possibly using the ``llvm.loop.parallel`` metadata,
+must be considered a sequential loop. This causes parallel loops to be
+converted to sequential loops due to optimization passes that are unaware of
+the parallel semantics and that insert new memory instructions to the loop
+body.
+
+Example of a loop that is considered parallel due to its correct use of
+both ``llvm.loop.parallel`` and ```llvm.mem.parallel_loop_access```
+metadata types:
+
+.. code-block:: llvm
+
+ for.body:
+ %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
+ %arrayidx = getelementptr inbounds i32* %b, i64 %indvars.iv
+ %0 = load i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0
+ ...
+ store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0
+ ...
+ br i1 %exitcond, label %for.end, label %for.body, !llvm.loop.parallel !0
+ for.end: ; preds = %for.body
+ ret void
+ ...
+ !0 = metadata !{i32 1}

Can you please also explain how global variables are handled?

Thanks,
Sebastian

Dear all,

Here's an updated version of the parallel loop metadata patch.
It includes documentation for the new metadata types with
a semantics description.

Hi Pekka,

I think this looks already very nice. Some more comments:

Index: llvm/include/llvm/Analysis/LoopInfo.h

--- llvm.orig/include/llvm/Analysis/LoopInfo.h 2013-01-29 23:40:09.480348774 +0200
+++ llvm/include/llvm/Analysis/LoopInfo.h 2013-01-31 16:13:16.517296071 +0200
@@ -381,6 +381,19 @@
    /// isSafeToClone - Return true if the loop body is safe to clone in practice.
    bool isSafeToClone() const;

+ /// isParallel - Returns true if the loop should be considered as
+ /// a "parallel loop" with freely scheduled iterations. A parallel loop can
+ /// be assumed to not contain any dependencies between iterations by the compiler.
+ /// That is, any loop-carried dependency checking can be skipped completely when
+ /// parallelizing the loop on the target machine. Thus, if the parallel loop
+ /// information originates from the programmer, e.g. via the OpenMP parallel
+ /// for pragma, it is the programmer's responsibility to ensure the are no
+ /// loop-carried dependencies. The final execution order of the instructions
+ /// across iterations is not guaranteed, thus, the end result might or might

          might or might _not_

+ /// implement actual concurrent execution of instructions across multiple
+ /// iterations.

This comment is not formatted to our new doxygen style. LLVM now does not repeat the function name in the comment. Instead it has a brief comment followed by a full comment. Something like

/// Check if a loop is parallel
///
/// Returns true if the loop should be considered ....

+ bool isParallel() const;
+
    /// hasDedicatedExits - Return true if no exit block for the loop
    /// has a predecessor that is outside the loop.

Same here, do not repeat the function name.

    bool hasDedicatedExits() const;
Index: llvm/lib/Analysis/LoopInfo.cpp

--- llvm.orig/lib/Analysis/LoopInfo.cpp 2013-01-29 23:40:12.164348629 +0200
+++ llvm/lib/Analysis/LoopInfo.cpp 2013-01-31 13:20:04.885692041 +0200
@@ -233,6 +233,31 @@
    return true;
  }

+
+bool Loop::isParallel() const {
+
+ BasicBlock *latch = getLoopLatch();
+ if (latch == NULL ||
+ latch->getTerminator()->getMetadata("llvm.loop.parallel") == NULL)
+ return false;
+
+ // The loop branch contains the parallel loop metadata. In order to ensure
+ // that any parallel-loop-unaware optimization pass hasn't added loop-carried
+ // dependencies (thus converted the loop back to a sequential loop), check
+ // that all the memory instructions in the loop contain parallelism metadata.
+ for (block_iterator i = block_begin(), e = block_end(); i != e; ++i) {
+ for (BasicBlock::iterator ii = (*i)->begin(), ee = (*i)->end();
+ ii != ee; ii++) {

In LLVM we normally use uppercase letters for iterators. 'II' and 'EE'.

+'``llvm.loop``'
+^^^^^^^^^^^^^^^
+
+It is sometimes useful to attach information to loop constructs. Currently,
+loop metadata is implemented as metadata attached to the branch instruction
+in the loop latch block. Loop-level metadata is prefixed with ``llvm.loop``.
+
+'``llvm.loop.parallel``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This loop metadata can be used to communicate that a loop should be considered
+a parallel loop. The semantics of parallel loops in this case is the one
+with the strongest cross-iteration instruction ordering freedom: the
+iterations in the loop can be considered completely independent of each
+other (also known as embarrasingly parallel loops).

                         embarrassingly

+
+This metadata can originate from a programming language with parallel loop
+constructs. In such a case it is completely the programmer's responsibility
+to ensure the instructions from the different iterations of the loop can be
+executed in an arbitrary order, in parallel, or intertwined. No loop-carried
+dependency checking at all must be expected from the compiler.
+
+In order to fulfil the LLVM requirement for metadata to be ignorable

                fulfill

+safely, it is important to ensure that a parallel loop is converted to
+a sequential loop in case an optimization (unknowingly of the parallel loop
+semantics) converts the loop back to such. This happens when new memory
+accesses that do not fulfil the requirement of free ordering across iterations

                         fulfill

+are added to the loop. Therefore, this metadata is required, but not
+sufficient, to consider the loop at hand a parallel loop. In order to consider
+a loop a parallel loop, also all of its memory accessing instructions need to be
+marked with the ```llvm.mem.parallel_loop_access``` metadata.
+
+'``llvm.mem``'
+^^^^^^^^^^^^^^^
+
+Metadata types used to annotate memory accesses with information helpful
+for optimizations are prefixed with ``llvm.mem``.
+
+'``llvm.mem.parallel_loop_access``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In order to consider a loop a parallel loop, in addition to using
+the ``llvm.loop.parallel`` metadata to mark the loop latch branch instruction,
+also all of the memory accessing instructions in the loop body need to be
+marked with the ``llvm.mem.parallel_loop_access`` metadata. If there
+is at least one memory accessing instruction not marked with the metadata,
+the loop, despite it possibly using the ``llvm.loop.parallel`` metadata,
+must be considered a sequential loop. This causes parallel loops to be
+converted to sequential loops due to optimization passes that are unaware of
+the parallel semantics and that insert new memory instructions to the loop
+body.
+
+Example of a loop that is considered parallel due to its correct use of
+both ``llvm.loop.parallel`` and ```llvm.mem.parallel_loop_access```
+metadata types:
+
+.. code-block:: llvm
+
+ for.body:
+ %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
+ %arrayidx = getelementptr inbounds i32* %b, i64 %indvars.iv
+ %0 = load i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0
+ ...
+ store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0
+ ...
+ br i1 %exitcond, label %for.end, label %for.body, !llvm.loop.parallel !0
+ for.end: ; preds = %for.body
+ ret void
+ ...
+ !0 = metadata !{i32 1}

One point I am not entirely sure is: Is the parallel_loop_access meta-data somehow connected to the loop.parallel metadata. I think we need to also ensure that the parallel_loop_access metadata in a loop actually comes from the loop itself and was not introduced in some other unrelated way. I don't have a good example where this may happen, but could imagine stuff like inlining or licm from inner loops. I believe it should not be very difficult to couple the two, no?

Cheers
Tobi