Dear all,
Here's an updated version of the parallel loop metadata patch.
It includes documentation for the new metadata types with
a semantics description.
Hi Pekka,
I think this already looks very nice. Some more comments:
Index: llvm/include/llvm/Analysis/LoopInfo.h
--- llvm.orig/include/llvm/Analysis/LoopInfo.h 2013-01-29 23:40:09.480348774 +0200
+++ llvm/include/llvm/Analysis/LoopInfo.h 2013-01-31 16:13:16.517296071 +0200
@@ -381,6 +381,19 @@
/// isSafeToClone - Return true if the loop body is safe to clone in practice.
bool isSafeToClone() const;
+ /// isParallel - Returns true if the loop should be considered as
+ /// a "parallel loop" with freely scheduled iterations. A parallel loop can
+ /// be assumed to not contain any dependencies between iterations by the compiler.
+ /// That is, any loop-carried dependency checking can be skipped completely when
+ /// parallelizing the loop on the target machine. Thus, if the parallel loop
+ /// information originates from the programmer, e.g. via the OpenMP parallel
+ /// for pragma, it is the programmer's responsibility to ensure the are no
+ /// loop-carried dependencies. The final execution order of the instructions
+ /// across iterations is not guaranteed, thus, the end result might or might
might or might _not_
+ /// implement actual concurrent execution of instructions across multiple
+ /// iterations.
This comment does not follow our new doxygen style. LLVM no longer repeats the function name in the comment. Instead, it uses a brief one-line summary followed by the full description. Something like:
/// Check if a loop is parallel
///
/// Returns true if the loop should be considered ....
+ bool isParallel() const;
+
/// hasDedicatedExits - Return true if no exit block for the loop
/// has a predecessor that is outside the loop.
Same here, do not repeat the function name.
bool hasDedicatedExits() const;
Index: llvm/lib/Analysis/LoopInfo.cpp
--- llvm.orig/lib/Analysis/LoopInfo.cpp 2013-01-29 23:40:12.164348629 +0200
+++ llvm/lib/Analysis/LoopInfo.cpp 2013-01-31 13:20:04.885692041 +0200
@@ -233,6 +233,31 @@
return true;
}
+
+bool Loop::isParallel() const {
+
+ BasicBlock *latch = getLoopLatch();
+ if (latch == NULL ||
+ latch->getTerminator()->getMetadata("llvm.loop.parallel") == NULL)
+ return false;
+
+ // The loop branch contains the parallel loop metadata. In order to ensure
+ // that any parallel-loop-unaware optimization pass hasn't added loop-carried
+ // dependencies (thus converted the loop back to a sequential loop), check
+ // that all the memory instructions in the loop contain parallelism metadata.
+ for (block_iterator i = block_begin(), e = block_end(); i != e; ++i) {
+ for (BasicBlock::iterator ii = (*i)->begin(), ee = (*i)->end();
+ ii != ee; ii++) {
In LLVM we normally use uppercase names for iterators, e.g. 'II' and 'EE'.
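For illustration, the check described in the code comment above (every memory instruction in the loop must carry the parallelism metadata) could be modelled roughly as follows. This is only a self-contained sketch: `Inst`, `Block` and `allMemoryAccessesMarked` are hypothetical stand-ins, not LLVM classes or part of the patch. It uses the uppercase iterator names, and pre-increment, which the LLVM coding standards prefer.

```cpp
#include <vector>

// Hypothetical stand-in for llvm::Instruction (assumption, not LLVM API).
struct Inst {
  bool AccessesMemory;      // models a load/store
  bool HasParallelAccessMD; // models !llvm.mem.parallel_loop_access
};

// Hypothetical stand-in for llvm::BasicBlock.
typedef std::vector<Inst> Block;

// Returns true only if every memory-accessing instruction in every block
// is annotated. A single unannotated access makes the loop sequential.
bool allMemoryAccessesMarked(const std::vector<Block> &Blocks) {
  for (std::vector<Block>::const_iterator BI = Blocks.begin(),
                                          BE = Blocks.end();
       BI != BE; ++BI) {
    for (Block::const_iterator II = BI->begin(), EE = BI->end();
         II != EE; ++II) {
      if (II->AccessesMemory && !II->HasParallelAccessMD)
        return false;
    }
  }
  return true;
}
```

Instructions that do not touch memory are ignored by the check, so a pass that only adds arithmetic does not demote the loop.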
+'``llvm.loop``'
+^^^^^^^^^^^^^^^
+
+It is sometimes useful to attach information to loop constructs. Currently,
+loop metadata is implemented as metadata attached to the branch instruction
+in the loop latch block. Loop-level metadata is prefixed with ``llvm.loop``.
+
+'``llvm.loop.parallel``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This loop metadata can be used to communicate that a loop should be considered
+a parallel loop. The semantics of parallel loops in this case is the one
+with the strongest cross-iteration instruction ordering freedom: the
+iterations in the loop can be considered completely independent of each
+other (also known as embarrasingly parallel loops).
embarrassingly
+
+This metadata can originate from a programming language with parallel loop
+constructs. In such a case it is completely the programmer's responsibility
+to ensure the instructions from the different iterations of the loop can be
+executed in an arbitrary order, in parallel, or intertwined. No loop-carried
+dependency checking at all must be expected from the compiler.
+
+In order to fulfil the LLVM requirement for metadata to be ignorable
fulfill
+safely, it is important to ensure that a parallel loop is converted to
+a sequential loop in case an optimization (unknowingly of the parallel loop
+semantics) converts the loop back to such. This happens when new memory
+accesses that do not fulfil the requirement of free ordering across iterations
fulfill
+are added to the loop. Therefore, this metadata is required, but not
+sufficient, to consider the loop at hand a parallel loop. In order to consider
+a loop a parallel loop, also all of its memory accessing instructions need to be
+marked with the ```llvm.mem.parallel_loop_access``` metadata.
+
+'``llvm.mem``'
+^^^^^^^^^^^^^^^
+
+Metadata types used to annotate memory accesses with information helpful
+for optimizations are prefixed with ``llvm.mem``.
+
+'``llvm.mem.parallel_loop_access``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In order to consider a loop a parallel loop, in addition to using
+the ``llvm.loop.parallel`` metadata to mark the loop latch branch instruction,
+also all of the memory accessing instructions in the loop body need to be
+marked with the ``llvm.mem.parallel_loop_access`` metadata. If there
+is at least one memory accessing instruction not marked with the metadata,
+the loop, despite it possibly using the ``llvm.loop.parallel`` metadata,
+must be considered a sequential loop. This causes parallel loops to be
+converted to sequential loops due to optimization passes that are unaware of
+the parallel semantics and that insert new memory instructions to the loop
+body.
+
+Example of a loop that is considered parallel due to its correct use of
+both ``llvm.loop.parallel`` and ```llvm.mem.parallel_loop_access```
+metadata types:
+
+.. code-block:: llvm
+
+ for.body:
+ %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
+ %arrayidx = getelementptr inbounds i32* %b, i64 %indvars.iv
+ %0 = load i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0
+ ...
+ store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0
+ ...
+ br i1 %exitcond, label %for.end, label %for.body, !llvm.loop.parallel !0
+ for.end: ; preds = %for.body
+ ret void
+ ...
+ !0 = metadata !{i32 1}
One point I am not entirely sure about: is the parallel_loop_access metadata somehow connected to the loop.parallel metadata? I think we also need to ensure that the parallel_loop_access metadata in a loop actually comes from that loop and was not introduced in some other, unrelated way. I don't have a good example of where this may happen, but I could imagine cases such as inlining or LICM from inner loops. I believe it should not be very difficult to couple the two, no?
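One possible way to couple the two, purely as a sketch and not something the patch currently does: let each memory access refer to the same metadata node that marks the loop, and have the check compare node identity instead of mere presence. The types below (`MDNodeModel`, `MemAccess`, `LoopModel`) are hypothetical stand-ins and not LLVM API.

```cpp
#include <vector>

// Hypothetical stand-in for a metadata node. Identity is the node's address.
struct MDNodeModel {};

// Hypothetical memory access: points at the loop node it was marked for,
// or is null if the access carries no parallel_loop_access metadata.
struct MemAccess {
  const MDNodeModel *ParallelLoopMD;
};

struct LoopModel {
  const MDNodeModel *LoopID;       // node on the latch branch, or null
  std::vector<MemAccess> Accesses; // all memory accesses in the loop body

  // Parallel only if the loop is marked and every access refers to
  // this very loop's node. Metadata from another loop does not count.
  bool isParallel() const {
    if (!LoopID)
      return false;
    for (std::vector<MemAccess>::const_iterator II = Accesses.begin(),
                                                EE = Accesses.end();
         II != EE; ++II) {
      if (II->ParallelLoopMD != LoopID)
        return false;
    }
    return true;
  }
};
```

With this scheme, an access hoisted or inlined from a different parallel loop would still carry metadata, but it would point at the wrong node, so the enclosing loop would fall back to sequential semantics.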
Cheers
Tobi