RFC: [PATCH] parallel loop metadata

Hello all,

Thanks for the comments. Attached is a new version with
Tobias' and Sebastian's (final?) comments addressed. Any
further comments are appreciated.

Nadav suggested a request for comments in llvmdev before committing it.

In order to describe the current idea of the parallel loop metadata,
I think it's easiest to just copy-paste the documentation I wrote for
this patch so one can inline-reply to it with the possible comments:

+'``llvm.loop``'
+^^^^^^^^^^^^^^^

llvm-3.3-parallel-loop-metadata.patch (17.6 KB)

From: "Pekka Jääskeläinen" <pekka.jaaskelainen@tut.fi>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "Nadav Rotem" <nrotem@apple.com>, "Tobias Grosser" <tobias@grosser.es>, "CVS Commit Messages for LLVM repository"
<llvm-commits@cs.uiuc.edu>, "Sebastian Pop" <spop@codeaurora.org>, "LLVM Developers Mailing List"
<llvmdev@cs.uiuc.edu>
Sent: Monday, February 4, 2013 12:18:09 PM
Subject: Re: [LLVMdev] RFC: [PATCH] parallel loop metadata

Hello all,

Thanks for the comments. Attached is a new version with
Tobias' and Sebastian's (final?) comments addressed. Any
further comments are appreciated.

Nadav suggested a request for comments in llvmdev before committing
it.

In order to describe the current idea of the parallel loop metadata,
I think it's easiest to just copy-paste the documentation I wrote for
this patch so one can inline-reply to it with the possible comments:

Perhaps, although it might be helpful to provide a little bit of context. Essentially, we should say upfront that 'parallel' here does not imply any particular set of parallel-execution semantics. Rather, this is simply a statement about the lack of inter-iteration dependencies.

+'``llvm.loop``'
+^^^^^^^^^^^^^^^
+
+It is sometimes useful to attach information to loop constructs.
Currently,
+loop metadata is implemented as metadata attached to the branch
instruction
+in the loop latch block. This type of metadata refer to a metadata
node that is
+guaranteed to be separate for each loop. The loop-level metadata is
prefixed
+with ``llvm.loop``.
+
+The loop identifier metadata is implemented using a metadata that
refers to
+itself as follows:
+
+.. code-block:: llvm
+ !0 = metadata !{ metadata !0 }
+
+'``llvm.loop.parallel``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This loop metadata can be used to communicate that a loop should be
considered
+a parallel loop. The semantics of parallel loops in this case is the
one
+with the strongest cross-iteration instruction ordering freedom: the
+iterations in the loop can be considered completely independent of
each
+other (also known as embarrassingly parallel loops).
+
+This metadata can originate from a programming language with
parallel loop
+constructs. In such a case it is completely the programmer's
responsibility
+to ensure the instructions from the different iterations of the loop
can be
+executed in an arbitrary order, in parallel, or intertwined. No
loop-carried
+dependency checking at all must be expected from the compiler.
+
+In order to fulfill the LLVM requirement for metadata to be safely
ignored,
+it is important to ensure that a parallel loop is converted to
+a sequential loop in case an optimization (agnostic of the parallel
loop
+semantics) converts the loop back to such. This happens when new
memory
+accesses that do not fulfill the requirement of free ordering across
iterations
+are added to the loop. Therefore, this metadata is required, but not
+sufficient, to consider the loop at hand a parallel loop. For a loop
+to be parallel, all its memory accessing instructions need to be
+marked with the ``llvm.mem.parallel_loop_access`` metadata that
refer
+to the same loop identifier metadata that identify the loop at hand.
+
+'``llvm.mem``'
+^^^^^^^^^^^^^^^
+
+Metadata types used to annotate memory accesses with information
helpful
+for optimizations are prefixed with ``llvm.mem``.
+
+'``llvm.mem.parallel_loop_access``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+For a loop to be parallel, in addition to using
+the ``llvm.loop.parallel`` metadata to mark the loop latch branch
instruction,
+also all of the memory accessing instructions in the loop body need
to be
+marked with the ``llvm.mem.parallel_loop_access`` metadata. If there
+is at least one memory accessing instruction not marked with the
metadata,
+the loop, despite it possibly using the ``llvm.loop.parallel``
metadata,
+must be considered a sequential loop. This causes parallel loops to
be
+converted to sequential loops due to optimization passes that are
unaware of
+the parallel semantics and that insert new memory instructions to
the loop
+body.
+
+Example of a loop that is considered parallel due to its correct use
of
+both ``llvm.loop.parallel`` and ``llvm.mem.parallel_loop_access``
+metadata types that refer to the same loop identifier metadata.
+
+.. code-block:: llvm
+
+ for.body:
+ ...
+ %0 = load i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access
!0
+ ...
+ store i32 %0, i32* %arrayidx4, align 4,
!llvm.mem.parallel_loop_access !0
+ ...
+ br i1 %exitcond, label %for.end, label %for.body,
!llvm.loop.parallel !0
+
+ for.end:
+ ...
+ !0 = metadata !{ metadata !0 }
+
+It is also possible to have nested parallel loops. In that case the
+memory accesses refer to a list of loop identifier metadata nodes
instead of
+the loop identifier metadata node directly:
+
+.. code-block:: llvm
+
+ outer.for.body:
+ ...
+
+ inner.for.body:
+ ...
+ %0 = load i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access
!0
+ ...
+ store i32 %0, i32* %arrayidx4, align 4,
!llvm.mem.parallel_loop_access !0
+ ...
+ br i1 %exitcond, label %inner.for.end, label %inner.for.body,
!llvm.loop.parallel !1
+
+ inner.for.end:
+ ...
+ %0 = load i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access
!0
+ ...
+ store i32 %0, i32* %arrayidx4, align 4,
!llvm.mem.parallel_loop_access !0
+ ...
+ br i1 %exitcond, label %outer.for.end, label %outer.for.body,
!llvm.loop.parallel !2
+
+ outer.for.end: ; preds =
%for.body
+ ...
+ !0 = metadata !{ metadata !1, metadata !2 } ; a list of parallel
loop
identifiers
+ !1 = metadata !{ metadata !1 } ; an identifier for the inner
parallel loop
+ !2 = metadata !{ metadata !2 } ; an identifier for the outer
parallel loop

Nadav Rotem wrote:
>> Pekka,
>>
>> Can you please write llvm-dev and describe your parallel metadata
>> ? This
>> is a major change to LLVM and it has the potential to affect every
>> single
>> pass in LLVM. It should be reviewed by the community before we
>> discuss
>> the patch. Similar proposals were repelled in the past (remember
>> the
>> OpenMP "propeller" discussion ?).

I tried to participate in the beginning and then the discussion
somehow
died (and I got busy with something else).

IIRC the conclusion for OpenMP was that the thread libraries should
be
called directly in the frontend to implement thread level parallelism
for OpenMP programs, but was the destiny of the generic "parallelism
info"
metadata also decided to be a dead end? I didn't get that part.

Hal Finkel then replied:
> I agree with Nadav that community input on this proposal would be
> good.
> There are a number of passes that may need to be made aware of this
> metadata, and some consensus and planning is likely necessary for
> this to
> be successful. Nevertheless, this seems necessary for enabling
> generally
> useful features (like #pragma ivdep/parallel, etc.) and is worth
> the
> effort.

I agree. However, I think the proposed metadata is now well
"ignorable" (thanks to the Renato Golin's suggestion to mark the mem
accesses too). Sequential execution (to which it falls back to) is
a legal execution ordering of a fully parallel loop.

As soon as someone points out what is the true difference (basically
the set of guaranteed analysis done by the compiler) between "ivdep"
and
"parallel" then we can add a new metadata type, e.g., the
llvm.loop.ignore_assumed_deps to support that. Anyways, my desire is
to start
from something as it's easier to build on existing foundations.

As various people have expressed, "the set of guaranteed analysis done by the compiler" is the null set, and so this should suffice for these pragmas as well. Nevertheless, it is my opinion that we should run the dependence analysis anyway, if for no other reason than to warn the user of potential problems (but this can be considered an enhancement, not a prerequisite to the initial implementation).

> To clarify history, the reason that the metadata-based OpenMP
> schemes died
> was not due to the "propeller" arguments (which said that
> parallelization
> semantics are just not a good fit for LLVM), but rather due to
> fundamental
> correctness issues specific to OpenMP. Specifically, there are
> cases where
> it would not be legal to drop metadata that produced parallel
> regions, and
> a fundamental design principle of metadata is that dropping it must
> not
> cause miscompilation. It was decided to move ahead with frontend
> lowering
> under the assumption that the backend would either be provided
> with, or
> could itself deduce, the information necessary for later
> optimization where
> appropriate. This kind of metadata fits into that general plan.

OK, this was my view of how the OpenMP discussion ended as well. What
were the correctness issues in case of OpenMP, BTW?

Off of the top of my head, there were at least two issues:

1. num_threads could be specified, and this was not legally ignorable. The problem is that various runtime parameters can be queried using the runtime library, and those can be used to determine a legal number of threads to use. As a result, it is not legal to simply drop the parallel region.

2. There were similar issues with runtime scheduling; interactions between the runtime library and the pragmas could make dropping the pragmas illegal.

This proposal has no predefined associated semantics and it just used for dependency determination, and thus does not have these issues.

Thanks again,
Hal

Thanks Hal,

May I now commit this or does someone have something to add to the
discussion?

I think Nadav Rotem was somewhat suspicious. Tobias Grosser,
Sebastian Pop and Hal Finkel gave the patch a thumbs up.