Proposal for ""llvm.mem.vectorize.safelen"

Julia and OpenMP 4.0 have features where the user can bless a loop as
having no memory dependences that prevent vectorization, thus enabling
vectorization of loops not amenable to compile-time or run-time
dependence analysis. LLVM currently has no metadata to express such,
as explained further below.

I'd like to propose new metadata that enables front-ends to tell the
vectorizer that "memory dependences are not a problem for vectorization
widths up to n". I'd appreciate any comments before I spend time
prototyping it.

BACKGROUND

Hi Arch,

That was the intention, yes, and I believe that was the exact
semantics we thought about. This metadata should be applied and kept
in the same way as other loop metadata.

Arnold or Nadav should know better, though, since they are up-to-date
with the current developments.

cheers,
--renato

Hello Arch,

I very much like the idea of such an annotation, especially since I was
looking for the same thing in the recent past. My use case is different
from yours, thus it might provide a second reason to support this.

I recently submitted a patch to the list [1] which would allow Polly to
extract the dependency distance for each analyzable loop. While the
distance is often not constant but parametric we would also need to
version the vectorized loop based on the actual runtime values. However,
the versioning doesn't need to be done by the vectorizer but could also
be part of Polly (depending on whether or not the vectorizer will have
that capability). I admit that we would need a good heuristic in order
to turn this feature on as a default optimization, but I will work on
that in the near future too.

Best regards,
  Johannes

[1] http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140804/230137.html

I recently submitted a patch to the list [1] which would allow Polly to
extract the dependency distance for each analyzable loop. While the
distance is often not constant but parametric we would also need to
version the vectorized loop based on the actual runtime values.

Since this is a metadata node, we can actually add it with whatever we
want, from an integer constant to whatever we want. But of course,
complexity is an issue. I believe a constant is a good starting point.

Remember that this annotation is saying that "the loop *as it is* is
safe in a N vector", but things can change between the annotation
(generally source code pragmas, but could be a Polly thing) and actual
vectorization. This is a game that the user must be ready to play to
use these advanced features and Polly has to be extremely conservative
(since the user is *not* directly recommending safety boundaries), but
can also allow some extra room (similar to -ffast-math, we could do
-ffast-polly if needed),

However,
the versioning doesn't need to be done by the vectorizer but could also
be part of Polly (depending on whether or not the vectorizer will have
that capability). I admit that we would need a good heuristic in order
to turn this feature on as a default optimization, but I will work on
that in the near future too.

The original idea for vectorization annotations was to help multiple
passes to communicate.

Initially, it was used between two passes of the vectorizer (so it
wouldn't vectorize the same loop again) but later was used as a
vehicle to source code pragmas. Though, the idea of Polly sharing the
annotation space was in it from the beginning, and I think we'll come
up with a lot more metadata than just wide/safe. Arnold had some
slides about it.

cheers,
--renato

> I recently submitted a patch to the list [1] which would allow Polly to
> extract the dependency distance for each analyzable loop. While the
> distance is often not constant but parametric we would also need to
> version the vectorized loop based on the actual runtime values.

Since this is a metadata node, we can actually add it with whatever we
want, from an integer constant to whatever we want. But of course,
complexity is an issue. I believe a constant is a good starting point.

Constants are particularly good since we don't need runtime checks to
utilize the information. (We might combine range information with
parametric bounds, but thats something for another day.)

Remember that this annotation is saying that "the loop *as it is* is
safe in a N vector", but things can change between the annotation
(generally source code pragmas, but could be a Polly thing) and actual
vectorization. This is a game that the user must be ready to play to
use these advanced features and Polly has to be extremely conservative
(since the user is *not* directly recommending safety boundaries), but
can also allow some extra room (similar to -ffast-math, we could do
-ffast-polly if needed),

That's true. If the vectorizer runs "directly" after Polly we are save
since the dependency distance we compute is exact, however I agree that we need
to be careful with other transformations invalidating the result.

We could actually generate (very simple) vector code ourself instead
of relying on the vectorizer. Furthermore, we could directly utilize the
heuristics and vector codegen of the vectorizer in Polly instead of
relying on the robustness of annotations. However, I think the
annotation way is easier to maintain (as we need it for source
annotations anyway).

> However,
> the versioning doesn't need to be done by the vectorizer but could also
> be part of Polly (depending on whether or not the vectorizer will have
> that capability). I admit that we would need a good heuristic in order
> to turn this feature on as a default optimization, but I will work on
> that in the near future too.

The original idea for vectorization annotations was to help multiple
passes to communicate.

Initially, it was used between two passes of the vectorizer (so it
wouldn't vectorize the same loop again) but later was used as a
vehicle to source code pragmas. Though, the idea of Polly sharing the
annotation space was in it from the beginning, and I think we'll come
up with a lot more metadata than just wide/safe. Arnold had some
slides about it.

Are these slides public?

Best regards,
  Johannes

In general, I think this is a useful addition. We just have to get the semantics nailed down.

Is this annotation - if present - meant as a restriction to accesses marked with “llvm.mem.parallel_loop_access”? - That there is no loop carried dependence at a |distance| < k but there might be one at >= k between marked accesses.

Thanks,
Arnold

Remember that this annotation is saying that "the loop *as it is* is
safe in a N vector", but things can change between the annotation
(generally source code pragmas, but could be a Polly thing) and actual
vectorization.

Because other transformations might introduce recurrences that break vectorization,
the "safelen" annotation probably should be structured more like
llvm.mem.parallel_loop_access, and be attached to every load/store of interest.

I'm imagining that the high level interface would be a method:

  int32_t Loop::GetAnnotatedVectorSafelen()

that would do the inspection, much as Loop::IsAnnotatedParallel() currently indicates
whether a loop is parallel. The return value would be 0 for unannotated loops
and an upper bound on the vector width otherwise.

- Arch

We could actually generate (very simple) vector code ourself instead
of relying on the vectorizer. Furthermore, we could directly utilize the
heuristics and vector codegen of the vectorizer in Polly instead of
relying on the robustness of annotations. However, I think the
annotation way is easier to maintain (as we need it for source
annotations anyway).

In our previous incarnations of such discussions, the general
consensus was that duplicating the vectorization machinery into Polly
(I believe there is already some) would be a waste of efforts. The
best course of action would be to annotate correctly (like Arch said,
on instructions not basic blocks) and get the vectorizer to do a sweep
scan on all interesting metadata and trim at the lowest common
denominator or whatever the semantics demands.

Are these slides public?

I don't remember. Arnold, do you still have those, or some scribblings
on that area?

cheers,
--renato

Hum, indeed, Polly and the vectorizer should be careful when
adding/changing annotations on loops that the user has already
annotated, or we run the risk of trying to be smarter than the user
and getting it wrong.

If I remember correctly, the safelen semantics was just a hint to the
validation that, despite its lack of knowledge, the loop was valid at
length N, so that we could skip directly to the cost model. But it
wasn't intended to force any particular width.

cheers,
--renato

We actually emit the llvm.mem.parallel_loop_access annotations already,
thus emitting the dependency distance (or save vectorization width) in a
similar fashion is no problem at all.

I don't think that will be a problem.

Polly, at least as is, doesn't use any user annotations, but computes the
dependency distance on its own (based on all memory accesses in the loop).
This should always be sound (we do not mix user annotations and the ones we
generate in any way). This is because Polly currently forgets all memory and
loop user annotations when generating the optimized loop nest. However,
the original loop nest, which might still be reachable (in case we
introduce runtime alias or delinearization checks), is not changed at all.

WHY CURRENT METADATA DOES NOT SUFFICE
-------------------------------------

There are currently two pieces of metadata that come close, but miss the
desired semantics.

* llvm.loop.vectorize.width - hints at what vectorization width to use
   *if* the loop is safe to vectorize. It does not specify that the
   loop is safe to vectorize.

* llvm.mem.parallel_loop_access - indicates that accesses do not
   have a loop-carried dependence between them. That's too broad a
   brush, as it precludes loops that do have dependences (e.g. "forward
   lexical dependences") that are nonetheless vectorizable.

How does this relate to the recent additions by Hal on invariants using llvm.assume? [0]

Can we translate llvm.mem.vectorize.safelen into an invariant on k similarly as to what you're proposing that the programmer should ensure?

Cheers,
  Roel

[0] http://comments.gmane.org/gmane.comp.compilers.llvm.devel/74941

From: "Roel Jordans" <r.jordans@tue.nl>
To: llvmdev@cs.uiuc.edu
Sent: Wednesday, August 13, 2014 5:57:15 AM
Subject: Re: [LLVMdev] Proposal for ""llvm.mem.vectorize.safelen"

>
> WHY CURRENT METADATA DOES NOT SUFFICE
> -------------------------------------
>
> There are currently two pieces of metadata that come close, but
> miss the
> desired semantics.
>
> * llvm.loop.vectorize.width - hints at what vectorization width to
> use
> *if* the loop is safe to vectorize. It does not specify that
> the
> loop is safe to vectorize.
>
> * llvm.mem.parallel_loop_access - indicates that accesses do not
> have a loop-carried dependence between them. That's too broad a
> brush, as it precludes loops that do have dependences (e.g.
> "forward
> lexical dependences") that are nonetheless vectorizable.
>

How does this relate to the recent additions by Hal on invariants
using
llvm.assume? [0]

I don't think this related because the assumptions don't provide any direct way of asserting things about memory aliasing. That might be an interesting thing to do, but we've not really thought about it yet.

Regarding the proposal, I'm in favor. I don't like using the name 'savelen' however. I can forgive OpenMP for choosing such a short name because people need to type it, but I'd prefer that the metadata have a more-descriptive name. minimum_dependency_distance is perhaps better.

-Hal

From: "Hal Finkel" <hfinkel@anl.gov>
To: "Roel Jordans" <r.jordans@tue.nl>
Cc: llvmdev@cs.uiuc.edu
Sent: Tuesday, August 19, 2014 5:57:54 PM
Subject: Re: [LLVMdev] Proposal for ""llvm.mem.vectorize.safelen"

> From: "Roel Jordans" <r.jordans@tue.nl>
> To: llvmdev@cs.uiuc.edu
> Sent: Wednesday, August 13, 2014 5:57:15 AM
> Subject: Re: [LLVMdev] Proposal for ""llvm.mem.vectorize.safelen"
>
> >
> > WHY CURRENT METADATA DOES NOT SUFFICE
> > -------------------------------------
> >
> > There are currently two pieces of metadata that come close, but
> > miss the
> > desired semantics.
> >
> > * llvm.loop.vectorize.width - hints at what vectorization width
> > to
> > use
> > *if* the loop is safe to vectorize. It does not specify that
> > the
> > loop is safe to vectorize.
> >
> > * llvm.mem.parallel_loop_access - indicates that accesses do not
> > have a loop-carried dependence between them. That's too broad
> > a
> > brush, as it precludes loops that do have dependences (e.g.
> > "forward
> > lexical dependences") that are nonetheless vectorizable.
> >
>
> How does this relate to the recent additions by Hal on invariants
> using
> llvm.assume? [0]

I don't think this related because the assumptions don't provide any
direct way of asserting things about memory aliasing. That might be
an interesting thing to do, but we've not really thought about it
yet.

Regarding the proposal, I'm in favor. I don't like using the name
'savelen' however. I can forgive OpenMP for choosing such a short
name because people need to type it, but I'd prefer that the
metadata have a more-descriptive name. minimum_dependency_distance
is perhaps better.

Also, looking at the original proposal again, I see no reason to restrict this to an i32: we might as well allow it to be an i64 obviously we don't have billion-lane vectors, but the metadata can also be useful for other kinds of analysis).

I recommend that you send patches for an implementation (including the Loop::GetAnnotatedVectorSafelen function and associated updates to the vectorizer) for review. Make sure to remember LangRef updates!

-Hal

I recommend that you send patches for an implementation
(including the Loop::GetAnnotatedVectorSafelen function
and associated updates to the vectorizer) for review.

I expect to send the patches for review later this week.

Make sure to remember LangRef updates!

Thanks for the reminder.

Also, looking at the original proposal again, I see no reason
to restrict this to an i32: we might as well allow it to be an
i64 obviously we don't have billion-lane vectors, but the
metadata can also be useful for other kinds of analysis).

I doubt there is much advantage between knowing the minimum loop-carried
dependence difference is 1u<<31 or 1u<<63. Unless there is a clear
need for the higher values less than infinity, It would seem simpler
for now to adopt the same conventions as llvm.loop.vectorize.width
so that some processing infrastructure can be shared easily.

I don't like using the name 'safelen' however. I can forgive OpenMP
for choosing such a short name because people need to type it,
but I'd prefer that the metadata have a more-descriptive name.
minimum_dependency_distance is perhaps better.

I'm open to naming suggestions, though I'm wondering if sticking with
what is now a term of art in OpenMP might be the least of all evils,
because the semantics turn out to be a little more subtle than
just a minimum dependence distance. My current wording of the semantics
for safelen of k are:

  /// The loop can be assumed to have no loop-carried
  /// lexically backward dependencies with distance less than k.

- Arch D. Robison
  Intel Corporation

Now that I'm looking at editing LangRef, I have a question. The current
llvm.loop.vectorize metadata are hints, and have this constraint:

   The``llvm.loop.vectorize`` and ``llvm.loop.interleave`` metadata are only
   optimization hints and the optimizer will only interleave and vectorize loops
   if it believes it is safe to do so.

But safelen is different. It's an assertion about safety, so prefixing it with
llvm.loop.vectorize seems inappropriate. Does is sound reasonable to drop
the "vectorize" portion. Maybe spell it something like this?

  llvm.loop.carried_dependence_distance.min

- Arch

I recommend that you send patches for an implementation
(including the Loop::GetAnnotatedVectorSafelen function
and associated updates to the vectorizer) for review.

I expect to send the patches for review later this week.

Make sure to remember LangRef updates!

Thanks for the reminder.

Also, looking at the original proposal again, I see no reason
to restrict this to an i32: we might as well allow it to be an
i64 obviously we don't have billion-lane vectors, but the
metadata can also be useful for other kinds of analysis).

I doubt there is much advantage between knowing the minimum loop-carried
dependence difference is 1u<<31 or 1u<<63. Unless there is a clear
need for the higher values less than infinity, It would seem simpler
for now to adopt the same conventions as llvm.loop.vectorize.width
so that some processing infrastructure can be shared easily.

I don't like using the name 'safelen' however. I can forgive OpenMP
for choosing such a short name because people need to type it,
but I'd prefer that the metadata have a more-descriptive name.
minimum_dependency_distance is perhaps better.

I'm open to naming suggestions, though I'm wondering if sticking with
what is now a term of art in OpenMP might be the least of all evils,
because the semantics turn out to be a little more subtle than
just a minimum dependence distance. My current wording of the semantics
for safelen of k are:

/// The loop can be assumed to have no loop-carried
/// lexically backward dependencies with distance less than k.

This means you would allow lexically forward dependencies which complicates things (this would need more infrastructure in clang). You need to carry over “source order” from the front-end. Because the dependency is loop carried llvm would be allowed to move loads and stores within the loop:

Lexical forward dependency:

#pragma vectorize safelen(4)
for (i = …) {
   a[i] = b[i] + z[i];
   c[i] = a[i-1] + 1;
}

LLVM might tranform this loop to:

for (i = …) {
   c[i] = a[i-1] + 1;
   a[i] = b[i] + z[i];
}

if we now vectorize this (without knowledge of the orignal source order) we get the wrong answer:

for (i = …) {
   c[i] = a[i-1:i+2] + 1;
   a[i:i+3] = b[i] + z[i];
}

Alternatively, the metadata loop.vectorize.safelen would have to prevent any such reordering of accesses i.e. any pass that reorders memory accesses would have to be aware of it which is fragile.

This was true to all *current* vectorizer pragmas, but certainly not
safelen. I think we can change that description instead of changing
the namespace.

cheers,
--renato

Thanks for alerting me to this issue. This is going to take more
effort than I thought :-(. Given that a major motivation behind
the OpenMP #pragma omp simd is to allow lexical forward dependences,
we should find a way to do this right.

I agree that we want to avoid making passes that reorder accesses fragile.
The extra work in Clang (or Julia :slight_smile: seems unavoidable, unless those
producers always emit LLVM instructions in lexical order. If the latter
is the case, perhaps we could have a helper routine that, given the IR early
and in lexical order, finishes the annotation work?

It seems that we need metadata on each memory access, distinct from
llvm.mem.parallel_loop_access. Say something like llvm.mem.vector_loop_access
that includes relative lexical position as a third operand? Second operand
could point back to the metadata with the minimum loop-carried distance,
which in turn could point back to the loop id.

- Arch

>
>> I recommend that you send patches for an implementation
>> (including the Loop::GetAnnotatedVectorSafelen function
>> and associated updates to the vectorizer) for review.
>
> I expect to send the patches for review later this week.
>
>> Make sure to remember LangRef updates!
>
> Thanks for the reminder.
>
>> Also, looking at the original proposal again, I see no reason
>> to restrict this to an i32: we might as well allow it to be an
>> i64 obviously we don't have billion-lane vectors, but the
>> metadata can also be useful for other kinds of analysis).
>
> I doubt there is much advantage between knowing the minimum loop-carried
> dependence difference is 1u<<31 or 1u<<63. Unless there is a clear
> need for the higher values less than infinity, It would seem simpler
> for now to adopt the same conventions as llvm.loop.vectorize.width
> so that some processing infrastructure can be shared easily.
>
>> I don't like using the name 'safelen' however. I can forgive OpenMP
>> for choosing such a short name because people need to type it,
>> but I'd prefer that the metadata have a more-descriptive name.
>> minimum_dependency_distance is perhaps better.
>
> I'm open to naming suggestions, though I'm wondering if sticking with
> what is now a term of art in OpenMP might be the least of all evils,
> because the semantics turn out to be a little more subtle than
> just a minimum dependence distance. My current wording of the semantics
> for safelen of k are:
>
> /// The loop can be assumed to have no loop-carried
> /// lexically backward dependencies with distance less than k.

This means you would allow lexically forward dependencies which complicates things (this would need more infrastructure in clang). You need to carry over “source order” from the front-end. Because the dependency is loop carried llvm would be allowed to move loads and stores within the loop:

This should not be that hard (see below).

Lexical forward dependency:

#pragma vectorize safelen(4)
for (i = …) {
   a[i] = b[i] + z[i];
   c[i] = a[i-1] + 1;
}

LLVM might tranform this loop to:

for (i = …) {
   c[i] = a[i-1] + 1;
   a[i] = b[i] + z[i];
}

if we now vectorize this (without knowledge of the orignal source order) we get the wrong answer:

for (i = …) {
   c[i] = a[i-1:i+2] + 1;
   a[i:i+3] = b[i] + z[i];
}

Alternatively, the metadata loop.vectorize.safelen would have to prevent any such reordering of accesses i.e. any pass that reorders memory accesses would have to be aware of it which is fragile.

Could we number the memory accesses for such loops (e.g., in clang)?
Adding metadata to each memory access which points to the safelen
annotation and contains an increasing constant would be similarly
fragile as other constructions we use at the moment. However, it would
allow us to determine iff the current order is still the original one or
not. (At least if I did not miss anything).

Best regards,
  Johannes