Regarding BOF: Vectorization in LLVM

Hi Nadav,

Unfortunately I’m not attending the dev meeting, but the BoF looks interesting. One thing that I’d like to throw into the mix is that, while dealing with autovectorisation of LLVM compiled down from C-like languages (or maybe Fortran-like languages) is clearly a very big area for fruitful work both algorithmically and in terms of practical relevance, it’d also be interesting to see what LLVM compiled from languages with semantics that are more open to optimization can do to convey those extra guarantees to the auto-vectorizer. (I have my own personal after-hours OSS project that will soon be generating LLVM IR for which vectorization will be important. I don’t want to implement vectorization myself before generating LLVM IR, partly since hopefully the LLVM layer will have a much better estimate of the costs/benefits, so having some well-defined metadata I can emit to indicate various things the vectorizer would otherwise have to deduce (non-aliasing, loop-permutability, etc.) would be very interesting.)

Regards,
Dave Tweed
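
For comparison, here is a minimal C sketch (an illustrative function, not from the project above) of one of the facts listed, non-aliasing: C can only express it through the restrict qualifier, and the hope is that a frontend for a language that guarantees it by construction could hand the same guarantee straight to the vectorizer rather than hoping it gets re-derived.

/* Illustration only: "restrict" is how C lets the programmer assert the
 * non-aliasing fact above, so every iteration is independent and the
 * loop is trivially vectorizable.  A frontend for a language that
 * guarantees this by construction could pass the same promise through. */
void scale(float *restrict dst, const float *restrict src, float k, int n)
{
    for (int i = 0; i < n; ++i)
        dst[i] = k * src[i];
}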

Hi David!

This is definitely something that we plan to do. We would like to be able to annotate loops (using metadata, special intrinsics, etc.) to indicate that they are vectorizable. This is something that domain-specific languages can use. It will also allow us to improve vectorization of C-based languages, because the user will be able to tell the compiler that a loop is safe to vectorize. The Intel compiler already supports vectorizer pragmas to specify which loop should be vectorized and with what vectorization factor. I don't plan to start working on vectorization hints soon, but this is one of the items on the vectorizer TODO list that I am going to present at the BoF.

Nadav
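
For illustration, a user-mandated hint of the kind described above might be written like this for icc; pragma and clause spellings vary across compilers and versions, and the function itself is just an example:

/* Sketch of an icc-style hint (pragma and clause spellings vary by
 * compiler and version, so treat this as illustrative, not a reference):
 * ask for vectorization of this loop with a vector length of 4,
 * overriding the compiler's own profitability estimate. */
void saxpy(float *restrict y, const float *restrict x, float a, int n)
{
#pragma simd vectorlength(4)
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}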

From: "Nadav Rotem" <nrotem@apple.com>
To: "David Tweed" <david.tweed@gmail.com>
Cc: llvmdev@cs.uiuc.edu
Sent: Tuesday, November 6, 2012 11:08:23 AM
Subject: Re: [LLVMdev] Regarding BOF: Vectorization in LLVM

Hi David!

> Hi Nadav,
>
> Unfortunately I'm not attending the dev meeting, but the BoF looks
> interesting. One thing that I'd like to throw into the mix is
> that, while dealing with autovectorisation of LLVM compiled down
> from C-like languages (or maybe Fortran-like languages) is clearly
> a very big area for fruitful work both algorithmically and in
> terms of practical relevance, it'd also be interesting to see what
> LLVM complied from languages with semantics that are more open to
> optimization can do to indicate these things to the
> auto-vectorizer. (I have my own personal after-hours OSS project
> that will soon be generating LLVM IR for which vectorization will
> be important. I don't want to implement vectorization myself
> before generating LLVM IR, partly since hopefully the LLVM layer
> will have a much better estimate of the costs/benefits, so having
> some written metadata I can set to indicate various things the
> vectorizer would otherwise deduce (non-aliasing,
> loop-permutability, etc) would be very inter!
esting.)

This is definitely something that we plan to do. We would like to be
able to annotate loops (using metadata or special intrinsics, etc)
and to indicate that they are vectorizable. This is something that
domain specific languages can use. This will also allow us to
improve vectorization of C-based languages because the user will be
able to tell the compiler that a loop is safe to vectorize. The
Intel compiler already supports vectorizer pragmas to specify which
loop should be vectorized and to what vectorization factor. I don't
plan to start working vectorization hints soon, but this is one of
the items on the vectorizer TODO list that I am going to present at
the BoF.

I'd like to add: Please develop a wish list of such things, and, if possible, patches.

Thanks again,
Hal

I'll certainly try to do this, although I suspect it's going to be
more of a matter of mutual interaction from both ends:

* DSL developers: I could tell you these things: ...., can you
profitably use them?
* Autovectorizer developers: I can make use of these things: ...., are
you in a position to put them directly into metadata?

The only real reason I mentioned the point was to make sure that
whatever metadata format is used becomes "a relatively stable
interface" rather than being regarded as an internal implementation
detail. (I know LLVM is quite dynamic in terms of refactoring and I
expect things will change, but it should be stable enough that using
it isn't more work than necessary.)

> I'll certainly try to do this, although I suspect it's going to be
> more of a matter of mutual interaction from both ends:
>
> * DSL developers: I could tell you these things: ...., can you
>   profitably use them?
> * Autovectorizer developers: I can make use of these things: ...., are
>   you in a position to put them directly into metadata?

I agree. This is something that we should discuss together and I definitely want this discussion to happen, but I don’t think that this will be a high priority for me in the next few months.

Also, I should mention that I would like us to implement the ICC vectorization intrinsics [1].

> The only real reason I mentioned the point was to make sure that
> whatever metadata format is used becomes "a relatively stable
> interface" rather than being regarded as an internal implementation
> detail. (I know LLVM is quite dynamic in terms of refactoring and I
> expect things will change, but it should be stable enough that using
> it isn't more work than necessary.)

Maybe we can start the discussion by deciding if we want to use metadata or intrinsics.

[1] - http://software.intel.com/sites/products/documentation/doclib/stdxe/2013/composerxe/compiler/cpp-lin/index.htm#GUID-B25ABCC2-BE6F-4599-AEDF-2434F4676E1B.htm

> Also, I should mention that I would like us to implement the ICC
> vectorization intrinsics [1].

I think it would be good to get as much of the information we need for
vectorization from the developer, but I'm not sure this would be the
first item on my agenda. ;-)

Though I'm not against it, now or at any time. GCC supports it, doesn't it?

> Maybe we can start the discussion by deciding if we want to use metadata or
> intrinsics.

I've given that a good deal of thought and couldn't see any but the
most naive vectorizations working without it. Not necessarily
metadata, but some kind of annotation, even if it is discarded
after every pass.

Since a lot of the assumptions require you to probe different kinds of
relationships between variables and code, and since those probes can
be expensive, memoizing intermediate state as metadata is important
for efficiency. As discussed earlier, we probably need IR versioning,
which is expensive, and the more we can save of these failed attempts
(whatever is still relevant), the better.

It won't be easy to determine what stays and what doesn't, or how to
organize it in a way that is both simple and expressive, but it would
pay off if other passes (such as Polly) could use it and front ends
could produce it (as with #pragma ivdep or OpenMP annotations). How do
the other pragmas that annotate code work?

The proposed OpenMP implementation works by wrapping code in calls to
intrinsics. This is another way to annotate IR, but one that might
suffer a lot from code motion, inlining, etc. But at least, if the
OpenMP pragma says that the loop data is private, then even if the loop
runs on a separate OpenMP thread, we can still maybe vectorize it more
efficiently (can't we?).
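
For concreteness, a generic OpenMP example (an illustrative function, not the proposed LLVM lowering) of the information a private clause carries:

/* Generic OpenMP sketch, not the proposed LLVM lowering: private(t)
 * promises each iteration has its own copy of t, i.e. there is no
 * loop-carried dependence through it.  That is exactly the kind of
 * fact a vectorizer could reuse even if the loop is also split across
 * OpenMP threads.  Build with OpenMP enabled (e.g. -fopenmp). */
void square_sum(float *out, const float *a, const float *b, int n)
{
    float t;
#pragma omp parallel for private(t)
    for (int i = 0; i < n; ++i) {
        t = a[i] + b[i];
        out[i] = t * t;
    }
}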

Just throwing ideas around late at night, I might have said a load of crap...