Proposed changes to vectorize_width #pragma

Hi,

At the moment the vectorize_width(X) #pragma is used to provide hints to LLVM
about which vectorisation factor to use. The unsigned argument ‘X’ used to match
the NumElements property in the VectorType class; however, VectorType is now
defined in terms of an ElementCount class.

I’d like to propose an extension to the vectorize_width #pragma so that it takes
an optional second parameter of ‘fixed’ or ‘scalable’, matching up with
ElementCount. When not specified, the default value would be ‘fixed’. A few
examples of what this would look like are shown below:

// Vectorize the loop with <4 x eltty>
#pragma clang loop vectorize_width(4)
// or, equivalently:
#pragma clang loop vectorize_width(4, fixed)

// Vectorize the loop with <vscale x 4 x eltty>
#pragma clang loop vectorize_width(4, scalable)
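A minimal sketch of the proposed syntax attached to real loops. The function names are illustrative only, and the example assumes a Clang version that accepts the optional second argument. Because the pragmas are only hints, both functions compute the same result; a compiler or target that cannot honour a hint may ignore it (GCC, for instance, does not recognise `#pragma clang` at all).

```c
#include <stddef.h>

/* Hint: fixed-width vectors of 4 elements, i.e. <4 x float>.
   Under the proposal, vectorize_width(4) means the same thing. */
void scale_fixed(float *dst, const float *src, float k, size_t n) {
#pragma clang loop vectorize_width(4, fixed)
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i] * k;
}

/* Hint: scalable vectors, i.e. <vscale x 4 x float>. */
void scale_scalable(float *dst, const float *src, float k, size_t n) {
#pragma clang loop vectorize_width(4, scalable)
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i] * k;
}
```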

As a further extension I’d also like to permit vectorize_width(fixed|scalable) to
allow users to hint at the type of vector used without specifying the
vectorisation factor. Examples of this would be:

// Vectorize the loop with <N x eltty> for a profitable N
#pragma clang loop vectorize_width(fixed)

// Vectorize the loop with <vscale x N x eltty> for a profitable N
#pragma clang loop vectorize_width(scalable)
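To make the three proposed argument forms concrete, here is a hypothetical model (not Clang's actual parser) that maps each form onto an ElementCount-like pair of a minimum element count and a scalable flag. `WidthHint`, `parse_vectorize_width` and the convention that `min == 0` means "no width given, let the vectoriser pick N" are all assumptions made for illustration.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical ElementCount-like view of the hint. */
typedef struct {
    unsigned min;   /* minimum element count; 0 = not specified */
    bool scalable;  /* false = fixed, true = scalable */
} WidthHint;

static bool parse_kind(const char *s, bool *scalable) {
    if (strcmp(s, "fixed") == 0) { *scalable = false; return true; }
    if (strcmp(s, "scalable") == 0) { *scalable = true; return true; }
    return false;
}

/* Parses the text between the parentheses of vectorize_width(...).
   Simplified: only spaces after the comma are skipped. */
bool parse_vectorize_width(const char *args, WidthHint *out) {
    /* Form: vectorize_width(fixed|scalable); the vectoriser picks N. */
    if (parse_kind(args, &out->scalable)) {
        out->min = 0;
        return true;
    }
    if (!isdigit((unsigned char)args[0]))
        return false;
    char *end;
    unsigned long n = strtoul(args, &end, 10);
    if (n == 0)
        return false;
    out->min = (unsigned)n;
    /* Form: vectorize_width(N), which defaults to fixed. */
    if (*end == '\0') {
        out->scalable = false;
        return true;
    }
    /* Form: vectorize_width(N, fixed|scalable). */
    if (*end != ',')
        return false;
    ++end;
    while (*end == ' ')
        ++end;
    return parse_kind(end, &out->scalable);
}
```

With this model, "4" and "4, fixed" produce the same pair, which is the backward-compatible default the proposal asks for.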

Any thoughts you have would be much appreciated!

Kind Regards,

David Sherwood.

Hi David,

Thanks for bringing this up here. We have discussed this already on https://reviews.llvm.org/D89031 and a bit offline, and it would be good to get some other opinions on this too.

What we achieve with this extension is that we can toggle fixed/scalable vectorisation. The proposal is to add this property to vectorize_width, because it effectively defines the VectorType, which consists of the element count and the scalable/fixed part, which sounds reasonable. However, there are other loop pragmas that (implicitly) enable vectorisation:

#pragma clang loop interleave_count(some-number)

or

#pragma clang loop vectorize_predicate(enable)

for which you may want to toggle fixed|scalable vectorisation. If this is correct, then I think the current proposal/implementation is incomplete and/or inconsistent.

I think your own suggestion was to introduce a vectorization_style(enable|disable) at some point, but my proposal would be to use that instead of adjusting vectorize_width, as that would address the incompleteness/inconsistency issue. Besides this, and more subjectively, I don’t think all the new combinations of vectorize_width() make things clearer:

vectorize_width(VF)
vectorize_width(VF, fixed|scalable)

vectorize_width(fixed|scalable)

Adding vectorization_style(enable|disable) is probably also easier to implement and less contentious than adjusting an existing pragma, so altogether I don’t see why the approach of adjusting vectorize_width would be preferred. But I might be wrong or missing something, so I welcome other views on this.

Cheers,
Sjoerd.

One typo fixed inline

I think your own suggestion was to introduce a vectorization_style(enable|disable) at some point,

I meant vectorization_style(fixed|scalable)

Hi Sjoerd,

As I understand it, the interleave count is orthogonal to the vectorization factor and
one does not imply the other. I think the clang documentation gives an example of
this:

#pragma clang loop vectorize_width(2)
#pragma clang loop interleave_count(2)
for(…) {
}

Also, I believe that each pragma that we set is a hint for one unit of the
loop vectorizer. It is true that vectorize_predicate enables vectorization, but
the vectorizer will always choose what it thinks is the most profitable
vectorization factor, which could be fixed or scalable. If you wanted to hint
to the compiler that we should use scalable vectors, with my proposal you’d
simply add an extra pragma, i.e.

#pragma clang loop vectorize_predicate(enable) vectorize_width(scalable)
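A sketch of both points above, assuming a Clang that supports the proposed vectorize_width(scalable) form. The first loop combines independent width and interleave hints, as in the documentation example; the second asks for predicated vectorization and only hints at scalable vectors, leaving the factor to the vectoriser. Function names are illustrative.

```c
#include <stddef.h>

/* Width and interleave count are independent hints:
   2 lanes per vector, 2 vector iterations interleaved. */
void add_arrays(int *a, const int *b, size_t n) {
#pragma clang loop vectorize_width(2) interleave_count(2)
    for (size_t i = 0; i < n; ++i)
        a[i] += b[i];
}

/* Predicated vectorization plus a hint to prefer scalable vectors;
   the vectorization factor itself is left to the vectoriser. */
void add_arrays_predicated(int *a, const int *b, size_t n) {
#pragma clang loop vectorize_predicate(enable) vectorize_width(scalable)
    for (size_t i = 0; i < n; ++i)
        a[i] += b[i];
}
```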

Kind Regards,

David.

If you wanted to hint
to the compiler that we should use scalable vectors, with my proposal you’d
simply add an extra pragma, i.e.

#pragma clang loop vectorize_predicate(enable) vectorize_width(scalable)

Ah yes, that might have been the thing that I missed, but that would indeed then be equivalent to:

#pragma clang loop vectorize_predicate(enable) vectorize_style(scalable)

I think that leaves us with 2 options that can express the same things, i.e. change or introduce:

1) change vectorize_width:
   vectorize_width(VF, fixed|scalable)
   vectorize_width(fixed|scalable)

2) keep vectorize_width and introduce vectorize_style:
   vectorize_width(VF)
   vectorize_style(fixed|scalable)

And then it’s probably more a question of style, and not that important if there are no implementation or usability issues with overloading vectorize_width.

Cheers,
Sjoerd.

My feeling is this is not just a question of style but includes an element of design. Where possible we want to express vectorisation factors/element counts as a single unit, hence the proposal to extend vectorize_width as this is the unit of information that it controls.

Hi David,

Your proposal looks sensible to me. I understand that for reasons of evolution of the pragma, you chose to give it fixed semantics if no explicit mark of vectorisation style appears, right?

Is this something we’d want to relax in the future? This way the target could also pick the best vectorization style (borrowing Sjoerd’s terminology here).

Perhaps we could define a vectorize_style(any) as well. That would be the one used if no explicit vectorize_style is specified.

As a further extension I’d also like to permit vectorize_width(fixed|scalable) to
allow users to hint at the type of vector used without specifying the
vectorisation factor. Examples of this would be:

// Vectorize the loop with <N x eltty> for a profitable N
#pragma clang loop vectorize_width(fixed)

// Vectorize the loop with <vscale x N x eltty> for a profitable N
#pragma clang loop vectorize_width(scalable)

In those cases, I imagine vectorize_style could be enough and we avoid having a vectorize_width that doesn’t actually tell us the width (or the factor of the actual width, for scalables). But this falls in the “aesthetics” category, I think.

Kind regards,

Is "style" the right terminology? Since it affects semantics, I would
prefer some other terminology.

How about vectorize_scalable(enable|disable)?

Michael

If LoopVectorize is able to generate SVE without a pragma, it should
still be able to do so with a hint that does not force a fixed vector
width. E.g. vectorize_predicate(enable) may implicitly enable
vectorization, but does not (should not?) change the chosen vector width.

An interpretation is that loop hints restrict the choices the
LoopVectorize profitability heuristic can make. If the choices are

(interleave_count=1,vectorize_width=1) // i.e. don't do anything
(interleave_count=1,vectorize_width=2)
(interleave_count=1,vectorize_width=4)
(interleave_count=2,vectorize_width=1)
(interleave_count=2,vectorize_width=2)
(interleave_count=2,vectorize_width=4)

then vectorize_width(4) only keeps

(interleave_count=1,vectorize_width=4)
(interleave_count=2,vectorize_width=4)

as available options. vectorize(enable), or the hints that enable
vectorization implicitly, remove the vectorize_width=1 options from
the list.
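The interpretation above can be modelled as filtering a candidate list. This is a hypothetical model of the idea, not LoopVectorize's actual data structures: `Plan`, `Hints` and `filter_plans` are invented names, and a zero field means "no hint given".

```c
#include <stdbool.h>
#include <stddef.h>

/* One candidate the cost model could pick. */
typedef struct { unsigned ic; unsigned vf; } Plan;

typedef struct {
    unsigned width;       /* vectorize_width(N); 0 = no hint */
    unsigned interleave;  /* interleave_count(N); 0 = no hint */
    bool enable;          /* vectorize(enable) or a hint implying it */
} Hints;

static bool allowed(Plan p, Hints h) {
    if (h.width != 0 && p.vf != h.width) return false;
    if (h.interleave != 0 && p.ic != h.interleave) return false;
    if (h.enable && p.vf == 1) return false;  /* must actually vectorize */
    return true;
}

/* Copies the plans that survive the hints into out; returns the count. */
size_t filter_plans(const Plan *all, size_t n, Hints h, Plan *out) {
    size_t k = 0;
    for (size_t i = 0; i < n; ++i)
        if (allowed(all[i], h))
            out[k++] = all[i];
    return k;
}
```

Applied to the six choices listed above, a width hint of 4 keeps exactly the two vectorize_width=4 entries, and an enable hint keeps the four entries with vectorize_width greater than 1.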

Another proposal:

3)
vectorize_width(VF)              // For fixed vector width.
vectorize_width_at_least(MinVF)  // For SVE; alternatives: vectorize_dynamic, vectorize_scalable.

What are the intended semantics? Does scalable mean "width of MinVF or
more", "any multiple of MinVF", "power-of-2 multiple of MinVF", "any
width of at least MinVF allowed by ARM's SVE"?

Michael

Hi,

So by adding support for scalable vectorisation widths we are effectively
updating the pragma to mirror the existing VectorType class in LLVM,
which is defined by an ElementCount and an element Type. The
ElementCount is a tuple consisting of a minimum number of elements
and a scalable flag. The meaning of 'scalable' as used in the vectorize_width
pragma is identical to that of ElementCount. Taking one of the examples
from my initial proposal, this pragma

#pragma clang loop vectorize_width(4, scalable)

would mean the same in LLVM as a VectorType like this:

<vscale x 4 x eltty>

where eltty depends upon the types used in the loop. The 'vscale' parameter
is defined by the target - it is at least 1 and does not have to be a power of 2.
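As a small worked illustration of these semantics: a scalable type has its minimum element count multiplied by vscale, while a fixed type has exactly its element count. The helpers are hypothetical, and `sve_vscale` additionally assumes AArch64 SVE, where LLVM defines vscale as the vector length in bits divided by 128.

```c
#include <stdbool.h>

/* Lanes in <vscale x min x T> (scalable) or <min x T> (fixed). */
unsigned lane_count(unsigned min, bool scalable, unsigned vscale) {
    return scalable ? vscale * min : min;
}

/* ASSUMPTION: AArch64 SVE, where vscale = vector length in bits / 128. */
unsigned sve_vscale(unsigned vector_bits) {
    return vector_bits / 128;
}
```

With 32-bit elements, <vscale x 4 x i32> therefore fills one 128-bit block per unit of vscale; a vscale of 3, which is not a power of 2, gives 12 lanes.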

Kind Regards,
David.

I see the motivation, but there are different requirements for
LLVM-internals and user-facing extensions, which is why e.g. clang
does not implement a #pragma ivdep.

The definitions look fine to me, as long as they are documented without
referring to compiler internals.

Michael

Hi Roger,

Thanks for the suggestion. With regard to possible use cases of a vectorize_style(any)
pragma, my thoughts are:

1. Any existing tests that currently use vectorize_width(#number) were presumably
   written with fixed-width vectorisation in mind, so it makes sense in those cases
   for the default to be fixed width. If users want to go back and change them to
   use scalable vectorisation explicitly, they can just write vectorize_width(#number, scalable).
   We feel that specifying the numeric part of the vectorisation factor without also
   considering whether the factor is fixed-length or scalable is not a realistic,
   real-world use case. I imagine that the best results will be obtained by choosing
   the best pair, e.g. vectorize_width(4, fixed) or vectorize_width(8, scalable).

2. However, if the user wants the compiler to choose the best option (fixed or scalable),
   then we already have a route for that with vectorize(enable). Similarly, when compiling
   at -O2 or above the compiler will choose the most profitable option.
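The two points can be summarised as a hypothetical model of which (element count, fixed/scalable) candidates each hint leaves to the cost model: an unqualified vectorize_width(N) defaults to fixed, while vectorize(enable) leaves both kinds open, which is the behaviour a vectorize_style(any) would otherwise provide. The candidate list and all names here are illustrative assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Which fixed/scalable kind a hint permits. */
typedef enum { KIND_FIXED, KIND_SCALABLE, KIND_ANY } Kind;

/* ElementCount-like candidate: <min x T> or <vscale x min x T>. */
typedef struct { unsigned min; bool scalable; } Candidate;

/* width == 0 means the hint does not fix the element count. */
typedef struct { unsigned width; Kind kind; } Hint;

/* vectorize_width(N): the kind defaults to fixed under the proposal. */
Hint from_vectorize_width(unsigned n) {
    Hint h = { n, KIND_FIXED };
    return h;
}

/* vectorize_width(N, fixed|scalable); N == 0 models vectorize_width(fixed|scalable). */
Hint from_vectorize_width_kind(unsigned n, bool scalable) {
    Hint h = { n, scalable ? KIND_SCALABLE : KIND_FIXED };
    return h;
}

/* vectorize(enable): no width and either kind. */
Hint from_vectorize_enable(void) {
    Hint h = { 0, KIND_ANY };
    return h;
}

bool permits(Hint h, Candidate c) {
    if (h.width != 0 && c.min != h.width) return false;
    if (h.kind == KIND_FIXED && c.scalable) return false;
    if (h.kind == KIND_SCALABLE && !c.scalable) return false;
    return true;
}

size_t count_permitted(Hint h, const Candidate *cands, size_t n) {
    size_t k = 0;
    for (size_t i = 0; i < n; ++i)
        if (permits(h, cands[i]))
            ++k;
    return k;
}
```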

Kind Regards,

David.

Hi David,

Thanks a lot for the clarification.

Defaulting to fixed vectorization and having a qualifier that restricts to fixed/scalable vectorization seems very reasonable to me in this context. I can see how a vectorize_style(any) would be unnecessary.

Kind regards,

On Wed, 9 Dec 2020 at 13:49, David Sherwood <David.Sherwood@arm.com> wrote: