[RFC] Enabling LoopVectorizer for vectorization width of 1

eopXD · October 6, 2022, 3:22pm

Hi all,

Currently the LoopVectorizer lacks some functionality because there is no metadata that individually controls its “vectorization component” and its “loop-interleave component”. Please consider the following problem and the proposed approach.

Problem statement and goal

Current behavior for metadata

The LoopVectorizer (abbreviated as LV in this RFC) performs both vectorization and loop-interleave. Below we enumerate how LV behaves for the existing metadata.

To disable the whole LV:

{loop.vectorize.enable, false}

To disable vectorization of LV:

{loop.vectorize.width, 1}

To disable loop-interleave of LV:

{loop.interleave.width, 1}

To enable vectorization of LV:

{loop.vectorize.width, x} // where x != 1

To enable loop-interleave of LV:

{loop.interleave.width, x} // where x != 1
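The list above overloads a width (or interleave factor) of 1 as an off switch. As a minimal sketch, assuming a much-simplified view of LV's hint handling that ignores the cost model (the struct and function names below are hypothetical and are not LLVM's LoopVectorizeHints API), the current mapping behaves like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of the existing metadata; field names are
 * illustrative only. A value of 0 means "not specified". */
typedef struct {
    bool has_vectorize_enable;   /* is loop.vectorize.enable present? */
    bool vectorize_enable;       /* its value when present */
    unsigned vectorize_width;    /* loop.vectorize.width */
    unsigned interleave_factor;  /* the interleave metadata from the list */
} LegacyHints;

typedef struct {
    bool run_lv;     /* does LV touch the loop at all? */
    bool vectorize;  /* may LV vectorize it? */
    bool interleave; /* may LV interleave it? */
} LegacyDecision;

/* Width 1 and interleave factor 1 double as "off" switches, so no input
 * makes this return vectorize == true together with a width of 1. */
static LegacyDecision decide_legacy(const LegacyHints *h) {
    assert(h != NULL);
    LegacyDecision d = {true, true, true};
    if (h->has_vectorize_enable && !h->vectorize_enable) {
        /* loop.vectorize.enable == false turns off the whole pass. */
        d.run_lv = d.vectorize = d.interleave = false;
        return d;
    }
    if (h->vectorize_width == 1)
        d.vectorize = false;
    if (h->interleave_factor == 1)
        d.interleave = false;
    return d;
}
```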

Problem of current behavior

Vectorization with a width of 1, although it may be inefficient, should still be a valid specification for the LoopVectorizer. LV cannot honor it today because it interprets vectorize.width == 1 as a request to disable vectorization.

Approach

This RFC proposes two new metadata, loop.vectorization.enable and loop.interleave.enable, that give individual control over vectorization and loop-interleave in LV. Individual control solves the problem naturally: LV can then tell that vectorization is disabled from the new flag instead of from vectorize.width == 1.

We want the LLVM IR to stay backward compatible, so the current behaviors remain “as-is”. The new metadata only adds another way to express what we can already express today, plus the ability to let LV vectorize with a width of 1.

To disable the whole LV:

// Current way
{loop.vectorize.enable, false}
// With this RFC
{loop.vectorization.enable, false}
{loop.interleave.enable, false}

To disable vectorization of LV:

// Current way
{loop.vectorize.width, 1}
// With this RFC
{loop.vectorization.enable, false}

To disable loop-interleave of LV:

// Current way
{loop.interleave.width, 1}
// With this RFC
{loop.interleave.enable, false}

To enable vectorization of LV:

// Current way
{loop.vectorize.width, x} // where x != 1
// With this RFC
{loop.vectorization.enable, true}
{loop.vectorize.width, x} // x can be any non-negative integer

To enable loop-interleave of LV:

// Current way
{loop.interleave.width, x} // where x != 1
// With this RFC
{loop.interleave.enable, true}
{loop.interleave.width, x} // x can be any non-negative integer (but meaningless when x == 1)

To let LV vectorize with width 1:

// Current way
N/A
// With this RFC
{loop.vectorization.enable, true}
{loop.vectorize.width, 1}
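Backward compatibility comes from consulting the new flags first and falling back to the legacy width == 1 rule only when they are absent. A minimal sketch of that precedence, again with hypothetical names, ignoring the cost model, and assuming that the existing loop.vectorize.enable false still overrides everything:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Tri-state for a boolean hint: absent, true, or false. */
typedef enum { HINT_ABSENT, HINT_TRUE, HINT_FALSE } HintFlag;

/* Hypothetical model of the hints after this RFC; names are illustrative. */
typedef struct {
    HintFlag vectorize_enable;      /* existing loop.vectorize.enable */
    HintFlag vectorization_enable;  /* proposed loop.vectorization.enable */
    HintFlag interleave_enable;     /* proposed loop.interleave.enable */
    unsigned vectorize_width;       /* loop.vectorize.width, 0 = unspecified */
    unsigned interleave_factor;     /* interleave metadata, 0 = unspecified */
} RfcHints;

typedef struct {
    bool vectorize;   /* may LV vectorize (including with width 1)? */
    bool interleave;  /* may LV interleave? */
} RfcDecision;

static RfcDecision decide_with_rfc(const RfcHints *h) {
    assert(h != NULL);
    RfcDecision d = {true, true};

    /* The existing whole-pass switch keeps its meaning. */
    if (h->vectorize_enable == HINT_FALSE) {
        d.vectorize = d.interleave = false;
        return d;
    }

    /* Explicit new flag wins; otherwise width 1 still means "off". */
    if (h->vectorization_enable != HINT_ABSENT)
        d.vectorize = (h->vectorization_enable == HINT_TRUE);
    else if (h->vectorize_width == 1)
        d.vectorize = false;

    /* Same precedence for interleaving. */
    if (h->interleave_enable != HINT_ABSENT)
        d.interleave = (h->interleave_enable == HINT_TRUE);
    else if (h->interleave_factor == 1)
        d.interleave = false;

    return d;
}
```

With {loop.vectorization.enable, true} and {loop.vectorize.width, 1}, the sketch keeps vectorization on, while IR that carries only {loop.vectorize.width, 1} still disables it as before, since no existing IR contains the new flag.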

Thank you for taking the time to read this; all comments are welcome.

Regards,

eop Chen

fhahn · October 6, 2022, 4:06pm

Will this be the same behavior as when using #pragma clang loop vectorize(enable) vectorize_width(1)? If so, do we need a new metadata/pragma?

eopXD · October 6, 2022, 4:17pm

If we also want to preserve compatibility at the pragma level, then yes, I’m afraid we need a new pragma in this case.

eopXD · October 6, 2022, 5:36pm

@fhahn
Considering what we have now, I think we can add a pragma such as vectorize_width_1 that lets the vectorizer recognize a request for vectorization of width 1.

The new pragma vectorize_width_1 behaves just like vectorize_width(x) does when x is not 1. Here are the ways to request width-1 vectorization:

#pragma clang loop vectorize(enable) vectorize_width_1
#pragma clang loop vectorize_width_1

Invalid combinations include the following:

vectorize_width_1 and vectorize(disable) cannot coexist:

#pragma clang loop vectorize(disable) vectorize_width_1

vectorize_width(x) and vectorize_width_1 cannot coexist:

#pragma clang loop vectorize(enable) vectorize_width(x) vectorize_width_1 // x is any non-negative number
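The two invalid combinations can be captured by a small check. A minimal sketch, assuming a hypothetical already-parsed representation of a single #pragma clang loop line (vectorize_width_1 is the option proposed here, not an existing Clang option, and none of the names below are Clang's):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* State of the vectorize(...) option on one pragma line. */
typedef enum { VEC_UNSPECIFIED, VEC_ENABLE, VEC_DISABLE } VectorizeState;

/* Hypothetical parsed form of one "#pragma clang loop" line. */
typedef struct {
    VectorizeState vectorize;  /* vectorize(enable) / vectorize(disable) */
    bool has_width;            /* vectorize_width(x) present */
    bool has_width_1;          /* proposed vectorize_width_1 present */
} LoopPragma;

/* Returns true if the combination is accepted under the proposed rules. */
static bool pragma_is_valid(const LoopPragma *p) {
    assert(p != NULL);
    if (p->has_width_1 && p->vectorize == VEC_DISABLE)
        return false;  /* requests width-1 vectorization while disabling it */
    if (p->has_width_1 && p->has_width)
        return false;  /* two conflicting width requests */
    return true;
}
```

Both ways of invoking width-1 vectorization listed earlier pass this check, while the two combinations above are rejected.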

Thank you for your time.

Regards,

eop Chen

mindong · October 11, 2022, 2:18am

Could you expand more on the motivation? :)

eopXD · October 11, 2022, 11:01am

Width 1 is non-trivial for scalable vectorization (e.g. <vscale x 1 x i64>): such a vector holds vscale elements, which can be more than one at run time. Use cases are yet to be explored, but I think this RFC fixes a limitation of the current LoopVectorizer.

fhahn · October 11, 2022, 1:59pm

eopXD:

Considering what we have now, I think we can add pragma like vectorize_width_1 to allow the vectorizer to recognize vectorization of width 1.

The new pragma vectorize_width_1 is just like vectorize_width(x) when x is not 1. Here are the ways to invoke width 1 vectorization:
#pragma clang loop vectorize(enable) vectorize_width_1
#pragma clang loop vectorize_width_1

I am not sure why the new vectorize_width_1 would be needed. Couldn’t the existing vectorize_width(x) be used?

eopXD · October 11, 2022, 5:56pm

We also want backward compatibility for the C-level pragma. Currently vectorize_width(1) implies that vectorization is disabled in the LoopVectorizer, so reusing the existing vectorize_width(x) for width-1 vectorization would break compatibility.

eopXD · October 16, 2022, 1:57am

I think the consideration we need to take into account is whether we want to stay compatible at the “C to LLVM IR” level.

I am refining the proposal and will reiterate the RFC in a “Take 2” post.
