[RFC] Enabling LoopVectorizer for vectorization width of 1

Hi all,

Currently the LoopVectorizer lacks metadata that controls its “vectorization component” and its “loop-interleave component” individually, which leaves a gap in functionality. Please consider the following problem and the proposed approach.

Problem statement and goal

Current behavior for metadata

The LoopVectorizer (abbreviated as LV in this RFC) performs both vectorization and loop-interleave. The following lists how LV behaves for the existing metadata.

To disable the whole LV:

{loop.vectorize.enable, false}

To disable vectorization of LV:

{loop.vectorize.width, 1}

To disable loop-interleave of LV:

{loop.interleave.width, 1}

To enable vectorization of LV:

{loop.vectorize.width, x} // where x != 1

To enable loop-interleave of LV:

{loop.interleave.width, x} // where x != 1

Problem of current behavior

Vectorization with width 1, although it may be inefficient, should still be a valid request to the LoopVectorizer. LV cannot honor it today because it interprets vectorize.width == 1 as a request to disable vectorization.
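
The rules listed under "Current behavior for metadata" can be summarized as a small decision function. This is only a sketch of the behavior as stated in this RFC, not LoopVectorizer's actual code: `Tri`, `LoopHints`, `Decision`, and `currentBehavior` are hypothetical names, and using 0 to mean "metadata not present" is an assumption made for the example.

```cpp
// Sketch of the metadata rules described in this RFC. All names are
// hypothetical; this is not LoopVectorizer's implementation.
enum class Tri { Unset, False, True };

struct LoopHints {
  Tri vectorizeEnable;      // {loop.vectorize.enable, ...}
  unsigned vectorizeWidth;  // {loop.vectorize.width, x}; 0 = absent (assumption)
  unsigned interleaveWidth; // {loop.interleave.width, x}; 0 = absent (assumption)
};

struct Decision {
  bool mayVectorize;  // the vectorization component is allowed to run
  bool mayInterleave; // the loop-interleave component is allowed to run
};

constexpr Decision currentBehavior(LoopHints h) {
  // {loop.vectorize.enable, false} disables the whole LV.
  if (h.vectorizeEnable == Tri::False)
    return Decision{false, false};
  // A width of 1 is read as "disabled", so there is no way to ask for
  // vectorization with width 1.
  return Decision{h.vectorizeWidth != 1, h.interleaveWidth != 1};
}

// {loop.vectorize.width, 1} turns vectorization off and leaves
// loop-interleave untouched.
static_assert(!currentBehavior({Tri::Unset, 1, 0}).mayVectorize, "width 1 disables");
static_assert(currentBehavior({Tri::Unset, 1, 0}).mayInterleave, "interleave unaffected");
```

The second field is the crux of the problem: the value 1 carries the meaning "off", so it cannot also carry the meaning "vectorize with width 1".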

Approach

This RFC proposes two new metadata, loop.vectorization.enable and loop.interleave.enable, that give individual control over vectorization and loop-interleave in LV. This solves the problem naturally: vectorization can now be disabled explicitly, so LV no longer has to rely solely on vectorize.width == 1 to tell when vectorization is disabled.

We want the LLVM IR to remain backward compatible, so the current behaviors stay “as-is”. The new metadata only provides another way to express what is already possible and, in addition, lets LV vectorize with a width of 1.

To disable the whole LV:

// Current way
{loop.vectorize.enable, false}
// With this RFC
{loop.vectorization.enable, false}
{loop.interleave.enable, false}

To disable vectorization of LV:

// Current way
{loop.vectorize.width, 1}
// With this RFC
{loop.vectorization.enable, false}

To disable loop-interleave of LV:

// Current way
{loop.interleave.width, 1}
// With this RFC
{loop.interleave.enable, false}

To enable vectorization of LV:

// Current way
{loop.vectorize.width, x} // where x != 1
// With this RFC
{loop.vectorization.enable, true}
{loop.vectorize.width, x} // x can be any non-negative integer

To enable loop-interleave of LV:

// Current way
{loop.interleave.width, x} // where x != 1
// With this RFC
{loop.interleave.enable, true}
{loop.interleave.width, x} // x can be any non-negative integer (but meaningless when x == 1)

To let LV vectorize with width 1:

// Current way
N/A
// With this RFC
{loop.vectorization.enable, true}
{loop.vectorize.width, 1}
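
The proposed rules can be sketched in the same style. This is an illustration under stated assumptions, not a patch: all names (`Tri`, `ProposedHints`, `Decision`, `proposedBehavior`) are hypothetical, 0 again stands for "metadata not present", and the priority between the old and the new metadata when they disagree is an assumption, since the RFC does not spell it out.

```cpp
// Sketch of the proposed rules. All names are hypothetical and the
// precedence between old and new metadata is an assumption.
enum class Tri { Unset, False, True };

struct ProposedHints {
  Tri vectorizeEnable;      // existing {loop.vectorize.enable, ...}
  Tri vectorizationEnable;  // proposed {loop.vectorization.enable, ...}
  Tri interleaveEnable;     // proposed {loop.interleave.enable, ...}
  unsigned vectorizeWidth;  // {loop.vectorize.width, x}; 0 = absent (assumption)
  unsigned interleaveWidth; // {loop.interleave.width, x}; 0 = absent (assumption)
};

struct Decision {
  bool mayVectorize;
  bool mayInterleave;
};

constexpr Decision proposedBehavior(ProposedHints h) {
  // The existing whole-pass switch keeps its meaning (assumed to win).
  if (h.vectorizeEnable == Tri::False)
    return Decision{false, false};

  // Without the new metadata, the legacy "width 1 means disabled" rule holds.
  bool vec = h.vectorizeWidth != 1;
  if (h.vectorizationEnable == Tri::True)
    vec = true; // an explicit enable makes width 1 a real request
  if (h.vectorizationEnable == Tri::False)
    vec = false;

  // Interleaving with width 1 is described as meaningless, so an explicit
  // enable does not change the legacy result here (assumption).
  bool il = h.interleaveWidth != 1;
  if (h.interleaveEnable == Tri::False)
    il = false;

  return Decision{vec, il};
}

// {loop.vectorization.enable, true} + {loop.vectorize.width, 1}
static_assert(proposedBehavior({Tri::Unset, Tri::True, Tri::Unset, 1, 0}).mayVectorize,
              "width 1 is honored when explicitly enabled");
```

Existing IR that carries only {loop.vectorize.width, 1} still takes the legacy path and stays unvectorized, which is the backward-compatibility property the RFC asks for.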

Thank you for taking the time to read this. All comments are welcome.

Regards,

eop Chen

Will this be the same behavior as when using #pragma clang loop vectorize(enable) vectorize_width(1)? If so, do we need a new metadata/pragma?

If we also want to stay compatible at the pragma level, then yes, I’m afraid we need a new pragma in this case.

@fhahn
Considering what we have now, I think we can add a pragma such as vectorize_width_1 so that the vectorizer can recognize a request for width-1 vectorization.

The new pragma vectorize_width_1 behaves just like vectorize_width(x) does when x is not 1. Width-1 vectorization can be requested in either of these ways:

#pragma clang loop vectorize(enable) vectorize_width_1
#pragma clang loop vectorize_width_1

The following combinations are invalid:

  • vectorize_width_1 and vectorize(disable) cannot co-exist

    #pragma clang loop vectorize(disable) vectorize_width_1
    
  • vectorize_width(x) and vectorize_width_1 cannot co-exist

    #pragma clang loop vectorize(enable) vectorize_width(x) vectorize_width_1 // x is any non-negative number
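
The two invalid combinations can be written down as a small check. This is a sketch only: `VectorizeOpt`, `LoopPragma`, and `isValidCombination` are hypothetical names, vectorize_width_1 is the pragma option proposed in this post and does not exist in Clang, and the check covers just the two rules listed here rather than everything Clang already diagnoses.

```cpp
// Sketch of the two rules listed above; all names are hypothetical and
// vectorize_width_1 is only a proposal.
enum class VectorizeOpt { Absent, Enable, Disable };

struct LoopPragma {
  VectorizeOpt vectorize;  // vectorize(enable) / vectorize(disable)
  bool hasVectorizeWidth;  // vectorize_width(x) is present
  bool hasVectorizeWidth1; // proposed vectorize_width_1 is present
};

constexpr bool isValidCombination(LoopPragma p) {
  // vectorize_width_1 asks for vectorization, so it contradicts
  // vectorize(disable).
  if (p.hasVectorizeWidth1 && p.vectorize == VectorizeOpt::Disable)
    return false;
  // Two different width requests on the same loop are contradictory.
  if (p.hasVectorizeWidth1 && p.hasVectorizeWidth)
    return false;
  return true;
}

// #pragma clang loop vectorize_width_1
static_assert(isValidCombination({VectorizeOpt::Absent, false, true}), "valid");
// #pragma clang loop vectorize(disable) vectorize_width_1
static_assert(!isValidCombination({VectorizeOpt::Disable, false, true}), "invalid");
```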
    

Thank you for your time.

Regards,

eop Chen

Could you expand more on the motivation:)

Vectorization with width 1, although it may be inefficient, should still be a valid request to the LoopVectorizer.

Width 1 is non-trivial for scalable vectorization, i.e. <vscale x 1>. The use cases are yet to be explored, but I think this RFC fixes a limitation of the current LoopVectorizer.

I am not sure why the new vectorize_width_1 would be needed. Couldn’t the existing vectorize_width(x) be used?

#pragma clang loop vectorize(enable) vectorize_width_1
#pragma clang loop vectorize_width_1

We also want backward compatibility for the C level pragma. Currently vectorize_width(1) implies that vectorization is disabled in LoopVectorizer. Reusing the existing vectorize_width(x) will break compatibility.
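
For context, this is how the existing pragmas attach to a loop in C or C++ source. The pragmas are the existing Clang loop hints; the function names and the data are made up for illustration, and `constexpr` is used only so that the results can be checked at compile time. The hints do not change what the loop computes, only how it may be compiled, and as described above vectorize_width(1) is currently taken as a request not to vectorize.

```cpp
// Existing Clang loop hints on a simple reduction. Function names and data
// are hypothetical. Compilers other than Clang ignore these pragmas.
constexpr int sumWidth1(const int *a, int n) {
  int s = 0;
  // Today this is interpreted as "do not vectorize this loop".
#pragma clang loop vectorize_width(1)
  for (int i = 0; i < n; ++i)
    s += a[i];
  return s;
}

constexpr int sumWidth4(const int *a, int n) {
  int s = 0;
  // Requests vectorization with a width of 4.
#pragma clang loop vectorize(enable) vectorize_width(4)
  for (int i = 0; i < n; ++i)
    s += a[i];
  return s;
}

constexpr int kData[] = {1, 2, 3, 4, 5, 6, 7, 8};

// Both variants compute the same sum; only code generation may differ.
static_assert(sumWidth1(kData, 8) == sumWidth4(kData, 8), "same result");
```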

I think what we need to consider is whether we want to stay compatible at the “C to LLVM IR” level.

I am refining the proposal and will repost the RFC as a “Take 2” post.