[RFC] Enabling LoopVectorizer for vectorization width of 1, Take 2

Hi all,

Following the discussion in the previous thread [0], here is a second take on the proposal.

CC-ing participants of previous thread: @fhahn, @mindong

Problem statement and goal

Current behavior

The LoopVectorizer (abbreviated as LV in this RFC) performs both vectorization and loop-interleave. Below we enumerate how LV behaves for the existing metadata.

C pragma level

To disable vectorization of LV:

#pragma clang loop vectorize(disable) // way 1
#pragma clang loop vectorize(enable) vectorize_width(1) // way 2

// Both ways map to the LLVM IR metadata
{loop.vectorize.width, i32 1}

To disable loop-interleave of LV:

#pragma clang loop interleave(disable) // way 1
#pragma clang loop interleave(enable) interleave_count(1) // way 2

// Both ways map to the LLVM IR metadata
{loop.interleave.count, i32 1}
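
For concreteness, here is a minimal sketch of how these pragmas attach to a loop. The function names and the loop body are made up for illustration; the metadata named in the comments is simply the mapping described above, not something verified against a particular Clang version.

```c
#include <stddef.h>

/* Hypothetical example: scale an array in place.
   The pragma asks LV not to vectorize this loop; per the mapping above,
   this is expected to lower to {loop.vectorize.width, i32 1}. */
void scale_no_vectorize(float *a, size_t n, float k) {
#pragma clang loop vectorize(disable)
    for (size_t i = 0; i < n; ++i)
        a[i] *= k;
}

/* Same loop, but asking LV not to interleave; per the mapping above,
   this is expected to lower to {loop.interleave.count, i32 1}. */
void scale_no_interleave(float *a, size_t n, float k) {
#pragma clang loop interleave(disable)
    for (size_t i = 0; i < n; ++i)
        a[i] *= k;
}
```

The pragmas are only hints to the optimizer, so both functions compute the same result regardless of what LV decides.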

LLVM IR level

To disable the whole LV:

{loop.vectorize.enable, false}

To disable vectorization of LV:

{loop.vectorize.width, 1}

To disable loop-interleave of LV:

{loop.interleave.count, 1}

To enable vectorization of LV:

{loop.vectorize.width, x} // where x != 1

To enable loop-interleave of LV:

{loop.interleave.count, x} // where x != 1
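
The rules listed above can be summarized as a small decision function. This is only a toy model of the behavior described in this section, not the LoopVectorizer's actual code; the type and function names (Hint, CurrentHints, current_semantics) are hypothetical, and treating 0 as "not specified" is an assumption of the model.

```c
#include <stdbool.h>

/* Tri-state for boolean metadata: absent, false, or true. */
typedef enum { HINT_UNSET = -1, HINT_FALSE = 0, HINT_TRUE = 1 } Hint;

/* Hypothetical model of the existing loop hints (not LLVM's own types). */
typedef struct {
    Hint vectorize_enable;      /* loop.vectorize.enable */
    unsigned vectorize_width;   /* loop.vectorize.width, 0 = not specified */
    unsigned interleave_count;  /* interleave hint, 0 = not specified */
} CurrentHints;

typedef struct {
    bool may_vectorize;
    bool may_interleave;
} CurrentDecision;

/* Applies the current rules: enable == false turns off the whole LV,
   width 1 turns off vectorization, interleave 1 turns off interleaving. */
CurrentDecision current_semantics(CurrentHints h) {
    CurrentDecision d = { true, true };
    if (h.vectorize_enable == HINT_FALSE) {
        d.may_vectorize = false;
        d.may_interleave = false;
        return d;
    }
    if (h.vectorize_width == 1)
        d.may_vectorize = false;
    if (h.interleave_count == 1)
        d.may_interleave = false;
    return d;
}
```

Note that in this model no combination of inputs means "vectorize with width 1": a width of 1 always clears may_vectorize.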

Problem of current behavior

I see this RFC not as adding a new feature but as fixing a limitation of the current LoopVectorizer.

Vectorization with width 1, although it may be inefficient, should still be something that can be requested from the LoopVectorizer. LV cannot do this today because it treats vectorize.width == 1 as disabling vectorization.

Approach

We want to maintain backward compatibility, so we cannot change the existing behavior of the pragmas or of the LLVM IR metadata. Instead, this RFC proposes an additional interface to resolve the problem.

C pragma level

Add a new pragma, pragma vectorize_width_1, for requesting vectorization with width 1. It will generate the following LLVM IR metadata:

{loop.vectorization.enable, true}
{loop.vectorize.width, 1}

LLVM IR level

Add two new metadata, loop.vectorization.enable and loop.interleave.enable, that give individual control over vectorization and loop-interleave in LV. loop.interleave.enable does not strictly need to be added (because loop-interleave with count 1 is equivalent to disabling it), but symmetry between the vectorization and loop-interleave controls in LV seems reasonable to me.

Below I list the current behaviors, which stay the same, together with the additional ways to achieve them and the new ability to request vectorization with width 1.

To disable the whole LV:

// Current way
{loop.vectorize.enable, false}
// With this RFC
{loop.vectorization.enable, false}
{loop.interleave.enable, false}

To disable vectorization of LV:

// Current way
{loop.vectorize.width, 1}
// With this RFC
{loop.vectorization.enable, false}

To disable loop-interleave of LV:

// Current way
{loop.interleave.count, 1}
// With this RFC
{loop.interleave.enable, false}

To enable vectorization of LV:

// Current way
{loop.vectorize.width, x} // where x != 1
// With this RFC
{loop.vectorization.enable, true}
{loop.vectorize.width, x} // x can be any non-negative integer

To enable loop-interleave of LV:

// Current way
{loop.interleave.count, x} // where x != 1
// With this RFC
{loop.interleave.enable, true}
{loop.interleave.count, x} // x can be any non-negative integer (but meaningless when x == 1)

*** To let LV vectorize with width 1:

// Current way
N/A
// With this RFC
{loop.vectorization.enable, true}
{loop.vectorize.width, 1}
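
To make the combined rules easier to check, here is a toy model of the proposed semantics. It is only a sketch of the listing above, not a patch against LV: the names (ProposedHints, proposed_semantics, forced_width_one) are hypothetical, and 0 is assumed to mean "not specified".

```c
#include <stdbool.h>

/* Tri-state for boolean metadata: absent, false, or true. */
typedef enum { HINT_UNSET = -1, HINT_FALSE = 0, HINT_TRUE = 1 } Hint;

/* Hypothetical model of the hints after this RFC (not LLVM's own types). */
typedef struct {
    Hint vectorize_enable;      /* existing: loop.vectorize.enable */
    Hint vectorization_enable;  /* proposed: loop.vectorization.enable */
    Hint interleave_enable;     /* proposed: loop.interleave.enable */
    unsigned vectorize_width;   /* loop.vectorize.width, 0 = not specified */
    unsigned interleave_count;  /* interleave hint, 0 = not specified */
} ProposedHints;

typedef struct {
    bool may_vectorize;
    bool may_interleave;
    bool forced_width_one;  /* set only for the new "vectorize with width 1" case */
} ProposedDecision;

ProposedDecision proposed_semantics(ProposedHints h) {
    ProposedDecision d = { true, true, false };
    /* Existing behavior: this switches off the whole LV. */
    if (h.vectorize_enable == HINT_FALSE) {
        d.may_vectorize = false;
        d.may_interleave = false;
        return d;
    }
    if (h.vectorization_enable == HINT_FALSE) {
        d.may_vectorize = false;
    } else if (h.vectorize_width == 1) {
        if (h.vectorization_enable == HINT_TRUE)
            d.forced_width_one = true;  /* new: explicit request for width 1 */
        else
            d.may_vectorize = false;    /* unchanged: width 1 alone disables */
    }
    /* Interleave count 1 still means "no interleaving", even when enabled. */
    if (h.interleave_enable == HINT_FALSE || h.interleave_count == 1)
        d.may_interleave = false;
    return d;
}
```

In this model a bare {loop.vectorize.width, 1} keeps its current meaning, and only the pair {loop.vectorization.enable, true} plus {loop.vectorize.width, 1} yields vectorization with width 1.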

Design choice: Refine behavior of existing C pragmas

While the LLVM IR stays backward compatible, we could refine the C pragmas we already have based on the newly added metadata. This would break compatibility at the C-to-LLVM level, so it is a choice we need to consider.

Plan A: Keep existing behaviors the same

Then we don’t need to add metadata loop.interleave.enable.

Pros: Compatibility
Cons: Asymmetry in the design of the metadata

Plan B: Break compatibility

Re-map the behaviors of the C pragmas according to the newly added metadata.

Pros: Readability in LLVM IR
Cons: Breaks compatibility

// End of Proposal

Questions raised in previous thread

Will pragma vectorize_width_1 behave the same as #pragma clang loop vectorize(enable) vectorize_width(1)?

No. We want to maintain backward compatibility, so we cannot override existing behavior for #pragma clang loop vectorize(enable) vectorize_width(1), which results in disabling the vectorizer.

Could you expand more on the motivation? :)

Width 1 is non-trivial for scalable vectorization, i.e. <vscale x 1>. The use case is yet to be explored, but I think this RFC is not proposing a new feature so much as fixing a limitation of the current LoopVectorizer.

[0] [RFC] Enabling LoopVectorizer for vectorization width of 1

I think your proposal would be much easier to understand with an IR example and your expected results, it is not really clear to me what you mean by enabling LV for VF == 1.

Initially I thought you just wanted interleaving, but after reading your latest proposal, what you really want is to create vectors for VF == 1? IIUC this would only make sense for scalable vectors, and I don’t see any scenario where this would be desirable for fixed vectors. This only became clear at the very end of your proposal.

For scalable vectors, arguably vectorizing with VF == 1 to create vectors like <vscale x 1 x ty> should already be doable with the existing pragma to enable scalable vectorization for a loop. It might be good to check with the people working on SVE what they think.

Vectorization with width 1, although it may be inefficient, should still be something that can be requested from the LoopVectorizer.

Can you provide more details on the motivation? In most interpretations, a scalar is a vector of length 1, i.e. vectorization is a no-op. Like @fhahn already asked, what are the benefits of still running LoopVectorize?

There is the real issue that vectorization and interleaving metadata are interlinked, e.g. one may want to only interleave but not vectorize, which I think is currently not possible. It would be great to disentangle those two logically separate transformations.

Yes Michael, my motivation is that vectorization and interleaving metadata are interlinked and I want to disentangle them. I think I over-explained things a bit; this is the essential issue I want to solve.

Does it make sense for me to continue this proposal? @Meinersbur @fhahn

Yes, I think it does make sense to split vectorize and interleave metadata. I think we could just start with having an interleave metadata corresponding to the vectorize metadata that more closely matches the #pragma clang loop modes.

That is we would have
llvm.loop.vectorize.enable
llvm.loop.vectorize.width
llvm.loop.isvectorized

llvm.loop.interleave.enable
llvm.loop.interleave.width
llvm.loop.isinterleaved

and make LoopVectorize consider them independently. Note that the breaking change can be avoided by using the AutoUpgrader to translate metadata from older versions. There might be an issue in that we do not necessarily know whether llvm.loop.vectorize.enable uses the old or the new semantics. In that case I think it would be acceptable to only consider the effect on vectorization, i.e. !{ !"llvm.loop.vectorize.enable", i1 false} would still allow interleaving even though historically it did switch it off.

I suggest working on a draft patch that we can discuss rather than another RFC.
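
As a starting point for such a draft, the metadata translation mentioned above could be modeled roughly as follows. This is only a toy sketch under the assumption that old-style metadata can be recognized as such; it does not use the real AutoUpgrader interfaces, and all names in it (OldHints, SplitHints, upgrade_hints) are hypothetical.

```c
/* Tri-state for boolean metadata: absent, false, or true. */
typedef enum { HINT_UNSET = -1, HINT_FALSE = 0, HINT_TRUE = 1 } Hint;

/* Old semantics: vectorize.enable == false switches off the whole LV. */
typedef struct {
    Hint vectorize_enable;
} OldHints;

/* Split semantics: vectorization and interleaving are independent. */
typedef struct {
    Hint vectorize_enable;
    Hint interleave_enable;
} SplitHints;

/* Translates old-style hints so the historical behavior is preserved:
   an old "enable = false" also has to disable interleaving explicitly. */
SplitHints upgrade_hints(OldHints old) {
    SplitHints out = { old.vectorize_enable, HINT_UNSET };
    if (old.vectorize_enable == HINT_FALSE)
        out.interleave_enable = HINT_FALSE;
    return out;
}
```

If old and new metadata cannot be told apart, this translation would simply be skipped, which matches the fallback described above where enable = false only affects vectorization.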