[RFC] Should We Restrict the Usage of 0-D Vectors in the Vector Dialect?

dcaballe · December 11, 2024, 10:04pm

Thanks for all the feedback! I want to highlight that this approach is not intending to leave any project behind or concern unaddressed, of course. My previous comment was about ensuring that not only 0-D tensors but also 0-D vectors outside the Vector dialect can be efficiently lowered to the Vector dialect.

nicolasvasilache:


    bool isSingleElem1DNonIndexedVec = 

        (nonIndexedRank == 1 && nonIndexedVecType.getDimSize(0) == 1); 

    bool isSingleElem1DIndexedVec = 

        (indexedRank == 1 && indexedType.getDimSize(0) == 1); 

    // Verify 0-D -> single-element 1-D supported cases. 

    if ((indexedRank == 0 && isSingleElem1DNonIndexedVec) || 

        (nonIndexedRank == 0 && isSingleElem1DIndexedVec)) { 

      return op->emitOpError("expected source and destination vectors with " 

                             "different number of elements"); 

    } 

    // Verify indices for all the cases. 

    int64_t indexedRankMinusIndices = indexedRank - numIndices; 

    if (indexedRankMinusIndices != nonIndexedRank && 

        (!isSingleElem1DNonIndexedVec || indexedRankMinusIndices != 0)) { 

      return op->emitOpError() 

             << "expected " << indexedStr 

             << " vector rank minus number of indices to match the rank of the " 

             << nonIndexedStr << " vector"; 

    }

This is the problem displacement I am talking about, that I suspect will propagate everywhere.

This is actually the churn we need to introduce today to handle 0-D vectors and it spreads all over the Vector dialect, unfortunately. I see a difference between a precise (though perhaps less flexible) representation and an ambiguous one. The former can lead to slightly suboptimal (~canonicalization) IR but concrete and mechanical transformations. The latter requires making a call to resolve every ambiguous point in the representation in one way or the other and managing the combinatorial explosion of the fact that every ambiguous point can have a different resolution. As showcased, examples of this are vector.extract and vector.insert, with multiple ambiguous points within the same operation.

Thanks for elaborating on this. It’s great to better understand different mental models. IMO, only Category 1 should exist since it’s a super-set of the others. Progressive lowering allows us to refine and constraint the IR to incrementally align with the next level of abstraction/egress dialect. We also can’t exclude multi-dimensional Vectors at any vector level, as some architectures natively support them already.

A bit of history that may help understand the current state: operations in Category 3 were introduced out of necessity when there was a gap in the representation and we used LLVM as a reference to quickly fill that gap. Later, when generalization was needed, we introduced new ops for pathfinding and building expertise without disrupting the “stable” operations. This was the case with vector.insertelement/extractelement / vector.insert/extract, but there are others. We have been working on improving this situation and we should continue to do so.

A quick comment here: I think this needs to be revisited. We can’t drop 0-D support from the operations that allow transitions into and out of 0-D vectors. This should be the final step, as we need them to proceed incrementally while preserving stability. We should first strengthen the scalar/vector support in “boundary” operations to ensure they properly cover all the necessary cases. Then, we can start with simple elementwise operations and build the support that frameworks need for them. Next, we can tackle more involved operations like reductions. And, finally, we can remove 0-D support from the “boundary” operations.

Groverkss · December 12, 2024, 2:52pm

Agreed. We are trying to solve the same problem, but have different solutions.

Hm, if Category 1 allows 0-D vectors and Category 2 does not allow 0-D vectors, shouldn’t this statement be reversed?

No, let me explain my reasoning.

An operation in Category can act as an operation in another Category if you can put a wrapper around it that makes it act as part of the new Category.

Example of a Category 1 operation acting as a Category 2 operation:

// Defined in N-D vector space
vector.multi_reduction : (vector<2x2xf32>) -> (vector<32>)

// Defined for stack of 1-D vectors
wrapper {
  %result  = vector.multi_reduction : (vector<2x2xf32>) -> (vector<f32>)
  // 0-D vectors do not make sense for stack of 1-D vectors, use scalars.
  // vector<1xf32> does not make sense dimensionality wise.
  // Only possible value to use is a scalar.
  %scalar = vector.extract_scalar : (vector<f32>) -> f32
  yield %scalar
} (vector<2x2xf32> -> f32)

Example of Category 2 operation wrongly acting as a Category 1 operation:

// Defined for stack of 1-D vectors.
// Note that this operation preserves dimensionality for stack of 1-D vectors.
vector.shuffle (vector<1xf32>, vector<1xf32>) -> (vector<2xf32>)

// Cannot be defined for 0-D vectors, which are valid N-D vector space inputs.
wrapper {
  // inputs need to be broadcasted, because vector.shuffle
  // should not support 0-D inputs if it is defined in stack of 1-D space.
  %b_arg1 = vector.broadcast (vector<f32>) -> (vector<1xf32>)
  %b_arg2 = vector.broadcast (vector<f32>) -> (vector<1xf32>)
  %result = vector.shuffle %b_arg1, %b_arg2 : (vector<1xf32>, vector<1xf32>) -> (vector<2xf32>)
  yield %result
} (vector<f32>, vector<f32>) -> (vector<2xf32>)
// Not dimensionality preserving.
// But vector.shuffle is dimensionality preserving for other vectors :/

I hope this clears up my reasoning.

Put differently, I am still hoping that we can re-use vector.insert/vector.extract. If we discover otherwise, then we can just introduce vector.extract_scalar as you proposed.

I think this is where we disagree. We are just moving the problem elsewhere. Let’s say we restrict this vector.insert to not allow 0-D vector insertions.

// 1-D vector
vector.insert %input, %vec[0] : vector<1xf32> into vector<1x1xf32>

// 0-D vector
vector.extract %vec : (vector<f32> -> f32)
vector.insert %input, %vec[0, 0] : f32 into vector<1x1xf32>

// scalar
vector.insert %input, %vec[0, 0] : f32 into vector<1x1xf32>

Here, the insertion type is dependent on the number of indices and the rank of the destination type. This operation also does not belong to either Category 1 or Category 2, it allows 0-D destination vectors, but the insertion type can either be a scalar or a vector.

Instead of the operation allowing the user to input anything and having problems in the operation, we are instead moving it to the user, who has to now check what the input type and ask the user to change their input accordingly. (Note that this also restricts folding).

// 1-D vector
vector.insert %input, %vec[0] : vector<1xf32> into vector<1x1xf32>

// 0-D vector
vector.insert %input, %vec[0, 0] : vector<f32> into vector<1x1xf32>

// scalar
vector.insert_scalar %input, %vec[0, 0] : f32 into vector<1x1xf32>

Here, the insertion type is only dependent on the operation. Bothe insertion type is only dependent on the operation. Both of these operations are clearly defined th of these operations are clearly defined in Category 1.

The later choice is what memory operations also take (vector.transfer_write for vectors and memref.store/tensor.store for scalars) and what tensor dialect also takes (tensor.insert_slice and tensor.insert).

It might look like adding another operation will make us duplicate our folders/canonicalizers/patterns, but our canonicalizers/folders are already making thinking of these as two different paths::

llvm-project/mlir/lib/Dialect/Vector/IR/VectorOps.cpp at main · llvm/llvm-project · GitHub (Only works for vector types)

llvm-project/mlir/lib/Dialect/Vector/IR/VectorOps.cpp at main · llvm/llvm-project · GitHub (SplatOp only returns vector types, so no scalar path)

llvm-project/mlir/lib/Dialect/Vector/IR/VectorOps.cpp at main · llvm/llvm-project · GitHub,
llvm-project/mlir/lib/Dialect/Vector/IR/VectorOps.cpp at main · llvm/llvm-project · GitHub
(ExtractOp → Broadcast folder special casing and taking different paths for scalars)

There are other folders which are really only valid for vectors because the folding for scalars is trivial, but they still have to special case each time.

From downstream experience, transformations on vector.extract + Category 1 operation usually needs different paths, based on if it’s returning a scalar or a vector (because scalars need different operations compared to vectors).

kuhar · December 12, 2024, 3:01pm

FWIW, now that vector.extract reached parity with vector.extractelement, we could restrict vector.extractelement to serve the scalar usecase only – the name is almost perfect already.

banach-space · December 13, 2024, 9:16am

(Replying to Kunwar’s post above - mostly finer details)

Thank you for the clarifications!

I see that I misunderstood your taxonomy - apologies for that. If I understand correctly, your proposal:

Allows 0-D vectors in Category 1 and (selectively) in Category 3, but
Bans them in Category 2.

To me, this inconsistency is quite confusing. Additionally:

I’m against this unless you can point to an intrinsic requiring such flexibility. To my knowledge, neither LLVM nor SPIR-V supports 0-D vectors. Reducing special-casing at this point seems like the right direction.

Groverkss:

// 1-D vector

vector.insert %input, %vec[0] : vector<1xf32> into vector<1x1xf32>

// 0-D vector

vector.extract %vec : (vector<f32> -> f32)

vector.insert %input, %vec[0, 0] : f32 into vector<1x1xf32>

// scalar

vector.insert %input, %vec[0, 0] : f32 into vector<1x1xf32>
Here, the insertion type is dependent on the number of indices and the rank of the destination type. This operation also does not belong to either Category 1 or Category 2, it allows 0-D destination vectors, but the insertion type can either be a scalar or a vector.

(emphasis added by me)

I disagree with this interpretation. The insertion type is uniquely determined by two quantities:

The “number of indices,” and
The “destination rank.”

When the “number of indices” equals the “destination rank,” the value to be inserted is always a scalar. There’s no ambiguity here.

You’re correct that banning scalars from vector.extract/vector.insert would ensure consistency by requiring all arguments to be vectors. However, this would necessitate introducing new Ops to mix vector and scalar arguments, which comes with significant costs. Moreover, we will always need some Ops that handle both scalar and vector arguments. Shifting the issues from one place to another doesn’t solve the underlying problem - it just redistributes complexity.

While I agree that avoiding special-casing is ideal, I think it’s unrealistic to eliminate it entirely. Whether an approach results in “more” or “less” special-casing is subjective and depends on the specific domain. Quantifying this trade-off would help, but I don’t believe it’s feasible to avoid all special cases, even with Vector.

Groverkss:

Example of Category 2 operation wrongly acting as a Category 1 operation:

// Defined for stack of 1-D vectors.
// Note that this operation preserves dimensionality for stack of 1-D vectors.
vector.shuffle (vector<1xf32>, vector<1xf32>) -> (vector<2xf32>)

// Cannot be defined for 0-D vectors, which are valid N-D vector space inputs.
wrapper {
  // inputs need to be broadcasted, because vector.shuffle
  // should not support 0-D inputs if it is defined in stack of 1-D space.
  %b_arg1 = vector.broadcast (vector<f32>) -> (vector<1xf32>)
  %b_arg2 = vector.broadcast (vector<f32>) -> (vector<1xf32>)
  %result = vector.shuffle %b_arg1, %b_arg2 : (vector<1xf32>, vector<1xf32>) -> (vector<2xf32>)
  yield %result
} (vector<f32>, vector<f32>) -> (vector<2xf32>)
// Not dimensionality preserving.
// But vector.shuffle is dimensionality preserving for other vectors :/

Your example seems to demonstrate that this approach works as intended. Could you clarify what is problematic here? Right now, this feels like a heavy-handed approach to addressing 0-D vectors.

Finally, @Groverkss, going through your proposal, it seems that the overall idea is to replace scalars (f32) with vector<f32> in scenarios where scalars are acceptable today. Essentially, you are proposing banning scalars in many Ops.

While this ensures consistency, it comes with trade-offs. IMHO, we’ll always face a degree of complexity when mixing vector and scalar arguments, and I’m not convinced that this shift is worth the cost.

banach-space · December 13, 2024, 9:33am

Summary of the Discussion

There seems to be a growing consensus (*) to restrict the usage of 0-D vectors in the Vector dialect, rather than allowing them universally. This is exciting progress! However, there’s still disagreement on where and how to enforce these restrictions.

Below, I summarize the two main options under discussion, along with key design constraints for context.

Key Design Constraints

The Linalg Vectorizer, as the primary producer of Vector ops, must remain sound at all times, with no performance regressions.
Ingress and egress dialects must be well-supported:
- LLVM and SPIR-V (our main egress targets) seem unaffected.
- For ingress, we should confirm compatibility with other Vector users, such as Triton-CPU and ONNX-MLIR (I haven’t reached out to these communities yet).

(*) Note: Some contributors have argued for “0-D Vectors everywhere” or for “removing 0-D Vectors entirely.”

Option 1: Restrict Boundary Operations

(Proposed by Andrzej)

This proposal suggests limiting the handling of 0-D vectors to boundary operations like vector.extract, vector.insert, and vector.gather. Specifically:

Entering the Vector dialect: Convert tensor<f32> into f32 (instead of vector<f32>).
Exiting the Vector dialect: Convert f32 back into tensor<f32>.

Key benefits of this approach:

Within the Vector dialect, we could safely assume no 0-D vectors as arguments.
Scalar accumulators (e.g., in vector.contract or vector.multi_reduction) would require some special-casing, but this is already supported and manageable.
It minimizes complexity and avoids introducing new Ops or taxonomies.

From my perspective, this option ensures consistency while keeping the implementation simple. It aligns well with our goal of reducing complexity in the Vector dialect.

@dcaballe, does this align with what you envisioned?

Note: My original intention was a small experiment focused on restricting vector.extract and vector.insert. Here, I’ve expanded it for a more comprehensive “big picture” overview.

Option 2: Introduce a New Vector Dialect Taxonomy

(Proposed by Kunwar)

This proposal introduces a new taxonomy for handling scalars in the Vector dialect. It suggests:

Banning scalars in existing Ops, requiring vector<f32> where scalars are currently used.
Adding new Ops to handle scalar interactions explicitly, such as vector.extract_scalar and vector.insert_scalar.

Potential benefits:

Resolves ambiguities in handling scalar values.
May better support specific use cases, like scalar accumulators.

However, this approach comes with drawbacks:

Requires new Ops, which adds to the maintenance burden (more patterns, tests, and special-casing).
Creates new “categories” of Ops, which could be inconsistent in their handling of 0-D vectors.

My Perspective

Considering:

The broader context, including discussions like https://discourse.llvm.org/t/rfc-psa-remove-vector-extractelement-and-vector-insertelement-ops-in-favor-of-vector-extract-and-vector-insert-ops, and
The practical implications for maintainability and performance,

I strongly favor Option 1.

Reducing the number of Ops in the Vector dialect offers clear advantages:

Fewer Ops → fewer patterns to write → fewer tests to maintain → lower overall maintenance cost.

That said, I recognize that parts of Option 2 are not entirely clear to me. If I’ve misunderstood anything, please feel free to correct me. We may also need to dive deeper into the finer details.

Next steps

I think that for our next steps it will be important to check what other ingress Dialects require from Vector.

I’m also curious to hear opinions about Option 1 and Option 2 above - I tried my best to present all the pros and cons based on the discussion. I appreciate that I am a bit biased here, but hopefully the overview will be helpful. Please let me know if I misinterpreted something and I will edit it.

Btw, I will be travelling next ~2 weeks and plan to take a break from this thread. I suspect others will be distracted as well.

Thank you all - great discussion so far!

-Andrzej

MaheshRavishankar · December 13, 2024, 6:12pm

Thats an interesting summary, cause I was just thinking the opposite. I dont think there is consensus on restricting 0D vectors. There are some operations that cannot support 0D vectors (like Kunwar highlighted), but it should be a default, i.e. most operation if defined well support 0D vectors, and specific operations do not support 0D vectors cause of their semantics. As has been mentioned earlier, it might be because vector dialect spans a couple of abstractions layers. Closer to Linalg-level 0D vectors are required for completeness in my book. I think it is wrong to Linalg as the “only reason” why 0D vectors come in. That is the main entry point today, but at the entry to vector dialect 0D vectors need to be supported to be complete.

stellaraccident · December 13, 2024, 7:40pm

@nicolasvasilache I feel like this is you couching a relatively cogent opinion in a non argumentative way. While there have been many words on this thread and a lot of good ideas, I feel like we’re approaching an impasse with two different viewpoints not quite aligning. Would you be willing to upgrade your analysis to a recommendation? I’d recommend doing so with a meeting if possible.

(Sounds like with the holidays, much of this is a topic for the new year anyway)

nicolasvasilache · December 17, 2024, 4:50pm

I am not completely sure how to parse this sentence, could you plz rephrase ?
Sure happy to discuss in a meeting some time in 2025, we could start from @Groverkss’ characterization.

stellaraccident · December 17, 2024, 6:44pm

Thanks, that’s really what I was asking. You engaged in the discussion but not deeply. Was hoping you might have the bandwidth to help folks agree on a path forward.

banach-space · January 2, 2025, 9:19am

Mindful of rfc-blackout-period, please don’t rush replying

Note, I am responding to 3 different threads below + provide a link to a patch implementing the experiment proposed here.

Ack, and thank you for clarifying. I now realize I misinterpreted the consensus, and your post helped clarify the discussion. Apologies for the confusion, and thank you for pointing this out!

To clarify:

“Restricting the usage of 0D vectors” != “disallowing 0D vectors entirely”. My proposal is about the former, not the latter.

I also considered Kunwar’s counter-proposal, which also supports restricting (not ubiquitously allowing) 0D vectors:

Given all the “likes”, I was under the impression that there’s support for “restricting” the usage of 0-D Vectors. What am I missing here?

Absolutely.

To understand the impact, I propose “sealing” two specific Ops as part of an experiment. This would allow us to collect data points and refine the approach based on concrete results. It’s a lightweight experiment that can be easily reverted if necessary.

Could you elaborate on why this is necessary? From my perspective, supporting rank-0 Tensors and MemRefs at the boundary should suffice, and rank-0 Vectors may not be needed within the dialect. Internally, we control the Vector dialect’s implementation details.

Agreed, and that’s why I’ve sought feedback from other projects:

Feedback from Triton
(from @ThomasRaoux, emphasis by me):

right now triton doesn’t support 0d tensors. The IR can have single element 1d tensors or scalars. For cases where we do a reduction of a 1d tensor, the result would be a scalar. I can’t think of any useful usage of 0d tensors at the moment.

If you have access to Triton Slack, there’s a very short thread in #dev.

Feedback from onnx-mlir
I raised this topic in onnx-mlir/issues/3029, but the feedback there is not comprehensive enough to draw firm conclusions.

if anything, the confusion between scalar, 0-D vectors, or 1D vector with 1 element is tiresome

To gain more insight, I reviewed their codebase:

There are a few references to “rank 0” tensors, but not many. See this search link.
For example, their test/mlir/onnx/invalid.mlir includes a case showing that 0-D tensors are not supported universally.

Additionally, I posted in their Slack channel within the Linux Foundation AI and Data Workspace. I haven’t received a response yet.

Other projects

There’s one more project that I am aware of that targets Vector directly:

TKW: A Wave Kernel Language for GPUs

It’s part of iree-turbine and IREE folks have already commented in this thread. Perhaps there’s someone specific we could reach out to?

Any other project that we should consider in this discussion?

Finally, here’s the actual experiment that I had in mind when posting this (sharing as an additional data point for the discussion):

[mlir][vector] Restrict vector.insert/vector.extract by banach-space · Pull Request #121458 · llvm/llvm-project · GitHub

As for next steps:

+1

Thank you for all your feedback so far
-Andrzej

stellaraccident · January 2, 2025, 6:40pm

There are a few people who could represent this, but @Groverkss is very involved and can likely speak to that group. I wouldn’t so much think about what tkw is doing as a constraint to how vector evolves – more that the folks working on that are long time vector dialect users in the more traditional linalg ingress flow, and they have developed a lot of experience in the process of applying that to tkw. Much of that is encapsulated in how Kunwar is representing the design points here.

banach-space · January 24, 2025, 3:37pm

I briefly sync-ed with @Groverkss re the next step:

Both Kunwar and I are a bit constrained in the coming weeks, so this will have to wait.

Thank you,