Hi all,
This is a proposal to add a new operation, vector.to_elements, to the Vector dialect.
This operation is the symmetric counterpart to the existing vector.from_elements op.
An initial prototype can be found in this PR: [mlir][Vector] Add vector.to_elements op
Proposal
The vector.to_elements operation leverages MLIR’s multi-result op feature to decompose
an input vector into all its scalar elements. The decomposed scalar elements are
returned in row-major order. For example:
// Decompose a 1-D vector of 2 elements.
%0:2 = vector.to_elements %v1 : vector<2xf32>
// %0#0 = %v1[0]
// %0#1 = %v1[1]
// Decompose a 2-D vector.
%1:6 = vector.to_elements %v2 : vector<2x3xf32>
// Returns 6 scalar values in flattened row-major order.
// %1#0 = %v2[0, 0]
// ...
// %1#5 = %v2[1, 2]
The operation inherently encodes the extraction order of the scalar elements without
explicit indices, providing a powerful abstraction to encode complex vector-wide
extraction and insertion transformations.
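To make the implicit ordering concrete, here is a minimal standalone sketch of how a result position could be mapped back to the N-D index of the element it holds. The helper name resultPositionToIndices is hypothetical and is not part of MLIR or the prototype PR; it only assumes a static shape and the row-major order described above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an MLIR API): maps the position of a
// vector.to_elements result to the N-D index of the element it holds,
// assuming a static shape and row-major (last dimension fastest) order.
std::vector<int64_t> resultPositionToIndices(int64_t position,
                                             const std::vector<int64_t> &shape) {
  std::vector<int64_t> indices(shape.size(), 0);
  // Peel off the fastest-varying (innermost) dimension first.
  for (int64_t dim = static_cast<int64_t>(shape.size()) - 1; dim >= 0; --dim) {
    assert(shape[dim] > 0 && "expected a static, non-empty dimension");
    indices[dim] = position % shape[dim];
    position /= shape[dim];
  }
  assert(position == 0 && "position exceeds the number of elements");
  return indices;
}
```

For vector<2x3xf32>, position 5 maps to [1, 2] and position 3 maps to [1, 0], matching the comments in the example above.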
Motivation
vector.to_elements brings two major benefits when processing large sequences of
vector.extract (and vector.insert) operations:
1. Code Size Reduction
Currently, extracting all or many elements from a vector requires large sequences of
vector.extract operations. Consider a vector<1024xf32>:
- Today: 1,024 separate vector.extract operations
- With vector.to_elements: 1 operation
Using vector.to_elements represents a major reduction in code size for these scenarios.
2. Simplified Pattern Recognition and Optimization
vector.to_elements simplifies vector transformation analysis involving large sequences
of vector extraction and insertion operations. For instance, LLVM’s InstCombine spends
significant time analyzing chains of extractelement/insertelement instructions to
identify shuffle patterns. This requires:
- Grouping large sequences of extractelement instructions by source vector
- Matching large sequences of extractelement instructions to large sequences of insertelement instructions
- Analyzing and sorting extraction and insertion indices
The structured and implicit order of extraction in vector.to_elements eliminates
this complexity. For example:
- Folding redundant sequences of extractions and insertions can be as simple as: llvm::equal(toElements.getResults(), fromElements.getOperands())
- Transforming large sequences of extractions and insertions into a shuffle (simple case) can be as simple as checking that all vector.from_elements operands originate from the same vector.to_elements.
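The two checks in the list above can be illustrated with a small standalone model. This is only a sketch: the Value struct and both function names are hypothetical stand-ins, not MLIR classes or patterns from the prototype, and the model assumes every operand is produced by some vector.to_elements op identified by an integer id.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-in for an SSA value: which op defines it and which of that
// op's results it is. Illustrative only, not an MLIR type.
struct Value {
  int definingOp;   // hypothetical id of the defining vector.to_elements
  int resultNumber; // position in that op's result list
  bool operator==(const Value &other) const {
    return definingOp == other.definingOp && resultNumber == other.resultNumber;
  }
};

// Case 1: from_elements(to_elements(%v)) is a redundant round trip when
// the operand list is exactly the result list, in the same order.
bool isRedundantRoundTrip(const std::vector<Value> &toElementsResults,
                          const std::vector<Value> &fromElementsOperands) {
  return toElementsResults.size() == fromElementsOperands.size() &&
         std::equal(toElementsResults.begin(), toElementsResults.end(),
                    fromElementsOperands.begin());
}

// Case 2 (simple shuffle): if every operand comes from the same
// to_elements op, the result numbers directly form the shuffle mask.
// Returns false and leaves `mask` empty when the operands mix sources.
bool matchSingleSourceShuffle(const std::vector<Value> &fromElementsOperands,
                              std::vector<int64_t> &mask) {
  mask.clear();
  if (fromElementsOperands.empty())
    return false;
  const int source = fromElementsOperands.front().definingOp;
  for (const Value &operand : fromElementsOperands) {
    if (operand.definingOp != source) {
      mask.clear();
      return false;
    }
    mask.push_back(operand.resultNumber);
  }
  return true;
}
```

In this model, operands that reverse the results of one to_elements op are not a redundant round trip, but they do match as a single-source shuffle with mask [2, 1, 0]. No index sorting or grouping by source vector is needed.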
Discussion
Relationship with vector.extract
vector.to_elements and vector.extract have overlapping functionality. A long-term goal could be to extend vector.to_elements to cover all use cases of vector.extract, which would allow us to propose the deprecation of vector.extract in favor of its more structured and powerful counterpart. However, we believe it is best to first introduce this operation and gain practical experience before pursuing that path.
Handling Unused Results
vector.to_elements may generate a large number of result values that have no uses. In terms of memory consumption, we do not think this is an issue compared to the equivalent vector.extract operations, especially when a considerable number of elements are being extracted. Identifying and ignoring dead results during the transformation or lowering of vector.to_elements is trivial, as it only requires checking whether the corresponding result value has uses.
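As a rough illustration of how cheap that check is, the sketch below models each result by its use count and returns only the positions a lowering would still need to materialize. The function name and the use-count representation are assumptions made for this example; in MLIR itself the equivalent test would presumably be a per-result query such as use_empty() on the result value.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: useCounts[i] is the number of uses of result #i of a
// vector.to_elements op. Returns the result positions that have at least
// one use; results without uses are skipped entirely.
std::vector<std::size_t>
liveResultPositions(const std::vector<unsigned> &useCounts) {
  std::vector<std::size_t> live;
  for (std::size_t i = 0; i < useCounts.size(); ++i)
    if (useCounts[i] != 0)
      live.push_back(i);
  return live;
}
```

With use counts {2, 0, 0, 1}, only positions 0 and 3 are returned, so a lowering in this model would emit two extractions rather than four.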
Next Steps
We seek support to add vector.to_elements to the Vector dialect. The immediate plan is to implement more advanced vector.extract/vector.insert canonicalization/transformation patterns that this new operation will enable, which are not present in MLIR today. After validating the approach and gaining more experience, we plan to continue working towards the deprecation of vector.extract in favor of vector.to_elements, if the community thinks this is the right way to go.
Your feedback is appreciated!
Thanks!