This is a continuation of the discussion in https://github.com/llvm/llvm-project/pull/72105. One outcome of that discussion is the observation that the semantics of a number of vector ops converge when manipulating unit dimensions (vector.shape_cast, vector.broadcast, vector.extract/vector.insert, vector.extract_element/vector.insert_element, and vector.transpose), and that cleaning them up relies on “getting lucky” with a slew of canonicalizations/patterns. This leads to difficulties in picking canonical forms and to special-casing in vector-related patterns.
Idea #1
One approach could be to introduce vector.expand_shape/vector.collapse_shape with semantics similar to tensor.expand_shape/tensor.collapse_shape. The observation here is that, outside of vector.extract_element/vector.insert_element, all of the above ops devolve to reshapes when the only affected dimensions are unit dimensions. This could unify a few of the representations above without having to sacrifice all “loop structure” the way vector.shape_cast does.
vector.broadcast ... : vector<4xf32> to vector<1x4xf32>
// ==
vector.expand_shape [[0, 1]] ... : vector<4xf32> to vector<1x4xf32>
vector.extract %0[0, 0] : vector<4xf32> from vector<1x1x4xf32>
// ==
vector.collapse_shape [[0, 1, 2]] ... : vector<1x1x4xf32> to vector<4xf32>
vector.transpose [1, 0] ... : vector<1x4xf32> to vector<4x1xf32>
// ==
%1 = vector.collapse_shape [[0, 1]] ... : vector<1x4xf32> to vector<4xf32>
vector.expand_shape [[0, 1]] ... : vector<4xf32> to vector<4x1xf32>
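For completeness, the unit-dim vector.insert case folds the same way: inserting into a destination whose other dimensions are all unit fully overwrites the destination, so it too is just a reshape (a sketch using the proposed op):

```mlir
vector.insert %v, %dest[0, 0] : vector<4xf32> into vector<1x1x4xf32>
// ==
vector.expand_shape [[0, 1, 2]] %v : vector<4xf32> to vector<1x1x4xf32>
```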
A concern with this representation is the use of two ops to represent a single transpose; it doesn’t really look more canonical than before to me, which brings me to the second idea.
Idea #2
Add collapse and expand indices to vector.shape_cast. The observation here is that any non-scalable vector reshape can be represented with a full shape collapse + shape expand.
vector.shape_cast ... : vector<AxBxf32> to vector<BxAxf32>
// ==
%1 = vector.collapse_shape [[0, 1]] ... : vector<AxBxf32> to vector<(A*B)xf32>
vector.expand_shape [[0, 1]] ... : vector<(A*B)xf32> to vector<BxAxf32>
The default shape_cast semantics can keep the full collapse → full expand and elide the reshape indices, making this a relatively non-intrusive change. The benefit of explicitly representing the collapse/expand indices is that analyzing whether a shape cast is just a collapse or just an expand becomes significantly easier, and it also enables a relatively trivial pattern that rewrites any shape cast as an explicit collapse/expand. Then we can analyze the expand/collapse portions in isolation (or even try propagating them through the IR in different directions!).
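Concretely, the trivial rewrite could look like this (the explicit collapse/expand index syntax on shape_cast is hypothetical, as are the split ops from Idea #1):

```mlir
%0 = vector.shape_cast %arg collapse = [[0, 1]] expand = [[0, 1]]
       : vector<2x3xf32> to vector<3x2xf32>
// == rewritten to the explicit split form:
%c = vector.collapse_shape [[0, 1]] %arg : vector<2x3xf32> to vector<6xf32>
%0 = vector.expand_shape [[0, 1]] %c : vector<6xf32> to vector<3x2xf32>
```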
I’m not familiar enough with scalable vector semantics to know whether this is completely correct, but arbitrary shape casts don’t seem to play very nicely with scalable vectors, e.g.
shape_cast ... : vector<[1]x[1]x[1]xf32> to vector<[1]x[1]xf32>
How the shapes are being reassociated in this case is ambiguous to me (maybe it is just illegal), but having explicit collapse/expand indices seems to clarify the semantics here (my high-level understanding of scalable vectors is that they are similar to dynamic shapes in tensors, but with more restrictions).
shape_cast collapse = [[0, 1], [2]] ... : vector<[1]x[1]x[1]xf32> to vector<[1]x[1]xf32>
Additionally, if we made this addition, vectorization of static tensor.collapse_shape and tensor.expand_shape could preserve the reshape structure of those ops through vectorization without having to go straight to shape_cast (if such a pattern even makes sense).
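For example (hypothetical syntax, assuming such a vectorization pattern makes sense), a static tensor.expand_shape could vectorize into a shape cast that carries the same reassociation instead of an opaque reshape:

```mlir
%e = tensor.expand_shape %t [[0, 1]] : tensor<8xf32> into tensor<2x4xf32>
// could vectorize to:
%v = vector.transfer_read %t[%c0], %pad : tensor<8xf32>, vector<8xf32>
%r = vector.shape_cast %v expand = [[0, 1]] : vector<8xf32> to vector<2x4xf32>
```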
Thoughts?
The above are just a couple of quick ideas I’m putting together in the wake of the discussion on the PR. There is definitely more to flesh out, and we need to be sure that this can really handle unit vector dims the way we want. More/other ideas are welcome.
@mehdi_amini @dcaballe @MaheshRavishankar @antiagainst @nicolasvasilache @banach-space @c-rhodes