TL;DR: I’m thinking about making a few cleanup changes to the Vector dialect and thought that removing the legacy vector.extractelement and vector.insertelement operations in favor of their bigger siblings vector.extract and vector.insert would be a good start. If we agree on this, this post will serve as an initial PSA of the change. Please let me know what you think! Collaboration on this work is more than welcome!
Background
Currently, we have two operations for extracting elements from a vector value: vector.extractelement and vector.extract. vector.extractelement was introduced in early stages of MLIR as a carbon copy of the LLVM extractelement instruction. This operation can only extract scalar elements from a 1-D or 0-D vector. Some time later, the Vector dialect evolved towards n-D vectors and vector.extractelement was introduced to model scalar and sub-vector extractions from an n-D vector. To some extent, vector.extract is a superset of vector.extractelement, although the former currently does not support extractions from 0-D vectors. The same applies to the vector.insertelement and vector.insert operations.
Proposal
I propose that we add support for 0-D vectors to vector.extract and vector.insert and remove vector.extractelement and vector.insertelement. The replacement of vector.extractelement/insertelement’s uses would be mechanical once vector.extract/insert support 0-D vectors.
This change will improve the Vector dialect by:
Removing ambiguity from the vector representation when it comes to extracting elements from a vector value.
Removing redundant patterns/transformations currently implemented for both vector.extract/insert and vector.extractelement/insertelement.
Filling implementation gaps in patterns/transformations where only one of the operation flavors is supported.
Action Plan
Introduce 0-D support in vector.extract/insert ops.
Fill any implementation gap in vector.extract/insert ops. Make sure that all the vector.extractelement/insertelement patterns and transformations are also supported for vector.extract/insert.
Remove vector.extractelement/insertelement from upstream except for the ops themselves. Send another PSA with the imminent removal of the ops. We can add warnings to the ops’ builders for downstream users, if necessary.
Thank you for taking the initiative on this. I’m on board, the quest is righteous and the plan is sound You can count on me to help with the work. If you write the initial support for 0-D in insert/extract, we can make a list issues with whatever breaks when replacing insert/extractelement with insert/extract, and I’ll start chipping away at it. Feel free to add me as a reviewer for all related patches, and we can jump into a call to coordinate if necessary.
This is a good example of “2 Ops would be better than 4” Thanks for proposing this Diego! I am also available to help. Either as a reviewer or more hands-on, whatever works better.
The action item #1 should be completed with ⚙ D152644 [mlir][Vector] Add support for 0-D vectors to vector.insert/extract.
I’m now looking at #2 and, in particular, at adding support for Value indices to vector.extract/insert ops. They only support constant indices right now. The implementation is almost done but there are around 88 lit test files that would need to be updated to the new vector.extract/insert format. Unfortunately, I can’t think of a ninja regex way to update them so I would really appreciate some help here. Basically, we have to replace check rules like:
where C0, C0_1, etc. can be random indices and be located anywhere in the output.
If anybody has cycles to work on this, please call out the tests you want to work on in the review. You can share your diffs over GH also there. I’ll make sure those contributing are added as co-authors. To repro, you can just apply the ongoing diff and run the check-mlir target.
Thanks a lot! The main roadblock I found after my PR was the gaps in canonicalizations/optimizations between the two ops. Not sure those gaps have been addressed already. Have you noticed any impact in performance?
DenseElementsAttr folder : vector.extract has a canonicalization pattern that does this. This is not a folder for vector.extract because vector.extract can generate a vector constant (instead of a scalar constant) and needs to generate a new operation.
vector.insertelement has folders for:
DenseElementsAttr folder: vector.insert has a canonicalization pattern that does this. The reasoning for it not being a folder is same as vector.extract.
What I understand by looking at this, is that vector.extract/vector.insert have support whatever vector.extractelement/vector.insertelement supports + more folders + more transforms.
I’m not sure of what benchmarks to try for performance though, they should ideally generate the same or better code based on the support, but with my next patch to remove uses of vector.extractelement/vector.insertelement, I could see how it affects IREE cpu performance. If you have any particular benchmarks in mind, I’m happy to try them as well.
Let me know if i missed any of the canonicalizers / folders / transformations.
Thanks for the analysis! Great to see that point 2 is converging. Yes, I think a quick run of IREE CI should be enough to move forward. Thanks for helping with this!
Interesting to see the lack of symmetry between vector.extractelement and vector.insertelement (looking at folders). Looks like a gap waiting to be filled
Given that all of this is already in progress, we might have missed the opportunity to benchmark “before” and “after”.
Given that this is already in progress, we might have missed our opportunity to track the perf impact Still, could you try a deeplabv3 and a bert benchmark? Not too concerned which variant you choose.
As a overall comment, it would be nice if there were a mechanism for putting up deprecation on ops so that downstreams that don’t read here as often or will need time to transition will have that period
Downstream can always just revert the commit that remove the ops in their downstream fork as needed. In general when using LLVM/MLIR downstream, it’s hard to track HEAD without such mechanism that allows you to decouple your transition from the update of LLVM/MLIR (we don’t always can or want to go through soft-deprecation anyway: it can take time and increase friction in getting things improved upstream).
I discussed a similar idea with @rengolin and @qed recently – I’d love to have a tablegen mechanism to add let deprecated = "Reason"; to op definitions and have those result in C++ deprecation warnings and warnings emitted when we verify IR with deprecated ops, that could optionally be turned into errors. This is something that would make op deprecation more mechanical from the POV of downstream projects that can catch deprecations in CI instead of having to stay on top of discourse discussions.
For me, the main difficulty is when deprecated ops appear in mlir bytecode that needs to be upgraded, and we don’t have a good upgrade tools at the moment.