[RFC][PSA?] Remove `vector.extractelement` and `vector.insertelement` ops in favor of `vector.extract` and `vector.insert` ops

Hi everyone,

TL;DR: I’m thinking about making a few cleanup changes to the Vector dialect and thought that removing the legacy vector.extractelement and vector.insertelement operations in favor of their bigger siblings vector.extract and vector.insert would be a good start. If we agree on this, this post will serve as an initial PSA of the change. Please let me know what you think! Collaboration on this work is more than welcome!

Background

Currently, we have two operations for extracting elements from a vector value: vector.extractelement and vector.extract. vector.extractelement was introduced in early stages of MLIR as a carbon copy of the LLVM extractelement instruction. This operation can only extract scalar elements from a 1-D or 0-D vector. Some time later, the Vector dialect evolved towards n-D vectors and vector.extractelement was introduced to model scalar and sub-vector extractions from an n-D vector. To some extent, vector.extract is a superset of vector.extractelement, although the former currently does not support extractions from 0-D vectors. The same applies to the vector.insertelement and vector.insert operations.

Proposal

I propose that we add support for 0-D vectors to vector.extract and vector.insert and remove vector.extractelement and vector.insertelement. The replacement of vector.extractelement/insertelement’s uses would be mechanical once vector.extract/insert support 0-D vectors.

This change will improve the Vector dialect by:

  • Removing ambiguity from the vector representation when it comes to extracting elements from a vector value.
  • Removing redundant patterns/transformations currently implemented for both vector.extract/insert and vector.extractelement/insertelement.
  • Filling implementation gaps in patterns/transformations where only one of the operation flavors is supported.

Action Plan

  1. Introduce 0-D support in vector.extract/insert ops.
  2. Fill any implementation gap in vector.extract/insert ops. Make sure that all the vector.extractelement/insertelement patterns and transformations are also supported for vector.extract/insert.
  3. Remove vector.extractelement/insertelement from upstream except for the ops themselves. Send another PSA with the imminent removal of the ops. We can add warnings to the ops’ builders for downstream users, if necessary.
  4. Remove vector.extractelement/insertelement operations.

Please let me know if you have any concerns or feedback! This is also a call for collaboration on this work. I’ll definitely need help!

Thanks,
Diego

cc: @nicolasvasilache, @aartbik, @ThomasRaoux, @banach-space, @javiersetoain

4 Likes

Hi Diego,

Thank you for taking the initiative on this. I’m on board, the quest is righteous and the plan is sound :wink: You can count on me to help with the work. If you write the initial support for 0-D in insert/extract, we can make a list issues with whatever breaks when replacing insert/extractelement with insert/extract, and I’ll start chipping away at it. Feel free to add me as a reviewer for all related patches, and we can jump into a call to coordinate if necessary.

Cheers!
Javier

2 Likes

+1

This is a good example of “2 Ops would be better than 4” :slight_smile: Thanks for proposing this Diego! I am also available to help. Either as a reviewer or more hands-on, whatever works better.

-Andrzej

1 Like

Thanks for the support!

#1 is ready: ⚙ D152644 [mlir][Vector] Add support for 0-D vectors to vector.insert/extract

+1 thanks much for looking into reducing redundancy there!

The action item #1 should be completed with ⚙ D152644 [mlir][Vector] Add support for 0-D vectors to vector.insert/extract.
I’m now looking at #2 and, in particular, at adding support for Value indices to vector.extract/insert ops. They only support constant indices right now. The implementation is almost done but there are around 88 lit test files that would need to be updated to the new vector.extract/insert format. Unfortunately, I can’t think of a ninja regex way to update them so I would really appreciate some help here. Basically, we have to replace check rules like:

//  CHECK-NEXT:   %[[R2:.+]] =  vector.extract %{{.*}}[0, 0] : vector<1x1x16xf32>

with something like:

//   CHECK-DAG:   %[[C0:.+]] = arith.constant 0 : index
//   CHECK-DAG:   %[[C0_1:.+]] = arith.constant 0 : index
...
//  CHECK-NEXT:   %[[R2:.+]] = vector.extract %{{.*}}[%[[C0]], %[[C0_1]]] : vector<1x1x16xf32>

where C0, C0_1, etc. can be random indices and be located anywhere in the output.

If anybody has cycles to work on this, please call out the tests you want to work on in the review. You can share your diffs over GH also there. I’ll make sure those contributing are added as co-authors. To repro, you can just apply the ongoing diff and run the check-mlir target.

Thanks in advance!
Diego

cc: @banach-space, @javiersetoain, @nicolasvasilache, @ThomasRaoux, @hanchung

3 Likes

Just to check, do vector.insert and vector.extract still only support constant indexes? We need variable indexes downstream.

I’ve now seen the patch.

Yeah, you should be able to temporarily apply the ongoing patch but, please, note that the implementation and assembly format are subject to change.

I noticed there hadn’t been an update in a while. So, I sent out a pr for Step #3 of the action plan: [mlir][Vector] Remove uses of vector.extractelement/vector.insertelement by Groverkss · Pull Request #113827 · llvm/llvm-project · GitHub

I also sent out a patch that updates vector.extractelement/vector.insertelement documentation to discourage usage of these ops until they are removed: [mlir][Vector] Depreciate vector.extractelement/vector.insertelement by Groverkss · Pull Request #113829 · llvm/llvm-project · GitHub

Thanks @dcaballe for all the previous work. This patch was mostly cleanup, with the hard parts already done by @dcaballe .

Thanks a lot! The main roadblock I found after my PR was the gaps in canonicalizations/optimizations between the two ops. Not sure those gaps have been addressed already. Have you noticed any impact in performance?

Hi! I did an analysis of the current folders/canonicalizations implemented for vector.extractelement/vector.insertelement and found this:

Canonicalizations

vector.extractelement/vector.insertelement do not have any canonicalizations.

Folders

vector.extractelement has folders for:

vector.insertelement has folders for:

  • DenseElementsAttr folder: vector.insert has a canonicalization pattern that does this. The reasoning for it not being a folder is same as vector.extract.

Conversion to LLVM

Conversion to SPIRV

Other transformations

The only pattern that runs on vector.extractelement/vector.insertelement that I could find was: llvm-project/mlir/lib/Dialect/Vector/Transforms/VectorTransferOpTransforms.cpp at main · llvm/llvm-project · GitHub which seems to be implemented for both, vector.extract and vector.extractelement.

What I understand by looking at this, is that vector.extract/vector.insert have support whatever vector.extractelement/vector.insertelement supports + more folders + more transforms.

I’m not sure of what benchmarks to try for performance though, they should ideally generate the same or better code based on the support, but with my next patch to remove uses of vector.extractelement/vector.insertelement, I could see how it affects IREE cpu performance. If you have any particular benchmarks in mind, I’m happy to try them as well.

Let me know if i missed any of the canonicalizers / folders / transformations.

1 Like

Thanks for the analysis! Great to see that point 2 is converging. Yes, I think a quick run of IREE CI should be enough to move forward. Thanks for helping with this!

1 Like

Thank you for this analysis!

Interesting to see the lack of symmetry between vector.extractelement and vector.insertelement (looking at folders). Looks like a gap waiting to be filled :sweat_smile:

Given that all of this is already in progress, we might have missed the opportunity to benchmark “before” and “after”.

Given that this is already in progress, we might have missed our opportunity to track the perf impact :frowning: Still, could you try a deeplabv3 and a bert benchmark? Not too concerned which variant you choose.

Thanks for this great clean-up work!

-Andrzej