[RFC][PSA?] Remove `vector.extractelement` and `vector.insertelement` ops in favor of `vector.extract` and `vector.insert` ops

dcaballe · June 5, 2023, 5:02am

Hi everyone,

TL;DR: I’m thinking about making a few cleanup changes to the Vector dialect and thought that removing the legacy vector.extractelement and vector.insertelement operations in favor of their bigger siblings vector.extract and vector.insert would be a good start. If we agree on this, this post will serve as an initial PSA of the change. Please let me know what you think! Collaboration on this work is more than welcome!

Background

Currently, we have two operations for extracting elements from a vector value: vector.extractelement and vector.extract. vector.extractelement was introduced in early stages of MLIR as a carbon copy of the LLVM extractelement instruction. This operation can only extract scalar elements from a 1-D or 0-D vector. Some time later, the Vector dialect evolved towards n-D vectors and vector.extractelement was introduced to model scalar and sub-vector extractions from an n-D vector. To some extent, vector.extract is a superset of vector.extractelement, although the former currently does not support extractions from 0-D vectors. The same applies to the vector.insertelement and vector.insert operations.

Proposal

I propose that we add support for 0-D vectors to vector.extract and vector.insert and remove vector.extractelement and vector.insertelement. The replacement of vector.extractelement/insertelement’s uses would be mechanical once vector.extract/insert support 0-D vectors.

This change will improve the Vector dialect by:

Removing ambiguity from the vector representation when it comes to extracting elements from a vector value.
Removing redundant patterns/transformations currently implemented for both vector.extract/insert and vector.extractelement/insertelement.
Filling implementation gaps in patterns/transformations where only one of the operation flavors is supported.

Action Plan

Introduce 0-D support in vector.extract/insert ops.
Fill any implementation gap in vector.extract/insert ops. Make sure that all the vector.extractelement/insertelement patterns and transformations are also supported for vector.extract/insert.
Remove vector.extractelement/insertelement from upstream except for the ops themselves. Send another PSA with the imminent removal of the ops. We can add warnings to the ops’ builders for downstream users, if necessary.
Remove vector.extractelement/insertelement operations.

Please let me know if you have any concerns or feedback! This is also a call for collaboration on this work. I’ll definitely need help!

Thanks,
Diego

cc: @nicolasvasilache, @aartbik, @ThomasRaoux, @banach-space, @javiersetoain

javiersetoain · June 5, 2023, 10:09am

Hi Diego,

Thank you for taking the initiative on this. I’m on board, the quest is righteous and the plan is sound You can count on me to help with the work. If you write the initial support for 0-D in insert/extract, we can make a list issues with whatever breaks when replacing insert/extractelement with insert/extract, and I’ll start chipping away at it. Feel free to add me as a reviewer for all related patches, and we can jump into a call to coordinate if necessary.

Cheers!
Javier

banach-space · June 5, 2023, 12:25pm

+1

This is a good example of “2 Ops would be better than 4” Thanks for proposing this Diego! I am also available to help. Either as a reviewer or more hands-on, whatever works better.

-Andrzej

dcaballe · June 11, 2023, 6:44am

Thanks for the support!

#1 is ready: ⚙ D152644 [mlir][Vector] Add support for 0-D vectors to vector.insert/extract

nicolasvasilache · June 11, 2023, 9:35pm

+1 thanks much for looking into reducing redundancy there!

dcaballe · July 13, 2023, 12:31am

The action item #1 should be completed with ⚙ D152644 [mlir][Vector] Add support for 0-D vectors to vector.insert/extract.
I’m now looking at #2 and, in particular, at adding support for Value indices to vector.extract/insert ops. They only support constant indices right now. The implementation is almost done but there are around 88 lit test files that would need to be updated to the new vector.extract/insert format. Unfortunately, I can’t think of a ninja regex way to update them so I would really appreciate some help here. Basically, we have to replace check rules like:

//  CHECK-NEXT:   %[[R2:.+]] =  vector.extract %{{.*}}[0, 0] : vector<1x1x16xf32>

with something like:

//   CHECK-DAG:   %[[C0:.+]] = arith.constant 0 : index
//   CHECK-DAG:   %[[C0_1:.+]] = arith.constant 0 : index
...
//  CHECK-NEXT:   %[[R2:.+]] = vector.extract %{{.*}}[%[[C0]], %[[C0_1]]] : vector<1x1x16xf32>

where C0, C0_1, etc. can be random indices and be located anywhere in the output.

If anybody has cycles to work on this, please call out the tests you want to work on in the review. You can share your diffs over GH also there. I’ll make sure those contributing are added as co-authors. To repro, you can just apply the ongoing diff and run the check-mlir target.

Thanks in advance!
Diego

cc: @banach-space, @javiersetoain, @nicolasvasilache, @ThomasRaoux, @hanchung

krzysz00 · July 19, 2023, 4:03pm

Just to check, do vector.insert and vector.extract still only support constant indexes? We need variable indexes downstream.

I’ve now seen the patch.

dcaballe · July 19, 2023, 6:51pm

Yeah, you should be able to temporarily apply the ongoing patch but, please, note that the implementation and assembly format are subject to change.

Groverkss · October 27, 2024, 6:27pm

I noticed there hadn’t been an update in a while. So, I sent out a pr for Step #3 of the action plan: [mlir][Vector] Remove uses of vector.extractelement/vector.insertelement by Groverkss · Pull Request #113827 · llvm/llvm-project · GitHub

I also sent out a patch that updates vector.extractelement/vector.insertelement documentation to discourage usage of these ops until they are removed: [mlir][Vector] Depreciate vector.extractelement/vector.insertelement by Groverkss · Pull Request #113829 · llvm/llvm-project · GitHub

Thanks @dcaballe for all the previous work. This patch was mostly cleanup, with the hard parts already done by @dcaballe .

dcaballe · October 28, 2024, 7:00pm

Thanks a lot! The main roadblock I found after my PR was the gaps in canonicalizations/optimizations between the two ops. Not sure those gaps have been addressed already. Have you noticed any impact in performance?

Groverkss · November 14, 2024, 5:00pm

Hi! I did an analysis of the current folders/canonicalizations implemented for vector.extractelement/vector.insertelement and found this:

Canonicalizations

vector.extractelement/vector.insertelement do not have any canonicalizations.

Folders

vector.extractelement has folders for:

extractelement(splat) : vector.extract has the same support after: [mlir][Vector] Improve support for vector.extract(broadcast) by Groverkss · Pull Request #116234 · llvm/llvm-project · GitHub
extractelement(broadcast) : vector.extract has the same support after: [mlir][Vector] Improve support for vector.extract(broadcast) by Groverkss · Pull Request #116234 · llvm/llvm-project · GitHub
DenseElementsAttr folder : vector.extract has a canonicalization pattern that does this. This is not a folder for vector.extract because vector.extract can generate a vector constant (instead of a scalar constant) and needs to generate a new operation.

vector.insertelement has folders for:

DenseElementsAttr folder: vector.insert has a canonicalization pattern that does this. The reasoning for it not being a folder is same as vector.extract.

Conversion to LLVM

vector.extract lowering to LLVM generates llvm.extractelement, just like the lowering for vector.extractelement: llvm-project/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp at main · llvm/llvm-project · GitHub
vector.insert lowering to LLVM generates llvm.insertelement, just like the lowering for vector.insertelement : llvm-project/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp at main · llvm/llvm-project · GitHub

Conversion to SPIRV

vector.extract lowers to spirv::CompositeExtractOp for static cases and spirv::VectorExtractDynamicOp for dynamic cases. This is same as vector.extractelement: llvm-project/mlir/lib/Conversion/VectorToSPIRV/VectorToSPIRV.cpp at main · llvm/llvm-project · GitHub
vector.insert lowers to spirv::CompositeInsertOp for static cases and spirv::VectorInsertDynamicOp for dynamic cases. This is same as vector.insertelement : llvm-project/mlir/lib/Conversion/VectorToSPIRV/VectorToSPIRV.cpp at main · llvm/llvm-project · GitHub

Other transformations

The only pattern that runs on vector.extractelement/vector.insertelement that I could find was: llvm-project/mlir/lib/Dialect/Vector/Transforms/VectorTransferOpTransforms.cpp at main · llvm/llvm-project · GitHub which seems to be implemented for both, vector.extract and vector.extractelement.

What I understand by looking at this, is that vector.extract/vector.insert have support whatever vector.extractelement/vector.insertelement supports + more folders + more transforms.

I’m not sure of what benchmarks to try for performance though, they should ideally generate the same or better code based on the support, but with my next patch to remove uses of vector.extractelement/vector.insertelement, I could see how it affects IREE cpu performance. If you have any particular benchmarks in mind, I’m happy to try them as well.

Let me know if i missed any of the canonicalizers / folders / transformations.

dcaballe · November 14, 2024, 5:07pm

Thanks for the analysis! Great to see that point 2 is converging. Yes, I think a quick run of IREE CI should be enough to move forward. Thanks for helping with this!

banach-space · November 14, 2024, 6:22pm

Thank you for this analysis!

Interesting to see the lack of symmetry between vector.extractelement and vector.insertelement (looking at folders). Looks like a gap waiting to be filled

Given that all of this is already in progress, we might have missed the opportunity to benchmark “before” and “after”.

Given that this is already in progress, we might have missed our opportunity to track the perf impact Still, could you try a deeplabv3 and a bert benchmark? Not too concerned which variant you choose.

Thanks for this great clean-up work!