How to Unroll vector.mask %0 { vector.contract ...}

chenghuaWang · February 2, 2024, 5:05am

I’m vectorizing a linalg.matmul op using the vector dialect. The linalg.matmul op is already tiled. My vectorization pass performs the following steps:

Vectorization using linalg::vectorize
Unrolling
Casting away vector leading one dim
Hoisting
Lowering to LLVM IR

Everything works fine when the matmul sizes are divisible by the tile sizes. When they are not, vectorization introduces a vector.mask op:

%12 = vector.mask %11 { vector.contract {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>, affine_map<(d0, d1, d2) -> (d2, d1)>, affine_map<(d0, d1, d2) -> (d0, d1)>], iterator_types = ["parallel", "parallel", "reduction"], kind = #vector.kind<add>} %8, %9, %10 : vector<4x4xf32>, vector<4x4xf32> into vector<4x4xf32> } : vector<4x4x4xi1> -> vector<4x4xf32>

It seems that vector.mask %11 { vector.contract } cannot be unrolled using mlir::vector::populateVectorUnrollPatterns(...). Does anyone know how to solve this problem?
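For readers following along: the mask above has type vector<4x4x4xi1> because it covers the whole (d0, d1, d2) iteration space of the contraction, not just the 4x4 result. The Python sketch below shows how such a mask could arise for a boundary tile. The problem sizes (6x7x5) and the helper names tile_extent and create_mask are assumptions made for illustration; they are not MLIR APIs, and create_mask only imitates the idea behind vector.create_mask.

```python
def tile_extent(size, tile, offset):
    """Number of in-bounds elements in the tile that starts at `offset`."""
    return max(0, min(tile, size - offset))


def create_mask(extents, shape):
    """3-D boolean mask: True where every index is below its extent.

    Illustrative stand-in for the semantics of vector.create_mask.
    """
    m_ext, n_ext, k_ext = extents
    m, n, k = shape
    return [[[i < m_ext and j < n_ext and r < k_ext for r in range(k)]
             for j in range(n)]
            for i in range(m)]


# Hypothetical matmul sizes (M, N, K) that are not multiples of the tile size.
M, N, K, TILE = 6, 7, 5, 4

# The last tile in each dimension starts at offset 4 and is only partly valid.
extents = (tile_extent(M, TILE, 4),
           tile_extent(N, TILE, 4),
           tile_extent(K, TILE, 4))
mask = create_mask(extents, (TILE, TILE, TILE))

# Count the active lanes of the 4x4x4 mask.
active = sum(bit for plane in mask for row in plane for bit in row)
```

With these assumed sizes, only a small corner of the 4x4x4 iteration space is active in the last tile, so most of the work in that tile is masked off.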

dcaballe · February 2, 2024, 6:00am

Hi there!

We usually apply vector unrolling relatively late in the pipeline, after vector.mask has been folded away, so unrolling vector.mask is not implemented in mlir::vector::populateVectorUnrollPatterns. However, we also apply unrolling as part of lowering vector.contract to vector.outerproduct or vector.fma, so you can try that with mlir::vector::populateVectorContractLoweringPatterns. vector.mask is supported there.
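To make the idea concrete, here is a small Python model of what unrolling a masked contraction along the reduction dimension could look like. It is only a sketch of the semantics, not the code the MLIR patterns emit: it assumes that each step computes one outer-product update and keeps the old accumulator value in masked-off lanes, using the 2-D slice of the 3-D mask for that reduction index. All function names are made up for this example.

```python
def masked_contract(a, b, acc, mask):
    """Reference: acc[i][j] += a[i][k] * b[k][j] only where mask[i][j][k]."""
    out = [row[:] for row in acc]
    for i in range(len(a)):
        for j in range(len(b[0])):
            for k in range(len(b)):
                if mask[i][j][k]:
                    out[i][j] += a[i][k] * b[k][j]
    return out


def select(mask2d, new, old):
    """Lane-wise blend: take `new` where the mask is set, else keep `old`."""
    return [[n if m else o for m, n, o in zip(m_row, n_row, o_row)]
            for m_row, n_row, o_row in zip(mask2d, new, old)]


def contract_via_outerproducts(a, b, acc, mask):
    """Unroll along the reduction dim: one masked outer product per k."""
    out = [row[:] for row in acc]
    rows, cols = len(a), len(b[0])
    for k in range(len(b)):
        col_k = [a[i][k] for i in range(rows)]   # k-th column of the LHS
        row_k = b[k]                             # k-th row of the RHS
        updated = [[out[i][j] + col_k[i] * row_k[j] for j in range(cols)]
                   for i in range(rows)]
        mask_k = [[mask[i][j][k] for j in range(cols)] for i in range(rows)]
        out = select(mask_k, updated, out)
    return out


# Small integer inputs so the comparison is exact.
a = [[i + k for k in range(4)] for i in range(4)]
b = [[k * j + 1 for j in range(4)] for k in range(4)]
acc = [[0] * 4 for _ in range(4)]
# Boundary-style mask: 3 valid rows, 4 valid columns, 2 valid reduction steps.
mask = [[[i < 3 and k < 2 for k in range(4)] for j in range(4)]
        for i in range(4)]

reference = masked_contract(a, b, acc, mask)
unrolled = contract_via_outerproducts(a, b, acc, mask)
```

In this model both versions produce the same accumulator, and the fully masked-off last row stays untouched.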

You can also try to apply peeling with mlir::linalg::peelLoops before running linalg::vectorize. That should generate a main loop, which would vectorize without masks, and a remainder loop. FTR, I was not very successful applying masking to vector.contract in the past. Although it’s supported in MLIR, I struggled to represent masked scalar loads in LLVM in a way that the LLVM backends would generate efficient asm for.
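As a rough illustration of what peeling does to a single tiled loop, the following Python sketch splits an iteration range into a main part made of full tiles and a remainder. The function name peel and the bounds are assumptions for this example; it models only the index arithmetic, not the mlir::linalg::peelLoops rewrite itself.

```python
def peel(lower, upper, step):
    """Split [lower, upper) into a main range of full steps and a remainder."""
    split = lower + ((upper - lower) // step) * step
    return (lower, split), (split, upper)


def tiles(bounds, step):
    """List the (offset, size) pairs visited when tiling `bounds` by `step`."""
    lo, hi = bounds
    return [(off, min(step, hi - off)) for off in range(lo, hi, step)]


# Hypothetical loop over 10 iterations with tile size 4.
main, remainder = peel(0, 10, 4)
main_tiles = tiles(main, 4)            # every tile here is a full tile
remainder_tiles = tiles(remainder, 4)  # at most one partial tile
```

Every tile in the main range has the full size, so it can be vectorized without a mask; only the remainder tile needs masking or scalar code.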

Happy to answer any other questions that you may have.
Diego

chenghuaWang · February 2, 2024, 12:35pm

Thanks!

Using mlir::vector::populateVectorContractLoweringPatterns directly with the outerproduct option gives the same result as the unrolling step in my pipeline. Might mlir::linalg::peelLoops cause performance issues for matrices that are tiled multiple times (tile sizes = [[8, 32, 0], [4, 4, 0], [0, 0, 4]])? I’m also not sure whether introducing arith.select will cause a performance loss; I’ll test it later.

In the beginning, I used a pass pipeline based on Lei.Chat()'s write-up of the practice in IREE. But it seems that mlir::vector::populateVectorContractLoweringPatterns can now directly replace the unroll step mentioned in the article?

dcaballe · February 2, 2024, 6:59pm

It depends on the target, but using a select to blend the new result with the pass-through one is a well-known pattern and should be peepholed into a masked instruction if one is available on the target.

Things are moving quickly! That’s probably a question for @antiagainst
