[MLIR][Vectorize] Division with remainder

MLS · June 17, 2024, 1:15pm

I want to know what happens when dividing a vector dimension size by the vectorization factor leaves a remainder.

llvm/llvm-project/blob/0432221c8e6a8e5740a982076a6ae85e5ee9909e/mlir/include/mlir/Dialect/Affine/Utils.h#L93-L104


      
          struct VectorizationStrategy {
            // Vectorization factors to apply to each target vector dimension.
            // Each factor will be applied to a different loop.
            SmallVector<int64_t, 8> vectorSizes;
            // Maps each AffineForOp vectorization candidate with its vector dimension.
            // The candidate will be vectorized using the vectorization factor in
            // 'vectorSizes' for that dimension.
            DenseMap<Operation *, unsigned> loopToVectorDim;
            // Maps loops that implement vectorizable reductions to the corresponding
            // reduction descriptors.
            ReductionLoopMap reductionLoops;
          };

For example, with vector sizes 5 and 4 on a 64x64 memref (64 is not divisible by 5):

  affine.for %arg2 = 0 to 64 step 5 {
      affine.for %arg3 = 0 to 64 step 4 {
        %cst = arith.constant 0.000000e+00 : f32
        %0 = vector.transfer_read %arg0[%arg2, %arg3], %cst : memref<64x64xf32>, vector<5x4xf32>
        %cst_0 = arith.constant 0.000000e+00 : f32
        %1 = vector.transfer_read %arg1[%arg2, %arg3], %cst_0 : memref<64x64xf32>, vector<5x4xf32>
        %2 = arith.addf %0, %1 : vector<5x4xf32>
        vector.transfer_write %2, %alloc[%arg2, %arg3] : vector<5x4xf32>, memref<64x64xf32>
      }
    }
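
To make the remainder concrete, here is a small Python sketch of the arithmetic only. It does not use any MLIR API, and the helper name `tail_lanes` is hypothetical. For a dimension of size 64 and a factor of 5, it shows how many iterations the stepped loop runs and how many lanes of the last vector fall outside the memref.

```python
from typing import Tuple


def tail_lanes(dim_size: int, vector_size: int) -> Tuple[int, int, int]:
    """Return (iterations, valid lanes in the last iteration, out-of-bounds lanes).

    Models a loop `0 to dim_size step vector_size`; purely illustrative.
    """
    iterations = -(-dim_size // vector_size)      # ceiling division
    last_start = (iterations - 1) * vector_size   # induction value of the last iteration
    valid = dim_size - last_start                 # lanes that still map to real elements
    return iterations, valid, vector_size - valid


print(tail_lanes(64, 5))  # (13, 4, 1): last vector starts at 60, one lane is past the end
print(tail_lanes(64, 4))  # (16, 4, 0): 64 divides evenly by 4, no tail
```

So in the loop nest above, only the `step 5` dimension has a tail: its last iteration starts at index 60 and one of its five lanes has no corresponding element.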

dasdibye · June 18, 2024, 2:52pm

If there is a remainder after division by the VF, you should see masked vectorization in effect.

MLS · June 19, 2024, 2:15am

But I didn’t find any implementation of masked vectorization in Affine’s SuperVectorize.

dasdibye · June 19, 2024, 10:23am

Some masking support is there, as can be seen in the example provided in the SuperVectorize.cpp file, but maybe it doesn't cover all cases?

  #map = affine_map<(d0) -> (-d0 + 500)>
  func @vecred(%arg0: memref<512xf32>) -> f32 {
    %cst = arith.constant 0.000000e+00 : f32
    %cst_0 = arith.constant dense<0.000000e+00> : vector<128xf32>
    %0 = affine.for %arg1 = 0 to 500 step 128 iter_args(%arg2 = %cst_0)
      -> (vector<128xf32>) {
      // %2 is the number of iterations left in the original loop.
      %2 = affine.apply #map(%arg1)
      %3 = vector.create_mask %2 : vector<128xi1>
      %cst_1 = arith.constant 0.000000e+00 : f32
      %4 = vector.transfer_read %arg0[%arg1], %cst_1 :
        memref<512xf32>, vector<128xf32>
      %5 = math.cos %4 : vector<128xf32>
      %6 = arith.addf %arg2, %5 : vector<128xf32>
      // We filter out the effect of last 12 elements using the mask.
      %7 = select %3, %6, %arg2 : vector<128xi1>, vector<128xf32>
      affine.yield %7 : vector<128xf32>
    }
    %1 = vector.reduction <add>, %0 : vector<128xf32> into f32
    return %1 : f32
  }
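
For readers who want to check the masking logic by hand, the following is a plain Python model of the loop above. It is only a sketch: it is not how SuperVectorize is implemented, the function name `masked_sum_cos` is hypothetical, and Python lists stand in for vectors. It assumes, as in the IR, that lanes read past the end of the buffer get the padding value 0.0.

```python
import math


def masked_sum_cos(data, n, vf):
    """Sum cos(data[0:n]) using vf-wide steps and a per-iteration mask."""
    acc = [0.0] * vf                               # initial accumulator vector (all zeros)
    for i in range(0, n, vf):                      # affine.for %arg1 = 0 to n step vf
        remaining = n - i                          # affine map: -d0 + n
        mask = [lane < remaining for lane in range(vf)]   # models vector.create_mask
        # Models vector.transfer_read with 0.0 as the padding value.
        chunk = [data[i + lane] if i + lane < len(data) else 0.0
                 for lane in range(vf)]
        updated = [a + math.cos(x) for a, x in zip(acc, chunk)]
        # Models the select: masked-off lanes keep the old accumulator value.
        acc = [u if m else a for m, u, a in zip(mask, updated, acc)]
    return sum(acc)                                # models the final vector.reduction


data = [0.01 * k for k in range(512)]
reference = sum(math.cos(x) for x in data[:500])
print(abs(masked_sum_cos(data, 500, 128) - reference) < 1e-9)  # True
```

With n = 500 and vf = 128 the loop runs four times; in the last iteration `remaining` is 116, so the final 12 lanes are masked off and keep their previous accumulator values, which matches the comment in the IR.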

MLS · June 20, 2024, 6:50am

Got it, thank you. And vectorizing reductions is supported only for 1-D vectors.
