I am trying to vectorize this code:
#map = affine_map<(d0, d1, d2, d3) -> (d1)>
#map1 = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>
#map2 = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
#map3 = affine_map<(d0) -> (d0 floordiv 49)>
#map4 = affine_map<(d0, d1) -> ((d0 floordiv 112) * 2 + (d1 mod 49) floordiv 7)>
#map5 = affine_map<(d0, d1) -> (d0 * 2 + d1 - (d0 floordiv 112) * 224 - (d1 floordiv 7) * 7)>
#map6 = affine_map<(d0, d1, d2, d3) -> (d1, d3)>
#map7 = affine_map<(d0, d1, d2, d3) -> (d0, d3, d2)>
#map8 = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2)>
...
%0 = bufferization.alloc_tensor() : tensor<32x3x230x230xf32>
%1 = bufferization.alloc_tensor() : tensor<64x3x7x7xf32>
%2 = bufferization.alloc_tensor() : tensor<64xf32>
%11 = tensor.empty() : tensor<32x64x112x112xf32>
%12 = linalg.generic {producerTag1, indexing_maps = [#map, #map1], iterator_types = ["parallel", "parallel", "parallel", "parallel"]} ins(%2 : tensor<64xf32>) outs(%11 : tensor<32x64x112x112xf32>) {
^bb0(%in: f32, %out: f32):
  linalg.yield %in : f32
} -> tensor<32x64x112x112xf32>
%collapsed = tensor.collapse_shape %1 [[0], [1, 2, 3]] : tensor<64x3x7x7xf32> into tensor<64x147xf32>
%collapsed_1 = tensor.collapse_shape %12 [[0], [1], [2, 3]] : tensor<32x64x112x112xf32> into tensor<32x64x12544xf32>
%13 = tensor.empty() : tensor<32x147x12544xf32>
%14 = linalg.generic {producerTag, indexing_maps = [#map2], iterator_types = ["parallel", "parallel", "parallel"]} outs(%13 : tensor<32x147x12544xf32>) {
^bb0(%out: f32):
  %50 = linalg.index 0 : index
  %51 = linalg.index 1 : index
  %52 = linalg.index 2 : index
  %53 = affine.apply #map3(%51)
  %54 = affine.apply #map4(%52, %51)
  %55 = affine.apply #map5(%52, %51)
  %extracted = tensor.extract %0[%50, %53, %54, %55] : tensor<32x3x230x230xf32>
  linalg.yield %extracted : f32
} -> tensor<32x147x12544xf32>
When I try to vectorize the operation %53 using transform.structured.vectorize (after tiling it with tile sizes [4, 7, 4]), I get the error: Attempted to vectorize, but failed.
I am trying to understand why vectorization fails in this case. Does it have anything to do with the tile sizes I am using? And is there a way to actually vectorize it, or is it simply not possible because of the memory accesses?
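To make the memory accesses concrete, here is a small Python sketch that evaluates #map3, #map4 and #map5 exactly as written above and lists the indices that tensor.extract would read. The helper names (gather_indices, tile_indices) are hypothetical and only for illustration, and the tile origin (0, 0, 0) is an assumption. It only shows the access pattern; it does not by itself establish why the vectorizer rejects the op.

```python
# Plain-Python evaluation of the affine maps from the IR above.
# Python's // is floor division, matching `floordiv` for these non-negative indices.

def map3(d0):
    # #map3: (d0) -> (d0 floordiv 49)
    return d0 // 49

def map4(d0, d1):
    # #map4: (d0, d1) -> ((d0 floordiv 112) * 2 + (d1 mod 49) floordiv 7)
    return (d0 // 112) * 2 + (d1 % 49) // 7

def map5(d0, d1):
    # #map5: (d0, d1) -> (d0 * 2 + d1 - (d0 floordiv 112) * 224 - (d1 floordiv 7) * 7)
    return d0 * 2 + d1 - (d0 // 112) * 224 - (d1 // 7) * 7

def gather_indices(i0, i1, i2):
    # Hypothetical helper: the 4-D index into %0 read at iteration (i0, i1, i2),
    # mirroring tensor.extract %0[%50, %53, %54, %55].
    return (i0, map3(i1), map4(i2, i1), map5(i2, i1))

def tile_indices(sizes=(4, 7, 4), origin=(0, 0, 0)):
    # Hypothetical helper: all source indices touched by one tile.
    # The origin is an assumption; other tiles follow the same arithmetic.
    o0, o1, o2 = origin
    return [
        gather_indices(o0 + a, o1 + b, o2 + c)
        for a in range(sizes[0])
        for b in range(sizes[1])
        for c in range(sizes[2])
    ]

if __name__ == "__main__":
    # Innermost source column while stepping along iteration dim 2 (tile size 4):
    print([map5(j, 0) for j in range(4)])  # [0, 2, 4, 6]
    # Innermost source column while stepping along iteration dim 1 (tile size 7):
    print([map5(0, k) for k in range(7)])  # [0, 1, 2, 3, 4, 5, 6]
    print(len(tile_indices()))             # 112 elements per [4, 7, 4] tile
```

In this first tile, stepping along the innermost iteration dimension reads every second column of %0, while stepping along dimension 1 reads seven consecutive columns. So the extract looks like a gather with stride 2 along the dimension that would normally be vectorized, which is why I suspect the access pattern rather than the tile sizes.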
I would appreciate any assistance in resolving this issue.