Input MLIR:
module {
  func.func @elementwise(%arg0: memref<32x1280x768xf32>, %arg1: memref<32x1280x768xf32>, %arg2: memref<32x1280x768xf32>) {
    affine.for %arg3 = 0 to 32 {
      affine.for %arg4 = 0 to 1280 {
        affine.for %arg5 = 0 to 768 {
          %0 = affine.load %arg0[%arg3, %arg4, %arg5] : memref<32x1280x768xf32>
          %1 = affine.load %arg1[%arg3, %arg4, %arg5] : memref<32x1280x768xf32>
          %2 = arith.mulf %0, %1 : f32
          affine.store %2, %arg2[%arg3, %arg4, %arg5] : memref<32x1280x768xf32>
        }
      }
    }
    return
  }
}
RUN:
mlir-opt %s -affine-super-vectorize="virtual-vector-size=128 test-fastest-varying=0" -affine-parallelize='max-nested=1'
Result:
module {
  func.func @elementwise(%arg0: memref<32x1280x768xf32>, %arg1: memref<32x1280x768xf32>, %arg2: memref<32x1280x768xf32>) {
    affine.for %arg3 = 0 to 32 {
      affine.for %arg4 = 0 to 1280 {
        affine.for %arg5 = 0 to 768 step 128 {
          %cst = arith.constant 0.000000e+00 : f32
          %0 = vector.transfer_read %arg0[%arg3, %arg4, %arg5], %cst : memref<32x1280x768xf32>, vector<128xf32>
          %cst_0 = arith.constant 0.000000e+00 : f32
          %1 = vector.transfer_read %arg1[%arg3, %arg4, %arg5], %cst_0 : memref<32x1280x768xf32>, vector<128xf32>
          %2 = arith.mulf %0, %1 : vector<128xf32>
          vector.transfer_write %2, %arg2[%arg3, %arg4, %arg5] : vector<128xf32>, memref<32x1280x768xf32>
        }
      }
    }
    return
  }
}
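As the result shows, -affine-parallelize leaves the outermost affine.for sequential, even though it is parallel from the operator's point of view: each iteration of %arg3 reads and writes only its own 1280x768 slice of the three memrefs.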
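To make this concrete for the vectorized result: assume, as the affine dependence analysis does (as far as I can tell) for distinct memref values, that %arg0, %arg1 and %arg2 do not alias. Only %arg2 is written, so the only pairs to check are a vector.transfer_write in outer iteration i and another one in outer iteration i'. A write at indices (i, j, k) covers the 128 elements %arg2[i, j, k + l] with 0 <= l < 128:

```latex
\begin{aligned}
&\text{same element of } \texttt{\%arg2} \iff i = i',\ j = j',\ k + l = k' + l', \quad 0 \le l,\, l' < 128 \\
&\text{dependence carried by the outermost loop} \implies i < i' \\
&i = i' \land i < i' \text{ is infeasible} \implies \text{the outermost loop carries no dependence}
\end{aligned}
```

The same argument covers the original scalar loop nest, with l = l' = 0.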
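However, in mlir/lib/Dialect/Affine/Analysis/AffineAnalysis.cpp, isLoopMemoryParallel and checkMemrefAccessDependence only analyze memory accesses from ops that implement AffineReadOpInterface / AffineWriteOpInterface (affine.load, affine.store, affine.vector_load, affine.vector_store). vector.transfer_read and vector.transfer_write are not among them, so when isLoopMemoryParallel encounters them it conservatively treats the loop as not parallel.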
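For comparison, here is the same computation hand-written with affine.vector_load and affine.vector_store instead of vector.transfer_read / vector.transfer_write. These ops implement AffineReadOpInterface / AffineWriteOpInterface, so isLoopMemoryParallel should be able to analyze them. I have not run this variant through -affine-parallelize, so treat it as a sketch of what the analysis can already see, not a verified result; the function name @elementwise_affine_vector is only for illustration.

```mlir
// Illustrative variant of @elementwise, vectorized by hand with affine
// vector ops (NOT the output of -affine-super-vectorize).
func.func @elementwise_affine_vector(%arg0: memref<32x1280x768xf32>, %arg1: memref<32x1280x768xf32>, %arg2: memref<32x1280x768xf32>) {
  affine.for %arg3 = 0 to 32 {
    affine.for %arg4 = 0 to 1280 {
      affine.for %arg5 = 0 to 768 step 128 {
        // 128-wide loads starting at [%arg3, %arg4, %arg5]; no padding value.
        %0 = affine.vector_load %arg0[%arg3, %arg4, %arg5] : memref<32x1280x768xf32>, vector<128xf32>
        %1 = affine.vector_load %arg1[%arg3, %arg4, %arg5] : memref<32x1280x768xf32>, vector<128xf32>
        %2 = arith.mulf %0, %1 : vector<128xf32>
        // 128-wide store, visible to the affine dependence analysis.
        affine.vector_store %2, %arg2[%arg3, %arg4, %arg5] : memref<32x1280x768xf32>, vector<128xf32>
      }
    }
  }
  return
}
```

Unlike vector.transfer_read, affine.vector_load takes no padding value, so this form only fits when, as here, 768 is a multiple of the vector width. I am also not sure whether the dependence analysis models the full 128-element footprint of these ops or only the starting index. For the outermost loop it makes no difference, since %arg3 alone separates the iterations.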
Are there plans to support this case (vector.transfer_read / vector.transfer_write in the affine dependence analysis) in the future?
Thanks.