Hi all. I am trying to convert a parallel reduction to GPU. I constructed the IR below, which contains an scf.parallel with an add reduction, and ran the command mlir-opt -convert-parallel-loops-to-gpu affine_reduction.mlir
. However, the pass leaves the IR unchanged.
#map = affine_map<(d0) -> (d0)>
module {
  func @affine_parallel_with_reductions(%arg0: memref<3x3xf32>) -> f32 {
    %c0 = arith.constant 0 : index
    %c2 = arith.constant 2 : index
    %c1 = arith.constant 1 : index
    %cst = arith.constant 0.000000e+00 : f32
    %0 = scf.parallel (%arg1, %arg2) = (%c0, %c0) to (%c2, %c2) step (%c1, %c1) init (%cst) -> f32 {
      %1 = memref.load %arg0[%arg1, %arg2] : memref<3x3xf32>
      scf.reduce(%1) : f32 {
      ^bb0(%arg3: f32, %arg4: f32): // no predecessors
        %2 = arith.addf %arg3, %arg4 : f32
        scf.reduce.return %2 : f32
      }
      scf.yield
    } {mapping = [{bound = #map, map = #map, processor = 0}, {bound = #map, map = #map, processor = 1}]}
    return %0 : f32
  }
}
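For reference, here is a rough sequential C++ sketch of the value I expect this loop to compute. It is only my reading of the IR above (bounds 0 to 2 with step 1 in both dimensions, init value 0.0, addf as the combiner); the function name reference_reduction is made up for illustration and is not part of MLIR.

```cpp
// Hypothetical reference function, not an MLIR API: a sequential model of
// the scf.parallel reduction shown above.
float reference_reduction(const float (&buf)[3][3]) {
  // Corresponds to init (%cst), i.e. 0.0f.
  float acc = 0.0f;
  // Both induction variables run from %c0 to %c2 (exclusive) with step %c1,
  // so only the upper-left 2x2 part of the 3x3 buffer is visited.
  for (int i = 0; i < 2; ++i) {
    for (int j = 0; j < 2; ++j) {
      // memref.load followed by the scf.reduce region (arith.addf).
      acc = acc + buf[i][j];
    }
  }
  // Corresponds to the single f32 result %0 of scf.parallel.
  return acc;
}
```

For a buffer filled row by row with 1 through 9, this returns 1 + 2 + 4 + 5 = 12. A GPU lowering may combine the elements in a different order, so floating-point results could differ slightly in general.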
After reading the implementation in llvm-project/mlir/lib/Conversion/SCFToGPU.cpp
, it appears that the SCF-to-GPU conversion does not currently support reductions:
static LogicalResult processParallelLoop(
    ParallelOp parallelOp, gpu::LaunchOp launchOp,
    BlockAndValueMapping &cloningMap, SmallVectorImpl<Operation *> &worklist,
    DenseMap<gpu::Processor, Value> &bounds, PatternRewriter &rewriter) {
  // TODO: Verify that this is a valid GPU mapping.
  // processor ids: 0-2 block [x/y/z], 3-5 -> thread [x/y/z], 6-> sequential
  ArrayAttr mapping =
      parallelOp->getAttrOfType<ArrayAttr>(gpu::getMappingAttrName());
  // TODO: Support reductions.
  if (!mapping || parallelOp.getNumResults() != 0)
    return failure();
When I simply remove the check that rejects reductions (see the modified code below), I get the error shown below. I suspect the cause is that the scf.parallel here produces an SSA value, whereas in the current MLIR unit tests scf.parallel never returns values.
I have two questions.
First, when does the community plan to support reductions in the SCF-to-GPU conversion?
Second, if I want to implement a basic version myself, are there any potential problems or points I should pay attention to?
Thank you.
Modified code:
// TODO: Support reductions.
if (!mapping )
return failure();
Error:
parallel.mlir:39:10: error: failed to legalize operation 'scf.parallel' marked as erased
%0 = scf.parallel (%arg1, %arg2) = (%c0, %c0) to (%c2, %c2) step (%c1, %c1) init (%cst) -> f32 {
^
Current scf.parallel unit test:
%step = arith.constant 2 : index
scf.parallel (%i0, %i1) = (%arg0, %arg1) to (%arg2, %arg3)
step (%arg4, %step) {
%val = memref.load %buf[%i0, %i1] : memref<?x?xf32>
memref.store %val, %res[%i1, %i0] : memref<?x?xf32>
} { mapping = [{processor = 1, map = affine_map<(d0) -> (d0)>, bound = affine_map<(d0) -> (d0)>}, {processor = 0, map = affine_map<(d0) -> (d0)>, bound = affine_map<(d0) -> (d0)>}] }
return