Confused about -convert-parallel-loops-to-gpu

  1. Why can't -convert-parallel-loops-to-gpu generate gpu.launch for the following MLIR code?
    scf.parallel (%arg1, %arg2) = (%c0_268, %c0_269) to (%c1_270, %c8_271) step (%c1_272, %c1_273) {
      ...
      scf.parallel (%arg3, %arg4) = (%c0_283, %c0_284) to (%72, %73) step (%c1_285, %c1_286) {
        ...
        scf.for %arg5 = %c0 to %c512 step %c16 {
         ...
          %84 = vector.create_mask %74, %75, %c16 : vector<16x16x16xi1>
          %85 = vector.mask %84 { vector.contract {indexing_maps = [#map11, #map2, #map12], iterator_types = ["parallel", "parallel", "reduction"], kind = #vector.kind<add>} %79, %81, %83 : vector<16x16xf32>, vector<16x16xf32> into vector<16x16xf32> } : vector<16x16x16xi1> -> vector<16x16xf32>
          vector.transfer_write %85, %subview_289[%c0, %c0], %82 {in_bounds = [true, true]} : vector<16x16xf32>, memref<?x?xf32, strided<[1000, 1], offset: ?>>
        }
        scf.reduce 
      } {mapping = [#gpu.loop_dim_map<processor = thread_x, map = (d0) -> (d0), bound = (d0) -> (d0)>, #gpu.loop_dim_map<processor = thread_y, map = (d0) -> (d0), bound = (d0) -> (d0)>]}
      scf.reduce 
    } {mapping = [#gpu.loop_dim_map<processor = block_x, map = (d0) -> (d0), bound = (d0) -> (d0)>, #gpu.loop_dim_map<processor = block_y, map = (d0) -> (d0), bound = (d0) -> (d0)>]}
  2. It seems that scf.parallel doesn't get converted to the gpu dialect when a function is called inside the scf.parallel body. What should I do?
scf.parallel ... {
    call @foo()
}

Hmm, I can't reproduce either of these issues.

The first snippet, slightly reduced for testing, outlines just fine, and the same goes for an example with a function call inside mapped parallel loops.
Nothing obvious comes to mind that could or should prevent outlining here.
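For reference, this is roughly the kind of reduced test I mean: a minimal nested scf.parallel carrying the same mapping attributes as your snippet (the function name, memref shape, and stored value are made up for illustration). With a recent mlir-opt, running it through --convert-parallel-loops-to-gpu produces a gpu.launch, and adding --gpu-kernel-outlining then outlines it into a gpu.launch_func:

```mlir
func.func @minimal(%out: memref<8x8xf32>) {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %c8 = arith.constant 8 : index
  %cst = arith.constant 0.0 : f32
  // Outer loop mapped to blocks, inner loop mapped to threads.
  scf.parallel (%i) = (%c0) to (%c8) step (%c1) {
    scf.parallel (%j) = (%c0) to (%c8) step (%c1) {
      memref.store %cst, %out[%i, %j] : memref<8x8xf32>
      scf.reduce
    } {mapping = [#gpu.loop_dim_map<processor = thread_x, map = (d0) -> (d0), bound = (d0) -> (d0)>]}
    scf.reduce
  } {mapping = [#gpu.loop_dim_map<processor = block_x, map = (d0) -> (d0), bound = (d0) -> (d0)>]}
  return
}
```

If a reduction of your code along these lines still fails to convert, the diff between it and this sketch should point at the culprit.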

Which version of LLVM are you using?
If it is relatively up to date, could you share a more complete example?