Understanding Parallel loops to GPU lowering

ftynse · March 16, 2020, 3:50pm

I have been thinking that we should perform loop transformations as standalone transformations and keep the mapping itself relatively simple. These transformations can still be driven by the same set of annotations, but they become easier to test and reuse. For example, if you want to map multiple loops to the same block/thread id, you can coalesce those loops into a single loop and map only that loop. The same goes for tiling with dynamic sizes: we can do it as a transformation, map the outer loop, and keep the inner loop in the kernel, potentially canonicalizing it away if it is statically known to have a single iteration. We already have mapLoopToProcessorIDs for non-parallel loops, which does exactly that.
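To make the idea concrete, here is a minimal Python sketch that simulates the three rewrites on plain iteration spaces. It assumes ordinary Python loops stand in for MLIR loops and for GPU block/thread ids; the function names (original_nest, coalesced, tiled, cyclic) are hypothetical and not part of any MLIR API. It checks that coalescing and tiling visit the same iterations as the original loops, and shows a cyclic distribution resembling what mapLoopToProcessorIDs does to a loop's lower bound and step.

```python
# Hypothetical simulation: plain Python loops stand in for MLIR loops and
# GPU ids. None of these names are MLIR APIs.

def original_nest(n, m):
    """Two nested parallel loops: every (i, j) in [0, n) x [0, m)."""
    return {(i, j) for i in range(n) for j in range(m)}


def coalesced(n, m):
    """Coalesce the 2-D nest into one loop of n*m iterations.

    The single loop index k is what would be mapped to one block/thread id;
    the original indices are recovered by delinearization.
    """
    visited = set()
    for k in range(n * m):
        i, j = divmod(k, m)  # i = k // m, j = k % m
        visited.add((i, j))
    return visited


def tiled(n, tile):
    """Tile a 1-D loop with a (possibly dynamic) tile size.

    The outer loop over tiles would be mapped to a block id; the inner loop
    stays inside the kernel. When tile == 1 the inner loop runs exactly once
    per tile, so it could be canonicalized away.
    """
    visited = []
    num_tiles = (n + tile - 1) // tile  # ceildiv(n, tile)
    for block_id in range(num_tiles):  # outer loop: mapped to a block id
        start = block_id * tile
        for i in range(start, min(start + tile, n)):  # inner loop: in kernel
            visited.append(i)
    return visited


def cyclic(lb, ub, step, proc_id, num_procs):
    """Iterations one processor runs under a cyclic distribution.

    Assumed rewrite, in the spirit of mapLoopToProcessorIDs:
    lb' = lb + proc_id * step, step' = step * num_procs.
    """
    return list(range(lb + proc_id * step, ub, step * num_procs))
```

For instance, `cyclic(0, 10, 1, 1, 4)` gives `[1, 5, 9]`, and the union over `proc_id` in `range(4)` covers every iteration of the original loop once. Keeping these as separate, testable rewrites means the final mapping step only has to bind one loop index to one hardware id.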
