I started looking into this in the context of loop.parallel
as @ftynse mentioned. I see the current code using sequential loops as early experimentation and not something that would be part of the final pipeline. The mapping @ftynse implemented is incorrect in the presence of loop carried dependencies, which the code does not check for.
https://reviews.llvm.org/D73893 has some code that shows my current thinking (which is still evolving). Here is an example form the tests
func @parallel_loop(%arg0 : index, %arg1 : index, %arg2 : index,
%arg3 : index, %arg4 : index,
%buf : memref<?x?xf32>,
%res : memref<?x?xf32>) {
%step = constant 2 : index
loop.parallel (%i0, %i1) = (%arg0, %arg1) to (%arg2, %arg3)
step (%arg4, %step) {
%val = load %buf[%i0, %i1] : memref<?x?xf32>
store %val, %res[%i1, %i0] : memref<?x?xf32>
} { mapping = [
{processor = 1, map = affine_map<(d0) -> (d0)>, bound = affine_map<(d0) -> (d0)>},
{processor = 0, map = affine_map<(d0) -> (d0)>, bound = affine_map<(d0) -> (d0)>}
] }
return
}
The mapping
attribute specifies for each dimension of the loop.parallel
which hardware id corresponds. The processor
specifies blockid (0,1,2) or threadid (3,4,5) or sequential (6). The map
specifies how the hardware id is transformed into the index variable of the loop. The bound
specifies how the size of the hardware id is computed based on the maximum number of iterations the loop has.
This is all very preliminary and especially the operands to the maps will need to evolve further, as one probably wants to get lower/upper bounds to do the computations. Instead of designing something complex to start with, I am currently exploring some examples.
I also take the view of @ftynse that we should try to implement this as multiple transformation steps. However, these transformations are not independent but will have to be orchestrated by a plan of what code should ultimately be emitted. For example loop-tiling decisions will depend on what final mapping we want to achieve. I don’t think we necessarily need to express the full lowering strategy in attributes.