MLIR: Failed to lower affine.for to gpu dialect

Hello, I’m a beginner with MLIR and the Toy dialect. After following the tutorial, I’m now trying to convert some operations to the GPU dialect, starting from the affine dialect. I’m using the SCF to GPU pass, and problems occur when I try to convert a basic add of two tensors to the GPU dialect.
The affine dialect code for the add looks like this:

affine.for %arg0 = 0 to 2 {
  affine.for %arg1 = 0 to 3 {
    %4 = affine.load %3[%arg0, %arg1] : memref<2x3xf64>
    %5 = affine.load %3[%arg0, %arg1] : memref<2x3xf64>
    %6 = addf %4, %5 : f64
    affine.store %6, %2[%arg0, %arg1] : memref<2x3xf64>
  }
}

After adding the SCFToGPU pass, it throws an error:

error: 'affine.load' op index must be a dimension or symbol identifier

This should be a type check in the affine dialect; however, there’s no error when emitting the affine dialect, and the error only occurs in the SCFToGPU pass.
A temporary workaround is to use LoadOp instead of AffineLoadOp when lowering Toy to affine, but since I may want to add more complex operations using AffineMap in the future, this doesn’t fully solve the problem.
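For reference, here is a minimal sketch of what that workaround produces for the loop above. It assumes the standard-dialect `load`/`store` ops of the MLIR version used here (in newer MLIR these are `memref.load`/`memref.store`, and `addf` is `arith.addf`). The store is switched as well, on the assumption that it needs the same treatment, since `affine.store` places the same restriction on its indices as `affine.load`:

```mlir
affine.for %arg0 = 0 to 2 {
  affine.for %arg1 = 0 to 3 {
    // Standard loads accept any index value, so the verifier no longer
    // requires %arg0/%arg1 to be affine dimensions or symbols.
    %4 = load %3[%arg0, %arg1] : memref<2x3xf64>
    %5 = load %3[%arg0, %arg1] : memref<2x3xf64>
    %6 = addf %4, %5 : f64
    // Assumed: the store is converted too, for the same reason.
    store %6, %2[%arg0, %arg1] : memref<2x3xf64>
  }
}
```

The trade-off is that affine analyses can no longer see these accesses as affine.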

What should I do to avoid this error, and where is the verify function of affine.load called in the SCFToGPU pass?

Thanks!

First, the Toy dialect is what its name says it is, a toy. Please don’t consider it for anything other than understanding the basic concepts of defining new dialects in MLIR. It was not (and arguably should not be) designed for anything other than illustrating the tutorial chapters.

It is a check in the affine dialect: it comes from the affine.load verifier.

This means that the affine IR as emitted is correct, but applying the SCFToGPU pass makes it incorrect. (Incidentally, there is no SCFToGPU pass; I assume you meant AffineForToGPU.) The verifier runs after every pass and complains about any errors in the new state of the IR.

Using standard loads instead of affine.load when lowering Toy would break any chance of the affine dialect being able to reason about this code. There is no point, at least currently, in emitting affine.for loops with non-affine memory accesses.

I suppose you’d need to modify the AffineForToGPU pass so that it also converts affine.loads whose operands are no longer classified as dimensions or symbols (in particular, loop iterators that got replaced with thread/block ids) into “standard” loads, while keeping the rest of the loads intact.
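A rough sketch of the IR such a modified pass would aim for, assuming the outer loop is mapped to blocks and the inner loop to threads. It is illustrative only: the real AffineForToGPU output differs in detail (for example, it may recompute the loop iterators from the ids with extra index arithmetic), and the constants `%c1`, `%c2`, `%c3` are introduced here just to give the launch sizes:

```mlir
%c1 = constant 1 : index
%c2 = constant 2 : index
%c3 = constant 3 : index
gpu.launch blocks(%bx, %by, %bz) in (%gx = %c2, %gy = %c1, %gz = %c1)
           threads(%tx, %ty, %tz) in (%sx = %c3, %sy = %c1, %sz = %c1) {
  // %bx and %tx stand in for the former loop iterators %arg0 and %arg1.
  // They are region arguments of gpu.launch, not affine.for iterators, so
  // they are not valid affine dimensions or symbols here; accesses that use
  // them are rewritten into standard load/store.
  %4 = load %3[%bx, %tx] : memref<2x3xf64>
  %5 = load %3[%bx, %tx] : memref<2x3xf64>
  %6 = addf %4, %5 : f64
  store %6, %2[%bx, %tx] : memref<2x3xf64>
  gpu.terminator
}
```

Accesses whose indices do not depend on the replaced iterators could, in principle, stay as affine ops, which is the "keep the rest intact" part of the suggestion.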

The affine.load verifier is not called in the pass. It wouldn’t be reasonable to call verification functions on every possible IR object in every possible pass. Instead, the pass manager calls the module/function-level verifier after every pass. The top-level verifier then dispatches to individual verification functions as necessary. I don’t think we want to change this behavior.

Thanks for your reply!

Do your words mean that verification happens during the conversion: some operands of affine.load get replaced with thread/block ids while the operation is still an affine.load, and the type check then finds these ids illegal for affine.load?

So is the point that the current AffineForToGPU pass doesn’t support converting affine.loads? Or is this an uncommon error that only occurs because the affine code generated from the Toy dialect has some problem with further conversions?

It is not happening during the conversion but after it: passes are allowed to create invalid IR temporarily as long as they fix it before they finish. It’s also not exactly a type check, but the general reasoning is correct.

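Looks like it: the current AffineForToGPU pass does not convert the affine.loads in the loop bodies it maps to the GPU.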