Different affine maps for the same affine.load when using a pass pipeline and a standalone pass

Hello there,

I have been working on a project for a few months that mainly uses the affine dialect. The project has a pipeline of loop transformation passes. Two days ago, I ran into an issue in one of the passes where the affine map of an affine.load appears to be incorrect. To isolate the issue, I printed the IR just before the buggy pass and then ran only that pass on the printed IR. When I did that, the affine map was correct. I also tried adding createCanonicalizerPass() and createGenerateRuntimeVerificationPass() before the buggy pass, but neither helped.

I am still learning MLIR and have never run into this before, so I am wondering if anyone knows how to approach it. Here is an example that prints the affine.load op, its enclosing affine.for loops, and its affine map:
1- Printing from the buggy pass within the pass pipeline:

// printing from the buggy pass within the pass pipeline
affine.for %arg4 = 0 to 16 {
  affine.for %arg5 = 0 to 32 {
    affine.for %arg6 = 0 to 32 {
      %2 = affine.load %alloc_3[%arg4, %arg5, %arg6] : memref<16x32x32xf32> //the AffineLoadOp
      affine.store %2, %alloc_5[%arg4, symbol(%arg5) + 1, symbol(%arg6) + 1] : memref<16x34x34xf32>
    }
  }
}
%2 = affine.load %alloc_3[%arg4, %arg5, %arg6] : memref<16x32x32xf32> // loadOp.dump();
(d0, d1, d2) -> (d2, d0, d1) // loadOp.getAffineMap().dump(); <= incorrect
(d0, d1, d2) -> (d2, d0, d1) // loadOp.getMap().dump(); <= incorrect

2- Printing from the buggy pass using the standalone command line tool:

// printing from the buggy pass using the standalone command line tool
affine.for %arg4 = 0 to 16 {
  affine.for %arg5 = 0 to 32 {
    affine.for %arg6 = 0 to 32 {
      %2 = affine.load %alloc_3[%arg4, %arg5, %arg6] : memref<16x32x32xf32> //the AffineLoadOp
      affine.store %2, %alloc_5[%arg4, symbol(%arg5) + 1, symbol(%arg6) + 1] : memref<16x34x34xf32>
    }
  }
}
%2 = affine.load %alloc_3[%arg4, %arg5, %arg6] : memref<16x32x32xf32> // loadOp.dump();
(d0, d1, d2) -> (d0, d1, d2) // loadOp.getAffineMap().dump(); <= correct
(d0, d1, d2) -> (d0, d1, d2) // loadOp.getMap().dump(); <= correct

I am using llvm-project at tag llvmorg-18.1.2 (commit 26a1d66).

I am lowering from linalg to affine using createConvertLinalgToAffineLoopsPass(), and in the pipeline, I have a pass that performs loop permutations (using the function permuteLoops from Utils/LoopUtils.h).

I may be missing something, and any help would be appreciated.

Try using the generic form of the ops via the -mlir-print-op-generic flag. This will show the map that is used by the load without querying it explicitly.
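
To illustrate why the generic form can reveal more than the custom form: as far as I understand, the custom printer for affine.load shows the map results with the map operands already substituted in, so two different (map, operand order) pairs can print identically. The following is only a toy model in plain C++, not MLIR code; `PermutationMap` and `renderSubscripts` are hypothetical names made up for this sketch, and whether this is what actually happens in the pipeline above is an assumption that the generic form would confirm or rule out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of a pure permutation map: result i reads dimension
// map[i]. For example, (d0, d1, d2) -> (d2, d0, d1) is stored as {2, 0, 1}.
using PermutationMap = std::vector<std::size_t>;

// Render the subscript list the way a custom (pretty) form would display it:
// each map result, with the matching operand name substituted for the
// dimension it refers to.
std::string renderSubscripts(const PermutationMap &map,
                             const std::vector<std::string> &operands) {
  std::string out = "[";
  for (std::size_t i = 0; i < map.size(); ++i) {
    assert(map[i] < operands.size() && "dimension index out of range");
    if (i != 0)
      out += ", ";
    out += operands[map[i]];
  }
  return out + "]";
}
```

In this model, the identity map {0, 1, 2} with operands (%arg4, %arg5, %arg6) and the map {2, 0, 1} with operands (%arg5, %arg6, %arg4) both render as [%arg4, %arg5, %arg6], whereas {2, 0, 1} with operands (%arg4, %arg5, %arg6) renders as [%arg6, %arg4, %arg5]. The generic form prints the map attribute and the operand list separately, so it distinguishes these cases.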

I'd suspect a memory error somewhere in your earlier passes. Try running under address sanitizer.
