PSA: You need to run `expand-strided-metadata` before `memref-to-llvm` now


TL;DR: Lowering the memref dialect to the llvm dialect now requires running an additional pass, expand-strided-metadata, before memref-to-llvm.


The lowering of some complicated memref operations (e.g., memref.subview) required a non-trivial amount of code. That code had to be re-implemented for every target dialect, leading to code duplication and unnecessary work.

To fix this problem, we moved the core logic necessary to lower these complex memref operations into a dedicated pass. This pass materializes the effect of the lowering with affine and arith constructs, which can then be lowered to the target dialect (e.g., llvm when doing memref-to-llvm). (For the curious, the refactoring started with ⚙ D133166 [mlir][MemRef] Canonicalize extract_strided_metadata(subview).)
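As a rough sketch of what the pass does (the exact IR depends on the pass version; the function and values below are made up for illustration), expand-strided-metadata rewrites a subview into explicit offset arithmetic plus a reinterpret_cast:

```mlir
// Hypothetical input: a dynamic subview of a 2D memref.
func.func @slice(%arg : memref<64x64xf32>, %i : index, %j : index)
    -> memref<8x8xf32, strided<[64, 1], offset: ?>> {
  %0 = memref.subview %arg[%i, %j] [8, 8] [1, 1]
      : memref<64x64xf32> to memref<8x8xf32, strided<[64, 1], offset: ?>>
  return %0 : memref<8x8xf32, strided<[64, 1], offset: ?>>
}

// After -expand-strided-metadata, the offset computation is materialized
// with an affine.apply and the subview becomes a reinterpret_cast, roughly:
//   %off = affine.apply affine_map<()[s0, s1] -> (s0 * 64 + s1)>()[%i, %j]
//   %0 = memref.reinterpret_cast %arg to offset: [%off],
//        sizes: [8, 8], strides: [64, 1]
//       : memref<64x64xf32> to memref<8x8xf32, strided<[64, 1], offset: ?>>
```

The affine.apply (and any arith ops) left behind are exactly the byproducts that the follow-up -lower-affine and -convert-arith-to-llvm passes clean up.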

What does this mean for you?

Concretely, to be NFC with respect to what memref-to-llvm was doing, you now need to run -expand-strided-metadata, -lower-affine, and -convert-arith-to-llvm in addition to -convert-memref-to-llvm.
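With mlir-opt, the old single-pass invocation becomes a short sequence; a sketch (input.mlir is a placeholder, and flag spellings follow the pass names above):

```shell
# Before: a single conversion pass was enough.
mlir-opt input.mlir -convert-memref-to-llvm

# Now: expand the strided-metadata ops first, then lower the
# affine/arith ops that the expansion produced.
mlir-opt input.mlir \
  -expand-strided-metadata \
  -lower-affine \
  -convert-arith-to-llvm \
  -convert-memref-to-llvm
```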

See mlir/test/Conversion/MemRefToLLVM/expand-then-convert-to-llvm.mlir for a practical example.

CC: @ftynse, @nicolasvasilache


Thank you Quentin: disaggregating and better layering of the memref to llvm lowerings is a very nice cleanup and greatly appreciated.


Can the relevant populate*Patterns be used directly in memref-to-llvm so that one doesn’t need to run extra passes before it? The whole thing is still memref-to-llvm. It doesn’t look right to me to have to know to run expand-strided-metadata, -lower-affine, and -convert-arith-to-llvm for memref-to-llvm – either the latter should be renamed, or, better, the entire combo and the necessary patterns should be part of memref-to-llvm. Why don’t separate populate... methods solve this problem? The methods could be reused by the other dialects’ conversions.


This is a great question and actually the direction I initially took when I worked on the refactoring.
The complete conversation around why we ended up here can be found at ⚙ D136377 [mlir][MemRefToLLVM] Remove the code for lowering subview.

The TL;DR is that when we do memref-to-llvm we don’t want to lower non-memref operations, as that could be confusing to clients of this API.

We could create a named pipeline to make the dependencies easier to follow, but that would have the side effect of also lowering affine and arith.



memref-to-llvm is a pass. The populate*Patterns methods are the API, and the API already allows one to load/use the different pieces (populate methods). I don’t see why there’d be confusion. I think the whole premise that the memref-to-llvm pass should strictly not touch non-memref dialect operations is weak – it is untenable as well. One would often need to lower through a few ops from other dialects. The problem you set out to solve is a key one, and the modular populate methods do solve it. There’s really nothing that breaking up the pass itself solves, and it’s only making things confusing and complex.

I think the confusion would be that one would expect populateMemRefToLLVMConversionPatterns to behave exactly the same as memref-to-llvm, whereas that wouldn’t be true if the pass included all the other populate methods.

I hear you. Ideally we would like to lower all byproducts of the memref lowering to LLVM. The problem is that, at least right now, we have no way to distinguish what was added by the current legalization (and thus what we’d like to lower all the way to LLVM) from what was there already and should be left untouched.


But this is just a naming issue – it can be solved by a rename or by the doc comment on it. Either that, or populateMemRefToLLVM can be a composition of several populate methods and behave the same way as memref-to-llvm. Taking the memref dialect to llvm is a common use case, and a user would expect everything needed to be in there. Custom passes could be constructed downstream with the right mix of populate methods, which I think answers the other question you have below IIUC.

This would always be an issue: you’d have to put all the patterns or passes together yourself whenever memref-to-llvm or some combination of upstream passes doesn’t serve your purpose – I don’t see why this would be a lot of code/complexity beyond just mixing the right patterns. The C++ API is what’s needed.
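For reference, a downstream pass can already mix the pieces in one pattern set, along the lines of the GPUToROCDL change quoted later in this thread (this is a sketch, not a complete pass; it assumes the MLIR headers and the populate-function names shown in that diff):

```cpp
// Sketch of a custom lowering pass body composing the populate methods.
mlir::RewritePatternSet patterns(&getContext());
mlir::LLVMTypeConverter converter(&getContext());

// Expand memref.subview & friends into affine/arith + reinterpret_cast...
mlir::memref::populateExpandStridedMetadataPatterns(patterns);
// ...then pick up the affine/arith byproducts...
mlir::populateAffineToStdConversionPatterns(patterns);
mlir::arith::populateArithToLLVMConversionPatterns(converter, patterns);
// ...and finally do the actual memref-to-llvm conversion.
mlir::populateMemRefToLLVMConversionPatterns(converter, patterns);
```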

The problem is not the difference between a pass and an API method. It is that the pass must not perform irrelevant IR modifications. One wouldn’t expect LICM to inline functions. One wouldn’t expect CSE to add aliasing info attributes to operations. Similarly, one shouldn’t have to deal with memref-to-llvm converting preexisting affine operations to the set-of-dialects-formerly-known-as-standard, which is what it would do if it subsumed the other conversions.

The real problem with such chained conversions isn’t solved by the populate* methods. It could be solved by a mechanism that applies additional conversions only to the operations affected by the previous patterns: in this case, it would convert the affine.apply produced by the expansion but preserve any preexisting operations. We do not have infrastructural support for that.

Now, the situation with such a “preparation” pass is not optimal. We can and should improve the documentation around it, and provide a helpful error message of the kind “did you forget to run X before?” where feasible. However, this is not the first time this lowering has required a preparation pass: we have had memref normalization as a precondition for lowering memrefs with non-trivial layouts since very early days. It is also the least bad option in my opinion, because we can document and detect it in one expected place – the pass definition/description.

I do understand that there is a recurring use case of lowering the entirety of the upstream dialects to the LLVM dialect when building trivial flows. That is why I suggested during code review adding a pass pipeline that does exactly that. Non-trivial flows likely have their own dialects and a more complex lowering scheme between those dialects and the upstream ones, so they should be able to build their own lowering pipelines as needed.
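Such a convenience pipeline could be registered roughly as follows (the pipeline name and the exact pass-constructor spellings here are illustrative, not the actual upstream API):

```cpp
// Hypothetical registration of a convenience "do it all" pipeline.
void registerMemRefToLLVMPipeline() {
  mlir::PassPipelineRegistration<> pipeline(
      "memref-to-llvm-pipeline",
      "Expand strided metadata, then lower affine/arith/memref to LLVM",
      [](mlir::OpPassManager &pm) {
        pm.addPass(mlir::memref::createExpandStridedMetadataPass());
        pm.addPass(mlir::createLowerAffinePass());
        pm.addPass(mlir::createArithToLLVMConversionPass());
        pm.addPass(mlir::createMemRefToLLVMConversionPass());
      });
}
```

The downside noted above applies: registered as a pipeline, it would also lower preexisting affine and arith operations, not just the memref byproducts.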


Have the GPU to ROCDL conversions been updated to account for this?

If tests exist that exercise the behavior then yes :slight_smile:

None of the integration tests use complex memref strides, and I don’t think the buildbot is main

I had the change but didn’t commit it, since all the tests were still passing without it.

The change is relatively simple.

In your case, it should look like:

diff --git a/mlir/lib/Conversion/GPUToROCDL/CMakeLists.txt b/mlir/lib/Conversion/GPUToROCDL/CMakeLists.txt
index c9fc198487e2..667bfead9a22 100644
--- a/mlir/lib/Conversion/GPUToROCDL/CMakeLists.txt
+++ b/mlir/lib/Conversion/GPUToROCDL/CMakeLists.txt
@@ -10,6 +10,7 @@ add_mlir_conversion_library(MLIRGPUToROCDLTransforms
+  MLIRAffineToStandard
@@ -18,6 +19,7 @@ add_mlir_conversion_library(MLIRGPUToROCDLTransforms
+  MLIRMemRefTransforms
diff --git a/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp b/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
index 1f8159017ee9..eaa0ec6938ca 100644
--- a/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
+++ b/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
@@ -15,6 +15,7 @@
 #include "mlir/Conversion/GPUToROCDL/GPUToROCDLPass.h"
 #include "mlir/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.h"
+#include "mlir/Conversion/AffineToStandard/AffineToStandard.h"
 #include "mlir/Conversion/ArithToLLVM/ArithToLLVM.h"
 #include "mlir/Conversion/FuncToLLVM/ConvertFuncToLLVM.h"
 #include "mlir/Conversion/LLVMCommon/ConversionTarget.h"
@@ -28,6 +29,7 @@
 #include "mlir/Dialect/GPU/Transforms/Passes.h"
 #include "mlir/Dialect/LLVMIR/ROCDLDialect.h"
 #include "mlir/Dialect/Math/IR/Math.h"
+#include "mlir/Dialect/MemRef/Transforms/Passes.h"
 #include "mlir/Dialect/Vector/IR/VectorOps.h"
 #include "mlir/Pass/Pass.h"
 #include "mlir/Transforms/DialectConversion.h"
@@ -133,6 +135,8 @@ struct LowerGpuOpsToROCDLOpsPass
     populateVectorToLLVMConversionPatterns(converter, llvmPatterns);
     cf::populateControlFlowToLLVMConversionPatterns(converter, llvmPatterns);
     populateFuncToLLVMConversionPatterns(converter, llvmPatterns);
+    memref::populateExpandStridedMetadataPatterns(llvmPatterns);
+    populateAffineToStdConversionPatterns(llvmPatterns);
     populateMemRefToLLVMConversionPatterns(converter, llvmPatterns);
     populateGpuToROCDLConversionPatterns(converter, llvmPatterns, runtime);
     LLVMConversionTarget target(getContext());

Come to think of it, we probably don’t actually want to slide those changes into the conversion pass. The point of GPU to ROCDL is to gather up all the *-to-llvm patterns into one place so that we can make sure LLVM-related options are applied consistently and so that we know we’re only running on GPU modules.

What about renaming the memref-to-llvm pass to be more explicit? For example finalize-memref-to-llvm, late-memref-to-llvm, or stride-expanded-memref-to-llvm?

We could then rename the populate pattern function accordingly to match the pass name, removing any ambiguity. That would also leave the memref-to-llvm name free for a pipeline that does it all.


Sounds sensible to me.

What do other people think?

How convenient: ⚙ D138776 [mlir][Test] Add a test pass to act as a sink towards LLVM conversion :slight_smile:

And it looks like we should update the pipeline of the test pass to call createExpandStridedMetadata. I missed it when I “enabled” expand-strided-metadata because I was looking for populateMemRefToLLVM (as opposed to createMemRefToLLVM), and no tests were breaking :slight_smile:.


Fixed the pipeline of the test-lower-to-llvm pass at ⚙ D139840 [mlir][NFC] Make test-lower-to-llvm a named pipeline and ⚙ D139841 [mlir][test] Add expand-strided-metadata to lower-to-llvm. (Pending reviews.)

Next I’ll rename memref-to-llvm into finalize-memref-to-llvm.


Finally got around to doing that: ⚙ D142463 [mlir][Conversion] Rename the MemRefToLLVM pass