Design help for an array-based application with memrefs and affine maps

I’m developing an NumPy like application that uses MLIR to generate and optimize many of the kernels inside the library. I was hoping to utilize affine maps to implement some transformations on arrays (discussed later), but after reading the discussion on Semantics of affine_maps in memrefs? - #3 by bondhugula, I’m a bit less sure about a reasonable direction to go, and would appreciate some feedback / direction help.

Consider an implementation of binary addition on 2D arrays in numpy using MLIR. One might write some code that generates a loop like this:

  func.func @add(%arg0: memref<?x?xf64>, %arg1: memref<?x?xf64>, %arg2: memref<?x?xf64>) attributes {llvm.emit_c_interface} {
    %c0 = arith.constant 0 : index
    %dim = memref.dim %arg0, %c0 : memref<?x?xf64>
    %c1 = arith.constant 1 : index
    %dim_0 = memref.dim %arg0, %c1 : memref<?x?xf64>
    affine.for %arg3 = #map(%c0) to #map(%dim) {
      affine.for %arg4 = #map(%c0) to #map(%dim_0) {
        %0 = affine.load %arg0[%arg3, %arg4] : memref<?x?xf64>
        %1 = affine.load %arg1[%arg3, %arg4] : memref<?x?xf64>
        %2 = arith.addf %0, %1 : f64 %6, %arg2[%arg3, %arg4] : memref<?x?xf64>

To keep the code generation relatively simple, it is nice to assume that the two input arrays to the binary operation as well as the output array are all of the same shape. However, that often isn’t the case in numpy due to things like slicing and broadcasting. One could write something like

x = np.ones((n, n))
y = 1.0 * x

which is usually implemented by broadcasting a view of the 1.0 scalar as a 2-D array by promoting the scalar into an array, but only logically – no extra memory is allocated.

So there are two constraints I’m trying to grapple: 1) I want to make the implementations of the functions that generate these MLIR fragments simple – it’s nice if they don’t have to worry about transformations like I mentioned and 2) I want to eventually get those transformations reflected in the generated code, either at the point of generation, or some passes applied after.

My initial thinking was to make a pass over the generated MLIR fragments and construct an affine map for each of the input arrays based on the transformations that are applied to it. For example, if an array is broadcasted / promoted over a dimension, I would like to add an affine map that intuitively removes the promoted dimension. However, based on reading some documentation, I don’t think this sort of thing is possible – the affine maps don’t seem to amenable to this kind of dimension remapping as injectivity (for example) is no longer maintained.

Another option here is to write a pass that “undos” each transformation directly on the argument arrays and their uses inside the generated kernels, but I worry that this could be a little brittle, as this would have to be done after more specific dialects get lowered into memref, but then the assumptions they made about the operand memrefs might be wrong.

A final strategy is to instead consider these kinds of transformations when generating the task body, where we already don’t generate code using a 2-D memref (like in the above case), and manually undo the transformations when generating code that accesses each argument array. However, this approach feels like it greatly complicates the implementations of each numpy operator, as each operator needs to handle the potential dimensionalities of each input array.

Does anyone have an opinion here? Perhaps I’m misunderstanding how affine maps with memrefs could be used.

One very high-level design comment: if you already have a programming model or abstraction, it is usually better to define a dialect that matches that model as close as possible and then perform a conversion or lowering into other dialects rather than trying to “reinterpret” existing dialects so they match the abstraction. Specifically, consider defining a NumPy-specific tensor type. Given the popularity of NumPy, I suspect somebody already has one out-of-tree.

Lower-level design comment: affine maps and memrefs are completely separable. Memrefs do not require layout information to necessarily be expressed as an affine map, though they do require things like injectivity to simplify aliasing considerations. Affine maps themselves are just what their name indicates. So you can very well use affine maps to express broadcasts, offsets, etc. as long as this is done outside of the memref type. You can define your own type for that reason, or have affine maps in operations instead. Several upstream operations such as linalg.generic and vector.contract do the latter.

1 Like