1d vector permutation op using Linalg OpDSL

Hi!

I’m trying to define a permute op that operates on a 1-D vector using Linalg OpDSL in Python. For an input vector of length m*n, the op implements the following function:

Input vector I[m*n]
Output vector O[m*n]
Strides: m, n

for(int i = 0; i < m; i++)
    for(int j = 0; j < n; j++)
        O[i+m*j] = I[n*i+j]
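Equivalently, here is a runnable NumPy reference of the same function (the concrete m = 2, n = 8 are just example values):

import numpy as np

def vector_permute_ref(I, m, n):
    """Reference: O[i + m*j] = I[n*i + j] for i in [0, m), j in [0, n)."""
    O = np.empty_like(I)
    for i in range(m):
        for j in range(n):
            O[i + m * j] = I[n * i + j]
    return O

# Example with m = 2, n = 8 (vector length 16).
I = np.arange(16, dtype=np.float32)
O = vector_permute_ref(I, 2, 8)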

The operator definition using OpDSL:

@linalg_structured_op
def vector_permute(
    # K is a shape-only tensor whose dims (KM, KN) are meant to carry m and n.
    K=TensorDef(T, S.KM, S.KN, index_dims=[D.km, D.kn]),
    I=TensorDef(T, S.M),
    O=TensorDef(U, S.M, output=True),
):
    domain(D.km, D.kn)
    # Intended access pattern: O[km + KM * kn] = I[kn + KN * km].
    O[D.km + S.KM * D.kn] = TypeFn.cast_signed(U, I[D.kn + S.KN * D.km])

The error I got:

 File "/llvm-project/build/tools/mlir/python_packages/mlir_core/mlir/dialects/linalg/opdsl/lang/emitter.py", line 121, in prepare_common_structured_op
    raise ValueError(
ValueError: Expected indexing_maps to use no symbols after replacement and compression but got [AffineMap((d0, d1)[s0, s1] -> (d0, d1)), AffineMap((d0, d1)[s0, s1] -> (d1 + d0 * s1)), AffineMap((d0, d1)[s0, s1] -> (d0 + d1 * s0))]

Does anyone have experience with this? Any feedback is appreciated!

I also tried using the “strides” attribute instead of shape-only tensors; this generates MLIR, but the generated IR does not pass the verifier.

Op definition:

@linalg_structured_op
def dft_permute1(
    I=TensorDef(T, S.M),
    O=TensorDef(T, S.M, output=True),
    strides=IndexAttrDef(S.SM, S.SN, default=[1, 1]),
):
    domain(D.sm, D.sn)
    O[D.sm + S.SM * D.sn] = I[D.sn + S.SN * D.sm]
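A minimal driver along these lines emits the IR dumped below (the function name and boilerplate are incidental); strides=[2, 8] corresponds to m = 2, n = 8 for the length-16 case:

from mlir import ir
from mlir.dialects import func

# Assumes the dft_permute1 definition above is in scope.
with ir.Context(), ir.Location.unknown():
    module = ir.Module.create()
    f32 = ir.F32Type.get()
    memref_16xf32 = ir.MemRefType.get([16], f32)
    with ir.InsertionPoint(module.body):

        @func.FuncOp.from_py_func(memref_16xf32, memref_16xf32)
        def dft_permute_w(src, dst):
            # The index attribute is passed by keyword: strides = [S.SM, S.SN].
            dft_permute1(src, outs=[dst], strides=[2, 8])

    print(module)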

Dumped MLIR:

[AffineMap((d0, d1)[s0, s1, s2] -> (d1 + d0 * 8)), AffineMap((d0, d1)[s0, s1, s2] -> (d0 + d1 * 2))]
#map = affine_map<(d0, d1) -> (d1 + d0 * 8)>
#map1 = affine_map<(d0, d1) -> (d0 + d1 * 2)>
"builtin.module"() ({
  "func.func"() <{function_type = (memref<16xf32>, memref<16xf32>) -> (), sym_name = "dft_permute_w"}> ({
  ^bb0(%arg0: memref<16xf32>, %arg1: memref<16xf32>):
    "linalg.generic"(%arg0, %arg1) <{indexing_maps = [#map, #map1], iterator_types = [#linalg.iterator_type<parallel>, #linalg.iterator_type<parallel>], operandSegmentSizes = array<i32: 1, 1>}> ({
    ^bb0(%arg2: f32, %arg3: f32):
      "linalg.yield"(%arg2) : (f32) -> ()
    }) : (memref<16xf32>, memref<16xf32>) -> ()
    "func.return"() : () -> ()
  }) : () -> ()
}) : () -> ()

The error when running mlir-opt:

1.mlir:6:5: error: 'linalg.generic' op expected the shape-to-loops map to be non-null
    "linalg.generic"(%arg0, %arg1) <{indexing_maps = [#map, #map1], iterator_types = [#linalg.iterator_type<parallel>, #linalg.iterator_type<parallel>], operandSegmentSizes = array<i32: 1, 1>}> ({
    ^
1.mlir:6:5: note: see current operation: 
"linalg.generic"(%arg0, %arg1) <{indexing_maps = [affine_map<(d0, d1) -> (d1 + d0 * 8)>, affine_map<(d0, d1) -> (d0 + d1 * 2)>], iterator_types = [#linalg.iterator_type<parallel>, #linalg.iterator_type<parallel>], operandSegmentSizes = array<i32: 1, 1>}> ({
^bb0(%arg2: f32, %arg3: f32):
  "linalg.yield"(%arg2) : (f32) -> ()
}) : (memref<16xf32>, memref<16xf32>) -> ()

You could use a 2-D linalg.transpose, with expand/collapse_shape ops around it to convert between 1-D and 2-D.
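Schematically (NumPy here only to illustrate the structure, not the actual MLIR ops): expand the 1-D vector to 2-D, transpose, and collapse back to 1-D:

import numpy as np

m, n = 2, 8
I = np.arange(m * n, dtype=np.float32)

A = I.reshape(m, n)   # expand_shape: 1-D -> m x n
B = A.T               # linalg.transpose with permutation [1, 0]
O = B.reshape(m * n)  # collapse_shape: n x m -> 1-D

# Same result as O[i + m*j] = I[n*i + j].
for i in range(m):
    for j in range(n):
        assert O[i + m * j] == I[n * i + j]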

If you want less structured linearized accesses, you could use plain loops and load/store.

Hi @nicolasvasilache!

Thanks for your suggestion! I tried the 2-D linalg.transpose with expand/collapse_shape ops, and it works!

I’m still confused about implementing the operation with Linalg OpDSL, though. Is it possible to express this op directly in OpDSL, or did I write it incorrectly?

At this time, it seems the inference procedure that determines the ranges of d0, d1 from the expressions:

0 <= d1 + d0 * 8 < 16
0 <= d0 + d1 * 2 < 16

is not able to determine that 0 <= d0 < 2 and 0 <= d1 < 8.

This is because the current implementation does not use fancy projections or a solver but just identifies simple cases.

To handle more complex cases like the one you tried, we would want to solve for the system above.
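For what it’s worth, brute-forcing the system over non-negative values confirms the bounds the inference would need to derive:

# Non-negative (d0, d1) keeping both linearized accesses inside [0, 16):
sols = [
    (d0, d1)
    for d0 in range(16)
    for d1 in range(16)
    if d1 + 8 * d0 < 16 and d0 + 2 * d1 < 16
]
assert sols == [(d0, d1) for d0 in range(2) for d1 in range(8)]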

Hi @nicolasvasilache! Thanks again for the information!

I’m working on another operator that runs into the same issue.

OpDSL definition:

@linalg_structured_op
def AkI_x(
    K=TensorDef(T1, S.KN, index_dims=[D.kn]),
    A=TensorDef(T1, S.M, S.N),
    B=TensorDef(T1, S.N * S.KN),
    C=TensorDef(U, S.M * S.KN, output=True),
    cast=TypeFnAttrDef(default=TypeFn.cast_signed),
):
    domain(D.kn, D.m, D.n)
    implements(ContractionOpInterface)
    C[D.kn + D.m * S.KN] += cast(U, A[D.m, D.n]) * cast(U, B[D.kn + D.n * S.KN])

Error:

ValueError: Expected indexing_maps to use no symbols after replacement and compression but got [AffineMap((d0, d1, d2)[s0] -> (d0)), AffineMap((d0, d1, d2)[s0] -> (d1, d2)), AffineMap((d0, d1, d2)[s0] -> (d0 + d2 * s0)), AffineMap((d0, d1, d2)[s0] -> (d0 + d1 * s0))]

I’m considering your suggestion to use plain loops; if I understand correctly, that means implementing the operator with affine/scf loops. However, I would like to take advantage of the transformations in the Linalg dialect. Is there a way to generate linalg.generic ops for the operations above, perhaps not through OpDSL but by building the Linalg op in C++? I didn’t find such examples in the project.

I managed to make it work:

@linalg_structured_op
def AkI_x(
    K=TensorDef(T1, S.KN, index_dims=[D.kn]),
    A=TensorDef(T1, S.M, S.N),
    B=TensorDef(T1, S.N * S.KN),
    C=TensorDef(U, S.M * S.KN, output=True),
    # At the call site, strides must be set to the same value as S.KN (the size of K).
    strides=IndexAttrDef(S.SM, default=[1]),
    cast=TypeFnAttrDef(default=TypeFn.cast_signed),
):
    domain(D.kn, D.m, D.n)
    implements(ContractionOpInterface)
    C[D.kn + D.m * S.SM] += cast(U, A[D.m, D.n]) * cast(U, B[D.kn + D.n * S.SM])

It seems that the dimensions of a shape-only tensor can only be used as iteration dimensions, not as symbols in the affine maps, while index attributes (like strides) can only be used as symbols, not as iteration dimensions. So I have to pass the same information twice, as K and strides do in the example above.
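Concretely, a minimal emission sketch (sizes purely illustrative) shows KN being supplied twice, once as the static size of K and once as the strides value:

from mlir import ir
from mlir.dialects import func

KN, M, N = 4, 8, 16  # illustrative sizes only

with ir.Context(), ir.Location.unknown():
    module = ir.Module.create()
    f32 = ir.F32Type.get()
    with ir.InsertionPoint(module.body):

        @func.FuncOp.from_py_func(
            ir.RankedTensorType.get([KN], f32),      # K: shape-only, binds D.kn
            ir.RankedTensorType.get([M, N], f32),    # A
            ir.RankedTensorType.get([N * KN], f32),  # B
            ir.RankedTensorType.get([M * KN], f32),  # C (output init)
        )
        def aki_x(K, A, B, C):
            # KN appears twice: as the size of K and as the strides attribute.
            return AkI_x(K, A, B, outs=[C], strides=[KN])

    print(module)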

This seems redundant. Does anyone have ideas for a better way to implement it?