Memref slice extraction

After several attempts I can’t get Linalg’s SliceOp to work properly. The task is simple: I need to extract a slice from a multi-dimensional array. For example, given a memref<5x3xi32>, I want to create a view over just one of the rows, say the second one, which should result in a memref<3xi32>.

mlir::Value memref = builder.create<AllocaOp>(location, MemRefType::get({5, 3}, builder.getI32Type()));

mlir::Value c0 = builder.create<ConstantOp>(location, builder.getIndexAttr(0));
mlir::Value c3 = builder.create<ConstantOp>(location, builder.getIndexAttr(3));
mlir::Value c1 = builder.create<ConstantOp>(location, builder.getIndexAttr(1));
mlir::Value range = builder.create<linalg::RangeOp>(location, c0, c3, c1);

SmallVector<mlir::Value, 3> indexes;
indexes.push_back(c1);
indexes.push_back(range);

mlir::Value slice = builder.create<linalg::SliceOp>(location, memref, indexes);

The code generates the following IR:

%0 = alloca() : memref<5x3xi32>
%c0 = constant 0 : index
%c3 = constant 3 : index
%c1 = constant 1 : index
%3 = linalg.range %c0_7 : %c3 : %c1_8 : !linalg.range
%4 = linalg.slice %0[%c1, %3]  : memref<5x3xi32>, index, !linalg.range, memref<?x?xi32, affine_map<(d0, d1) -> (d0 * 3 + d1)>>

I can’t understand the reason for the error:

'linalg.slice' op expected rank of the view(2) to be the number of ranges(1)

Indeed, the number of ranges is lower than the rank of the original view, precisely because I want to reduce its rank. This should be legitimate according to the operation’s examples as well.

The only way I found to avoid the error is to manually set the result type of the operation, but I don’t think it should be necessary at all:

mlir::Value slice = builder.create<linalg::SliceOp>(location, MemRefType::get({ 3 }, builder.getI32Type()), memref, indexes);

A quick update:
I also tried the SubView operation, which may fit my needs even better, but I’m encountering a strange behaviour that seems similar to the one above.

I define a 5x3xi32 memref and populate it this way:

0  | 1  | 2
3  | 4  | 5
6  | 7  | 8
9  | 10 | 11
12 | 13 | 14

Then I extract the 3rd row using the SubView operation.
If I don’t manually set the result type and access it as a 2-D memref, %3 is correctly set to 7. However, this is not what I want to achieve: my goal is to access the slice as if it were an independent array, without the extra dimension.

%1 = alloca() : memref<5x3xi32>
... stores in %1 ...
%2 = subview %1[2, 0] [1, 3] [1, 1] : memref<5x3xi32> to memref<1x3xi32, affine_map<(d0, d1) -> (d0 * 3 + d1 + 6)>>
%3 = load %2[%c0, %c1] : memref<1x3xi32, affine_map<(d0, d1) -> (d0 * 3 + d1 + 6)>>
// %3 is set to 7

Instead, if I set the result type (which I would like to do, in order to reduce the rank), %3 becomes equal to 1, as if the offsets were being discarded.

%1 = alloca() : memref<5x3xi32>
... stores in %1 ...
%2 = subview %1[2, 0] [1, 3] [1, 1] : memref<5x3xi32> to memref<3xi32>
%3 = load %2[%c1] : memref<3xi32>
// %3 is set to 1

Can someone please explain how to properly use the SubView operation to get a reduced-rank view of a memref? I’m obviously missing something but I can’t see what.

SliceOp should be removed soon; we had an internal use that prevented us from doing so.

Looks like a bug, thanks for reporting, I’ll investigate.

This test seems to pass for me, can you confirm it passes for you?

// RUN: mlir-opt %s -convert-std-to-llvm | \
// RUN: mlir-cpu-runner -e main -entry-point-result=void \
// RUN:   -shared-libs=%mlir_integration_test_dir/libmlir_runner_utils%shlibext

global_memref "private" constant @__constant_5x3xf32 : memref<5x3xf32> =
dense<[[0.0, 1.0, 2.0],
       [3.0, 4.0, 5.0],
       [6.0, 7.0, 8.0],
       [9.0, 10.0, 11.0],
       [12.0, 13.0, 14.0]]>

func @main() {
  %0 = get_global_memref @__constant_5x3xf32 : memref<5x3xf32>
  %1 = subview %0[2, 0][1, 3][1, 1]: memref<5x3xf32> to memref<3xf32>

  %unranked = memref_cast %1 : memref<3xf32> to memref<*xf32>
  call @print_memref_f32(%unranked) : (memref<*xf32>) -> ()

  //      CHECK: Unranked Memref base@ = {{0x[-9a-f]*}}
  // CHECK-SAME: rank = 1 offset = 6 sizes = [3] strides = [1] data =
  // CHECK-NEXT: [6,  7,  8]

  return
}

func private @print_memref_f32(%ptr : memref<*xf32>)

so the offset is not discarded in my test, as you can see in the printout.

I imagine the code you are posting is not what you are trying to run, right?
Otherwise your alloca is just uninitialized.

Can you print the whole memref with the descriptor info, similarly to what I did with print_memref_f32?

Seems to work also for me, I get the following by running the first-line command.

Unranked Memref base@ = 0x7f08b91d1000 rank = 1 offset = 6 sizes = [3] strides = [1] data = [6, 7, 8]

Here it is:

Unranked Memref base@ = 0x7ffff1710b10 rank = 2 offset = 0 sizes = [5, 3] strides = [3, 1] data =
[[0,   1,   2],
 [3,   4,   5],
 [6,   7,   8],
 [9,   10,   11],
 [12,   13,   14]]

As you stated, the example I wrote above seemed to leave the memref uninitialized, but in reality I just trimmed out that part for brevity.

I also tried to load an element from the view, which is the critical point, but I can’t link the print_f32 function, which, judging from other posts on the forum, seems to exist (by the way, is there a list of the support functions?)

global_memref "private" constant @__constant_5x3xf32 : memref<5x3xf32> =
dense<[[0.0, 1.0, 2.0],
       [3.0, 4.0, 5.0],
       [6.0, 7.0, 8.0],
       [9.0, 10.0, 11.0],
       [12.0, 13.0, 14.0]]>

func @main() {
  %0 = get_global_memref @__constant_5x3xf32 : memref<5x3xf32>
  %1 = subview %0[2, 0][1, 3][1, 1]: memref<5x3xf32> to memref<3xf32>

  %unranked = memref_cast %1 : memref<3xf32> to memref<*xf32>
  call @print_memref_f32(%unranked) : (memref<*xf32>) -> ()

  %c0 = constant 0 : index
  %val = load %1[%c0] : memref<3xf32>
  call @print_f32(%val) : (f32) -> ()

  return
}

func private @print_memref_f32(%ptr : memref<*xf32>)
func private @print_f32(%arg0 : f32)

JIT session error: Symbols not found: [ print_f32 ]

The libmlir_runner_utils.so is built from this file: llvm-project/RunnerUtils.cpp at master · llvm/llvm-project · GitHub

The functions that are exported with extern "C" become available at JIT time after linking (i.e. passing -shared-libs=.../xxx.so to mlir-cpu-runner). You can define your own helper functions, link them and use them. There is no print_f32 available but you can write your own.
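A minimal sketch of such a helper, assuming the declaration `func private @print_f32(%arg0 : f32)` from the post above. Both `print_f32` and `formatF32` are hypothetical names written for illustration; neither is part of RunnerUtils.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Formats one f32 followed by a newline. Kept separate from the exported
// function so the formatting can be checked without capturing stdout.
// Returns the number of characters written (excluding the terminator).
static int formatF32(char *buffer, std::size_t size, float value) {
  int written = std::snprintf(buffer, size, "%g\n", static_cast<double>(value));
  assert(written > 0 && static_cast<std::size_t>(written) < size);
  return written;
}

// Hypothetical helper (an assumption, not an existing MLIR runtime function).
// extern "C" disables C++ name mangling, so the symbol is exported as plain
// `print_f32`, which is the name the JIT looks up.
extern "C" void print_f32(float value) {
  char buffer[64];
  formatF32(buffer, sizeof(buffer), value);
  std::fputs(buffer, stdout);
}
```

Built as a shared library (for instance with the usual `-shared -fPIC` compiler flags), this could then be passed to mlir-cpu-runner through the same -shared-libs option used for libmlir_runner_utils.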

Note that vector.print %0 : f32 could work out of the box, as the corner case of a 0-D vector.

Could you also print_memref_f32 the 1-D subview result in your original code that fails? I would like to see whether the offset is properly propagated.

If all else fails, once you get the JIT stuff running, can you post a standalone .mlir with the RUN command like I did earlier?

Thanks!

Thanks to you for the help!

Unranked Memref base@ = 0x7fffd7706e10 rank = 2 offset = 0 sizes = [5, 3] strides = [3, 1] data =
[[0,   1,   2],
 [3,   4,   5],
 [6,   7,   8],
 [9,   10,   11],
 [12,   13,   14]]
Unranked Memref base@ = 0x7fffd7706e10 rank = 1 offset = 6 sizes = [3] strides = [1] data =
[6,  7,  8]

The syntax to use for the test isn’t entirely clear to me, so I just put the RUN command at the beginning as you did. This is the .mlir code; by running it you can see that the result is 1 instead of 7.

// RUN: mlir-opt %s -convert-std-to-llvm | \
// RUN: mlir-cpu-runner -e main -entry-point-result=i32 \
// RUN:   -shared-libs=%mlir_integration_test_dir/libmlir_runner_utils%shlibext
func @main() -> i32 {
  %0 = alloca() : memref<5x3xi32>
  %c0 = constant 0 : index
  %c1 = constant 1 : index
  %c2 = constant 2 : index
  %c3 = constant 3 : index
  %c4 = constant 4 : index

  %c0_i32 = constant 0 : i32
  %c1_i32 = constant 1 : i32
  %c2_i32 = constant 2 : i32
  %c3_i32 = constant 3 : i32
  %c4_i32 = constant 4 : i32
  %c5_i32 = constant 5 : i32
  %c6_i32 = constant 6 : i32
  %c7_i32 = constant 7 : i32
  %c8_i32 = constant 8 : i32
  %c9_i32 = constant 9 : i32
  %c10_i32 = constant 10 : i32
  %c11_i32 = constant 11 : i32
  %c12_i32 = constant 12 : i32
  %c13_i32 = constant 13 : i32
  %c14_i32 = constant 14 : i32

  store %c0_i32, %0[%c0, %c0] : memref<5x3xi32>
  store %c1_i32, %0[%c0, %c1] : memref<5x3xi32>
  store %c2_i32, %0[%c0, %c2] : memref<5x3xi32>
  store %c3_i32, %0[%c1, %c0] : memref<5x3xi32>
  store %c4_i32, %0[%c1, %c1] : memref<5x3xi32>
  store %c5_i32, %0[%c1, %c2] : memref<5x3xi32>
  store %c6_i32, %0[%c2, %c0] : memref<5x3xi32>
  store %c7_i32, %0[%c2, %c1] : memref<5x3xi32>
  store %c8_i32, %0[%c2, %c2] : memref<5x3xi32>
  store %c9_i32, %0[%c3, %c0] : memref<5x3xi32>
  store %c10_i32, %0[%c3, %c1] : memref<5x3xi32>
  store %c11_i32, %0[%c3, %c2] : memref<5x3xi32>
  store %c12_i32, %0[%c4, %c0] : memref<5x3xi32>
  store %c13_i32, %0[%c4, %c1] : memref<5x3xi32>
  store %c14_i32, %0[%c4, %c2] : memref<5x3xi32>

  %unranked_original = memref_cast %0 : memref<5x3xi32> to memref<*xi32>
  call @print_memref_i32(%unranked_original) : (memref<*xi32>) -> ()

  %subview = subview %0[2, 0] [1, 3] [1, 1] : memref<5x3xi32> to memref<3xi32>

  %unranked_subview = memref_cast %subview : memref<3xi32> to memref<*xi32>
  call @print_memref_i32(%unranked_subview) : (memref<*xi32>) -> ()

  %element = load %subview[%c1] : memref<3xi32>

  return %element : i32
}

func private @print_memref_i32(%ptr : memref<*xi32>)

Output (basically the above one plus the result value):

Unranked Memref base@ = 0x7fffd7706e10 rank = 2 offset = 0 sizes = [5, 3] strides = [3, 1] data =
[[0,   1,   2],
 [3,   4,   5],
 [6,   7,   8],
 [9,   10,   11],
 [12,   13,   14]]
Unranked Memref base@ = 0x7fffd7706e10 rank = 1 offset = 6 sizes = [3] strides = [1] data =
[6,  7,  8]
1

Thanks @mscuttari I am able to repro now.
The memref prints properly, this seems to be an issue with the loadOp.

  %element = load %subview[%c0] : memref<3xi32>

lowers to

    %145 = llvm.extractvalue %135[1] : !llvm.struct<(ptr<i32>, ptr<i32>, i64, array<1 x i64>, array<1 x i64>)>
    %146 = llvm.getelementptr %145[%0] : (!llvm.ptr<i32>, i64) -> !llvm.ptr<i32>
    %147 = llvm.load %146 : !llvm.ptr<i32>

and seems to ignore the offset.
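To make the symptom concrete, here is a small standalone model of strided addressing, not the actual lowering code. The `StridedView` struct and both load functions are illustrative names; the model assumes an element is addressed as base[offset + sum(index_i * stride_i)], which matches the descriptor printed above (offset = 6, strides = [1]).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for a memref descriptor (illustrative, not MLIR's type).
struct StridedView {
  const int32_t *aligned;        // base pointer shared with the source buffer
  int64_t offset;                // offset from the base pointer, in elements
  std::vector<int64_t> sizes;    // extent of each dimension
  std::vector<int64_t> strides;  // distance between elements per dimension
};

// Expected addressing: base[offset + sum(index_i * stride_i)].
int32_t loadWithOffset(const StridedView &view,
                       const std::vector<int64_t> &indices) {
  assert(indices.size() == view.strides.size());
  int64_t linear = view.offset;
  for (std::size_t i = 0; i < indices.size(); ++i)
    linear += indices[i] * view.strides[i];
  return view.aligned[linear];
}

// What the observed result corresponds to: the offset term is left out.
int32_t loadIgnoringOffset(const StridedView &view,
                           const std::vector<int64_t> &indices) {
  assert(indices.size() == view.strides.size());
  int64_t linear = 0;
  for (std::size_t i = 0; i < indices.size(); ++i)
    linear += indices[i] * view.strides[i];
  return view.aligned[linear];
}
```

With a 15-element buffer holding 0..14 and a view of {offset 6, sizes [3], strides [1]}, loading index 1 yields 7 through the first function and 1 through the second, which is the discrepancy reported in the thread.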

I’ll investigate on Monday, it’s getting late in EU now.

Thanks!

Just a quick update: I’ve done a few more tests and the behaviour seems to be broken for the Store operation too, in the same way as for Load (the offset is ignored). Consequently, operations relying on load / store, such as Linalg’s copy, also lead to wrong results when operating on subviews.

FYI the issue seems to be with the verifier of the subview op in the rank-reducing case. The result type should have offset 6 and stride 1 but the verifier wrongly skips the check if the affine map is empty.
Will send a fix Monday but in the meantime your example should work with memref<3xf32, offset:6, strides:[1]>
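For reference, the expected offset 6 and stride 1 can be derived by hand. The sketch below is a simplified model, not MLIR's implementation: `StridedLayout` and `subviewLayout` are hypothetical names, and rank reduction is approximated by dropping every unit-size dimension, whereas in MLIR the dropped dimensions are determined by the result type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative description of a strided layout (not an MLIR type).
struct StridedLayout {
  int64_t offset;
  std::vector<int64_t> sizes;
  std::vector<int64_t> strides;
};

// Computes the layout of subview source[offsets][sizes][steps].
//   result offset    = source offset + sum(offsets[i] * source stride[i])
//   result stride[i] = steps[i] * source stride[i]
// When dropUnitDims is set, dimensions of size 1 are removed (simplification).
StridedLayout subviewLayout(const StridedLayout &source,
                            const std::vector<int64_t> &offsets,
                            const std::vector<int64_t> &sizes,
                            const std::vector<int64_t> &steps,
                            bool dropUnitDims) {
  const std::size_t rank = source.strides.size();
  assert(offsets.size() == rank && sizes.size() == rank &&
         steps.size() == rank);
  StridedLayout result{source.offset, {}, {}};
  for (std::size_t i = 0; i < rank; ++i) {
    result.offset += offsets[i] * source.strides[i];
    if (dropUnitDims && sizes[i] == 1)
      continue;  // rank-reduced away, but its offset was still accumulated
    result.sizes.push_back(sizes[i]);
    result.strides.push_back(steps[i] * source.strides[i]);
  }
  return result;
}
```

For the 5x3 source (offset 0, strides [3, 1]) and subview [2, 0] [1, 3] [1, 1], this gives offset 2 * 3 + 0 * 1 = 6, sizes [3] and strides [1]. The same arithmetic shows why a row index known only at runtime leads to an offset that cannot be written as a static number in the type.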

That works. But how can I set a dynamic offset?
My ultimate goal is to achieve something like:

subview %memref[%offset1, 0][1, 3][1, 1] : memref<5x3xi32> to memref<3xi32, offset: ??, strides: [1]>

Almost … :slight_smile: try a single ? not two

Wow, I didn’t even think it could be that simple :joy:
A little side question then: what about the C++ MemRef builders? How can I tell them to use a dynamic offset? All I found is that I can give them an affine map, but something like the following (which is just a test case, nothing meaningful) makes it crash at runtime (more precisely, in the isRankReducedType method of the SubView op’s verifier):

auto expr = builder.getAffineSymbolExpr(0);
auto map = AffineMap::get(1, 1, expr);
auto type = MemRefType::get({ 3 }, builder.getI32Type(), map);
mlir::Value view = builder.create<SubViewOp>(
    location,
    memref,
    destination,
    staticOffsets, staticSizes, staticStrides,
    dynamicOffsets, dynamicSizes, dynamicStrides);

%subview = subview %memref[%offset, 0] [1, 2] [1, 1] : memref<3x2xi32> to memref<2xi32, affine_map<(d0)[s0] -> (s0)>>

llvm/include/llvm/ADT/SmallVector.h:246: T& llvm::SmallVectorTemplateCommon<T, <template-parameter-1-2> >::operator[](llvm::SmallVectorTemplateCommon<T, <template-parameter-1-2> >::size_type) [with T = long int; <template-parameter-1-2> = void; llvm::SmallVectorTemplateCommon<T, <template-parameter-1-2> >::reference = long int&; llvm::SmallVectorTemplateCommon<T, <template-parameter-1-2> >::size_type = long unsigned int]: Assertion `idx < size()' failed.

EDIT:
I found it. BuiltinTypes.h has a nice makeStridedLinearLayoutMap function that automatically creates an affine map with the desired offset and strides. Passing ShapedType::kDynamicStrideOrOffset was enough to make the offset dynamic.

Thank you again @nicolasvasilache for your precious help!

Grep for makeStridedLinearLayoutMap.
Pass getDynamicStrideOrOffset or getDynamicSize as the special value to denote ?; any other value is treated as a static value.

I’m probably going to change those getxxx methods to setxxx methods that mutate a value passed by reference, rather than leak special values. A lot of this unfortunate state is a remnant from prehistoric times when we actually had hardcoded -1 dynamic sizes everywhere …
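The special-value convention described above can be illustrated with a small standalone model. This is only a sketch: `kDynamicSketch`, `render` and `describeStridedLayout` are hypothetical names, and the sentinel value chosen here is an assumption made for the example. In real code the named MLIR constant should be used instead of any literal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// Hypothetical sentinel for this sketch only. With MLIR, use the named
// constant (e.g. ShapedType::kDynamicStrideOrOffset), never a literal.
constexpr int64_t kDynamicSketch = std::numeric_limits<int64_t>::min();

// Renders a static value as a number and the sentinel as "?".
static std::string render(int64_t value) {
  return value == kDynamicSketch ? std::string("?") : std::to_string(value);
}

// Produces the "offset: ..., strides: [...]" text used in the memref types
// earlier in the thread.
std::string describeStridedLayout(int64_t offset,
                                  const std::vector<int64_t> &strides) {
  assert(!strides.empty() && "this sketch only handles rank >= 1");
  std::string text = "offset: " + render(offset) + ", strides: [";
  for (std::size_t i = 0; i < strides.size(); ++i) {
    if (i != 0)
      text += ", ";
    text += render(strides[i]);
  }
  return text + "]";
}
```

Here a static offset of 6 with strides {1} is rendered as "offset: 6, strides: [1]", while passing the sentinel as the offset is rendered as "offset: ?, strides: [1]", mirroring the single ? suggested earlier for the dynamic case.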
