Issues with memref operations

Hi!

I’ve faced with test failues in my application after update to latest MLIR and did some investigation. I’ve found several issues with memref.extract_aligned_pointer_as_index, memref.extract_strided_metadata and memref.view operations, which caused those test failues, and I’d like to discuss them.

Here is my scenario. I have an IR with multiple constant 1D memrefs:

%cstA = memref.get_global @A: memref<4xi64>
%cstB = memref.get_global @B: memref<4xi64>

I’d like to get data address of them and pass it to external function. I’m using extract_aligned_pointer_as_index and extract_strided_metadata for that to be more generic in terms of offsets:

%cstA_base_ptr = memref.extract_aligned_pointer_as_index %cstA : memref<4xi64> -> index
%base, %cstA_offset, %size, %stride =
    memref.extract_strided_metadata %cstA :
      memref<4xi64>, index, index, index
%cstA_final_ptr = index.add %cstA_base_ptr, %cstA_offset
func.call @foo(%cstA_final_ptr)

// same for cstB

Finally, I have a separate pass, which merges all constants into single one of type i8 and replace separate memref.get_global with memref.view into it:

%merged_cst = memref.get_global @merged: memref<64xi8>
%cstA = memref.view %merged_cst [%offset_0][] : memref<64xi8> to memref<4xi64>
%cstB = memref.view %merged_cst [%offset_32][] : memref<64xi8> to memref<4xi64>

// code with extract_aligned_pointer_as_index and func.call

But, what’s happening between this passes? memref.extract_strided_metadata has folder, which replaces its results with constant values, when offsets and/or strides are compile-time known constants. This is true for original IR and the folding mechanism removes extract_strided_metadata and pointer arithmetic with offset and leaves only:

%cstA_base_ptr = memref.extract_aligned_pointer_as_index %cstA : memref<4xi64> -> index
func.call @foo(%cstA_base_ptr)

After the constant merging pass I’ve got the following IR:

%merged_cst = memref.get_global @merged: memref<64xi8>
%cstA = memref.view %merged_cst [%offset_0][] : memref<64xi8> to memref<4xi64>
%cstB = memref.view %merged_cst [%offset_32][] : memref<64xi8> to memref<4xi64>

%cstA_base_ptr = memref.extract_aligned_pointer_as_index %cstA : memref<4xi64> -> index
func.call @foo(%cstA_base_ptr)

%cstB_base_ptr = memref.extract_aligned_pointer_as_index %cstB : memref<4xi64> -> index
func.call @foo(%cstB_base_ptr)

Now, the memref.extract_aligned_pointer_as_index is called on the result of memref.view and by its definition it returns not the pointer to the start of particular constant data, but the base pointer to the %merged_cst. This becomes more visible after -expand-strided-metadata pass, which just swaps memref.extract_aligned_pointer_as_index and memref.view and both function calls get the same pointer.

Thus, the first issue, from my point of view - the memref.extract_strided_metadata shouldn’t perform implicit constant folding via common folding mechinsm, since the IR might be modified by further passes and this modifications might introduce different offsets and strides.

The second issue - memref.view generates memref result memref with empty layout map, which is equvivalent to compact strides and zero offset. But that’s not actually true, because we do have offset from the base buffer. It should look like:

%cstB = memref.view %merged_cst [%offset_32][] 
          : memref<64xi8> to memref<4xi64, strided<[1], offset: 4>>
//                                                      ^^^^^^^^^

Funny thing - why it works for me prior to update to latest upstream. I has a --convert-linalg-to-llvm pass in my pipeline, which was called prior to -expand-strided-metadata. This pass internally used finalize-memref-to-llvm patterns, so the this part were converted to LLVM dialect here and not in -expand-strided-metadata/--finalize-memref-to-llvm calls below in the pipeline. Accidentelly, finalize-memref-to-llvm patterns lowers pair of memref.view -> memref.extract_aligned_pointer_as_index into correct code and I’ve got valid pointers in the function calls. After update to latest upstream I’ve removed --convert-linalg-to-llvm call and the issue appreared.

All this discoveries brings me to more generic question - why we don’t have a simpler way to get pointer to the first element of memref value? It would be more simpler in the cases, when we want to pass memref data into some external function without need to deal with base pointers and offsets.

Sorry for such long topic :slight_smile:

I think the pipeline is behaving as expected here, the problem is that by constant folding and then replacing the source expression of the fold, we break the expressed semantics.

If we reduce the example to something akin to:

x = 144 : i32
b = sizeof(x)
...

Then we constant fold, and b becomes 4. If we now shrink the type of x to i8, we have successfully broken the compilation.

I’m afraid that when changing the memref after folding the metadata, we run into this exact same issue.

Yes, I agree. I would prefer to have this folding explicit and controllable. Right now it is done via generic folding mechanism, which is called during canonicalization and greedy pattern driver. If it was implemented via some separate pass, users might call ir after all other passes, which perform transformations affecting metadata.

I believe, there already was such discussion in the topic related to extract_strided_metadata prior to its introduction in code base. But for some reason, it has generic folding mechanism.

memref.extract_aligned_pointer_as_index already returns the equivalent of %base + %cstA_offset I believe? It don’t understand the meaning of index.add %cstA_base_ptr, %cstA_offset right now.

There is a design problem with your approach here I believe: if something is “compile-time” known, you can’t just change it in a pass, it’s too late. It should not be a compile-time constant if you want to have flexibility around it.

That’s not clear to me, I don’t know what the thinking was around it but memref should embed a memref independently of the strided layout annotation. That is the offset from the base pointer that is embedded in the memory (part of the descriptor), is not exposed in the type, and is something that is applied before considering the layout. The layout applies on the “aligned pointer”.
I wonder how @ftynse or @nicolasvasilache see this?

That works only with compact memref, since the pointer can’t carry the slides.
But that’s what memref.extract_aligned_pointer_as_index provides I think?

But see also the Bare Pointer Calling Convention for Ranked MemRef.

Thanks for tagging @mehdi_amini (also @qcolombet in addition to @ftynse ).

What I am seeing here is a use of memref.view that I find a little surprising, I would have expected that we would first:

%merged_cst = memref.get_global @merged: memref<64xi8>
%cstA = memref.view %merged_cst [%offset_0][] : memref<64xi8> to memref<8xi64>

and then take 2 subviews of 4xi64
Still the system should not be that surprising and there is likely some bug involved; the memref.view operation has certainly seen less mileage than other abstractions.

its definition is to return the “aligned pointer” not the base pointer to the %merged_cst. This is consistent with the lowering to LLVM: https://github.com/llvm/llvm-project/blob/4210204f521be3caa0e60bd596af3444cbd44d04/mlir/lib/Conversion/MemRefToLLVM/MemRefToLLVM.cpp#L1797
If you are getting the same pointer value in both cases this seems indeed like a bug to me.

I am not sure where yet, the lowering of view op correctly applies the byte offset to the aligned_ptr with a GEP (https://github.com/llvm/llvm-project/blob/4210204f521be3caa0e60bd596af3444cbd44d04/mlir/lib/Conversion/MemRefToLLVM/MemRefToLLVM.cpp#L1688).

See my response above, the result is indeed a 0-offset memref because the aligned_ptr “swallows” the offset. This is the current design of the op; it is done this way because the byte_offset may well not be divisible by the target element type and you can’t always put a target memref offset (e.g. shift by 3 and result datatype is f32 would give you misalignments).

I will have to investigate more the real source of the issue but offhand the description of the problem seems wrong to me and may be caused by a bug (e.g. it is fishy to me that:

the memref.extract_aligned_pointer_as_index is called on the result of memref.view and by its definition it returns not the pointer to the start of particular constant data, but the base pointer to the %merged_cst. 

Do you have a standalone MLIR test, ideally running with mlir-cpu-runner and some printfs, before and after your transform that I could run and debug?

1 Like

The ConvertExtractAlignedPointerAsIndex pattern from MemrefToLLVM pass lowers memref.extract_aligned_pointer_as_index to access of the second field of the underlying memref C structure (see https://github.com/llvm/llvm-project/blob/main/mlir/lib/Conversion/MemRefToLLVM/MemRefToLLVM.cpp#L1784).

From the documentation (LLVM IR Target - MLIR) this pointer doesn’t point to the first element in the generic case:

  1. The pointer to the properly aligned data pointer that the memref indexes, referred to as “aligned pointer”.
  2. A lowered converted index-type integer containing the distance in number of elements between the beginning of the (aligned) buffer and the first element to be accessed through the memref, referred to as “offset”.

And that’s how ExecutionUtils deals with memref lowered structure (https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/ExecutionEngine/CRunnerUtils.h#L168):

template <typename T>
struct StridedMemRefType<T, 1> {
  T *basePtr;
  T *data;      // <-- memref.extract_aligned_pointer_as_index returns this field
  int64_t offset;
  int64_t sizes[1];
  int64_t strides[1];

  T &operator[](int64_t idx) { return *(data + offset + idx * strides[0]); }
  //                                    ^^^^^^^^^^^^^
  // We have to add 'offset' field to 'data' pointer to get first element.
};

Why? The memref.view documentation ('memref' Dialect - MLIR) says:

The “view” operation extracts an N-D contiguous memref with empty layout map with arbitrary element type from a 1-D contiguous memref with empty layout map of i8 element type. The ViewOp supports the following arguments:

  • A single dynamic byte-shift operand must be specified which represents a a shift of the base 1-D memref pointer from which to create the resulting contiguous memref view with identity layout.

So, I believe my use case is valid:

%buf = memref.alloc() : memref<64xi8> // in my example it was 'memref.get_global', but it doesn't matter
%view0 = memref.view %buf[%offset_0][] : memref<64xi8> to memref<4xi64>
%view1 = memref.view %buf[%offset_32][] : memref<64xi8> to memref<4xi64>

Here is another example - the pass expand-strided-metadata. It contains a pattern, which replaces extract_aligned_pointer_as_index(view_like(%mem)) with extract_aligned_pointer_as_index(%mem):

func.func @test() -> index {
    %alloc = memref.alloc() : memref<64xi8>

    %offset_32 = index.constant 32
    %view = memref.view %alloc[%offset_32][] : memref<64xi8> to memref<4xi64>

    %intptr = memref.extract_aligned_pointer_as_index %view : memref<4xi64> -> index

    return %intptr : index
}

// RUN: mlir-opt -expand-strided-metadata

func.func @test() -> index {
    %alloc = memref.alloc() : memref<64xi8>
    %intptr = memref.extract_aligned_pointer_as_index %alloc : memref<64xi8> -> index
    return %intptr : index
}

So, either memref.extract_aligned_pointer_as_index %view == memref.extract_aligned_pointer_as_index %alloc, which means that the memref.view operation must create memref with non-zero offset. Or this pattern is wrong and it performs non-equal IR transformation.

It indeed does not crash the verifier today but it is still a bit surprising to me as I did not mean to give this op slicing capabilities when I introduced it: it was only meant to give a flat buffer of bytes an initial indexing structure that we can subsequently slice with memref.subview.
But your usage does not seem problematic indeed, which is why I wrote “the system should not be that surprising and there is likely some bug involved;”.

Thanks for narrowing this down! Indeed, this seems like a bug in the folding pattern given the definition of the view op:

Returns a contiguous memref with 0 offset and empty layout.

@qcolombet could you please take a look?

template<typename T, size_t N>
struct MemRefDescriptor {
  T *allocated;
  T *aligned;
  intptr_t offset;
  intptr_t sizes[N];
  intptr_t strides[N];
};

Just interesting, why one operation (memref.view) adjusts aligned field and sets offset to 0, while other operation (memref.subview) copies aligned and adjusts offset? What’s the reason to have separate offset field in the lowered memref descriptor instead of adjusting aligned pointer in the same was as memref.view does it?

I don’t think it adds offset.

memref.view was originally intended to create views with a specific elemental type from an allocation that always produces a memref of i8, memref.subview was originally intended to take strided views of the view produced by memref.view and cannot change elemental types.

The separate offset field, in particular its static representation, was intended for aliasing analysis. Same as the separation of view/subview to indicate strict aliasing. I don’t think the analysis has materialized, but other additions to the memref dialect may have rendered the distinction moot. I remember considering removal of the offset field from the descriptor, but every time I found a way to convince myself it was still necessary. (However, we could replace the “allocated” pointer with another offset between the aligned and allocated pointer to avoid the weird situation with noalias between memrefs).

side note: please don’t, we are (ab)using ‘allocated’ pointer to store control block pointer for reference-counted merefs :slight_smile:

I would be interested to understand how/why indeed! I would think that manipulations of the aligned pointer should be enough? Is there something about not knowing the element size that prevents us from eliminating the offset?

This is interesting! However you likely need to figure out a way to upstream this if you want it to be actually supported…

Should I submit a bug on GitHub for that issue?

2 Likes

We did it for interop with numba/numpy runtimes and I don’t think it’s upstreamable in current form as it uses numba-specific allocation functions:

But the only change in upstream needed was flag to disable built-in alloc lowering patterns

The rest of memrefToLLVM conversion works just fine with custom allocation.

Done - [BUG][MLIR] expand-strided-metadata pass changes IR semantic · Issue #64334 · llvm/llvm-project · GitHub