[RFC][MemRef] Adding Realloc Op

This proposal is a direct result of the [RFC] Structured Codegen Beyond Rectangular Arrays. Also see the MLIR Sparse Compiler Progress thread.

To deal with unknown sparsity, we need expandable memory for 1D memory buffers, as evidenced by the use of vector in the current support lib.

At this time, MLIR has memref.alloc for allocating a memory region but doesn’t have a corresponding op for changing the size of the memory region. This RFC proposes to add memref.realloc for changing the size of a memory region. Alternatively, we could implement the functionality via memref.alloc/view/copy/dealloc without introducing memref.realloc. The memref.view op is needed because memref.copy requires the source and destination memref to have the same shape. We choose to introduce memref.realloc for the following reasons: (1) more efficient implementation, such as avoiding the overhead of constructing memref.view. (2) better support for expandable 1D memory buffers, similar to the idea of having realloc in the C stdlib.

ReallocOp

The realloc operation changes the size of a memory region. The memory region is specified by a 1D source memref. The size of the new memory region is specified by a 1D result memref type and an optional value for the dynamic dimension size. The source and the result memref must be in the same memory space and have the same element type.

The operation may move the memory region to a new location. In this case, the content of the memory block is preserved up to the lesser of the new and old sizes. If the new size is larger, the value of the extended memory is undefined (this is aligned to the ISO C standard for realloc). A PR for adding the op is D133424 .

Example:

    %0 = memref.realloc(%src) : memref<64xf32> to memref<124xf32>

If the source dimension is dynamic, the compiler generates code to retrieve the source dimension size from the runtime data structure of the memref implementation.

    %1 = memref.realloc(%src) : memref<?xf32> to memref<124xf32>

On the other hand, if the result dimension is dynamic, a result dimension operand is needed to specify the size of the result tensor. In the example below, the ssa value %d provides the unknown dimension of the result memref.

    %2 = memref.realloc(%src, %d) : memref<64xf32> to memref<?xf32>

The optional alignment attribute may be specified to ensure that the region of memory that will be indexed is aligned at the specified byte boundary. This is consistent with the fact that memref.alloc supports such an optional alignment attribute. Note that in ISO C standard, neither alloc nor realloc supports alignment, though there is aligned_alloc but not aligned_realloc.

    %3 = memref.realloc(%src, %d) {alignment = 8} : memref<64xf32> to memref<?xf32>
2 Likes

Memref supports layout, it’s not clear to me what’s the behavior of this operation in this case?

What does 1) refers to? Also since you only support 1D memref, it seems that some views are needed to get to this 1D view from any non-1D memref already?

The realloc is equivalent to alloc+copy+dealloc right? What is the tradeoff that makes introducing a new op better here?

I forget to mention that the new op doesn’t support layout, which is similar to memref.view.

You can’t express “memref realloc” via memref.alloc+copy+dealloc because memref.copy only works for two memref of the same type. We need to use memref.view, to create a view of the bigger buffer between the source tensor and the destination tensor and use it for the copy op. This implementation is not as efficient as introducing a new op.

Very good question!

For high-level/denotational semantics, yes realloc is equivalent to alloc+copy+dealloc. The difference is for low-level/operational semantics: under certain circumstances, realloc allows to avoid the copying step and to simplify the work associated with (de)allocation —exactly the same as the situation in C/C++.

As for whether the performance is worth introducing a new op, that’s a different question. We had a lot of discussion back and forth on that while drafting the RFC, and while we came down on the side of wanting the new op, it’s not a landslide in either direction.

1 Like

Implementation efficiency has nothing to do with using a view IMO.
I can certainly see arguments related to memory management, OS and allocators in general (realloc that shrinks is a noop); e.g. Differences between using realloc vs. free -> malloc functions

I do not understand how this would work: I thought the point was to lower to a libc realloc but that does not support alignment ?

If you end up lowering to your own alloc/view/free/realloc this seems to defeat the purpose of a new op
Do you have a 3rd option to make alignment work?

1 Like

Want to make it clear that there are three possible solutions mentioned in the discussion here:
(a) do not introduce a new op. Instead, use MLIR memref.alloc/view/copy/dealloc
(b) introduce memref.realloc. Use “the system realloc” to implement the op.
(c) introduce memref.realloc. use “the system aligned_alloc(or malloc), memcpy, free” to implement the op.

We took approach (c) as shown in the preliminary PR D133424. We didn’t choose approach (b), because the current memref implementation supports alignment. When the “unaligned_ptr” and “aligned_ptr” of a “memref object” aren’t equal, we can simplify pass one of the two pointers to the system “realloc”. WDYT?

Implementation-wise, I think that when alignment isn’t specified then we should do (b), to maximize the possible benefits. It’s only for the case where alignment is specified that we’re forced to do (c).

It doesn’t make sense to introduce a new op if we only do (c) as it will trick the caller into believing that the libc realloc optimization may happen, but it really would not. Maybe we should even drop the alignment attribute from this op and always do (b), and document that if the “aligned realloc” behavior is desired, it can only be implemented as (a).

Regarding the interface of the op, not supporting layouts is fine, I think we don’t support them either when lowering memref.alloc to the LLVM dialect. There were also some concerns that certain affine map layouts, namely dimension-reducing ones, don’t allow one to unambiguously compute the allocation size. I have one question: the majority of the current use cases seem to be memref.alloc ... : memref<... x i8> that are later memref.viewed to have a different elemental type. Having realloc with elemental types seem to diverge from this practice. What’s your opinion on this?

Maybe we should even drop the alignment attribute from this op and always do (b), and document that if the “aligned realloc” behavior is desired, it can only be implemented as (a).

I totally agree with this.

the majority of the current use cases seem to be memref.alloc ... : memref<... x i8> that are later memref.view ed to have a different elemental type. Having realloc with elemental types seem to diverge from this practice. What’s your opinion on this?

Note that although realloc does not support alignment, by ISO C standard: the pointer returned if the allocation succeeds shall be suitably aligned so that it may be assigned to a pointer to any type of object and then used to access such an object in the space allocated (until the space is explicitly freed or reallocated). With that being guaranteed, it would safe to view a reallocated memory into different elemental types as it will be properly aligned too. I think we can follow the same design principle here as memref.alloc, but dropping the support for layout and alignment.

1 Like

We want to support aligned realloc, for the same reason as we set the alignment here.

On an offline conversion, @ftynse seemed to agree that: the viable reason to introduce a new op for this purpose is to have matching/rewriting on this op. Otherwise emitting an alloc/view/copy/free sequence has no significant disadvantages.

For convenience, if nothing else. memref is typed, isn’t it? memref.alloc supports general element types, it would be nice if memref.view and memref.realloc also support general element types. Otherwise, users have to create i8 memref using alignment attributes to get the needed alignment for the element types they have in mind, then create view to use the memref.

1 Like

ISO C standard has no knowledge of MLIR types, so it won’t be able to produce the alignment necessary for, e.g., vector<13 x bf16> or any other vector type for that matter. Vector types are the reason for memref.alloc lowering to do extra work for alignment.

Providing an explicit alignment may actually be desirable for two reasons: (1) one have alignment than what the type would “naturally” require for hardware-related purposes (make sure something goes into a single cache line, improved performance on some architectures that allow unaligned loads but with a performance penalty); and (2) avoid second-guessing what the “natural” data layout-imposed alignment would be for non-trivial types. These were the reasons to have the alloc/view/subview model. Without alloc i8, view isn’t actually necessary and can be subsumed by subview.

So, are you planning to match/rewrite the op? :slight_smile:

1 Like

Right, my PR to support this RFC is D133424.

I think it is fair to say that currently the main reason we would like to introduce this operation is to simplify the code coming out of the new sparse code generation (which replaces the conversion into support library calls). For example, when introducing “push_back()” flavored constructs, testing against size followed by a relloc yields shorter code than having to generate alloc/copy/dealloc every time. We contemplated simply introducing a new op for this, which later lowers into such a sequence, but then realized that such an op would look a lot like realloc, and a true realloc operation may be worthwhile to the community at large, also with the added, granted smaller, benefit of allowing for in-place size reductions and even size enlargement when free buffers allow. That is why we went with the RFC for a new memref op, to see if others would like to see this happen.

Beyond that, I don’t think we have any clear match/rewrite plan, although I could see that happen when the realloc operation gains popularity.

So at this point, we could simply want to know if it is a go (yes please) or no go (fair enough, then we will resort to the short lived lowering op alluded to above).

2 Likes

I think this op creates an issue with respect to memory effects because it may allocate. You won’t be able to attach the Allocate effect to the op, and without a memory effects interface, passes and utilities generically written to use such effects as opposed to hardcoding the list of allocating ops wouldn’t do anything useful on them. Either a new MayAllocate effect may have to be introduced, which may in turn make things harder (benefits vs cost of handling it everywhere) or it’s better to just go with canonicalizing alloc + copy + dealloc into realloc late if needed.

Just curious, in what context would that subtle difference matter? Isn’t a non-resizing reallocation conceptually the same as a reallocation that happens to land at exactly the same address and without touching any memory?

2 Likes

Consider it from the standpoint of memory effects. If there isn’t a difference, why aren’t there any memory effects attached to memref.realloc? Shouldn’t it then have both a Free and Allocate effect attached to it?

Yes it should have both Free and Allocate.

2 Likes

Having a conditional, e.g. scf.if that has recursive side effects, where one branch allocates and another does not isn’t fundamentally different with respect to side effects. It may or may not allocate depending on which branch is taken.

2 Likes

Something to keep in mind as well I believe is that the Effect can be associated with an operation or instead with a particular SSA value: it can be different know that a particular operation will “allocate” vs “this particular SSA value is allocated by this operation”.