Add an expanded Load/Store Op in memref dialect

I have yet to see, however, how memref.offset_load/store is an improvement over reshape + load; otherwise it is just another abstraction for doing the same thing?

Is the reshape always “free” (as in, will it just generate a cast)? Or is it a special case of “reshape to 1-D”? I can’t say I’m sure I understand how to reshape memref<?x?xf32> to memref<?xf32> with our current memref.reshape.
Unless you’re thinking of using memref.reinterpret_cast somehow?
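
Something along these lines is the closest I can picture (untested sketch; op spellings are roughly the current memref/arith ones and may not match exactly):

// Flatten a dynamically shaped 2-D memref into a 1-D one by recomputing the
// total number of elements and reinterpreting the descriptor. Only sensible
// if %src has the identity (contiguous) layout.
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
%d0 = memref.dim %src, %c0 : memref<?x?xf32>
%d1 = memref.dim %src, %c1 : memref<?x?xf32>
%n  = arith.muli %d0, %d1 : index
%flat = memref.reinterpret_cast %src to offset: [0], sizes: [%n], strides: [1]
    : memref<?x?xf32> to memref<?xf32>
// %lin is the linearized index, computed separately.
%v = memref.load %flat[%lin] : memref<?xf32>

That is doable, but it is not obvious to me that it is nicer than a dedicated op.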

That would be linalg.reshape which is being refactored and split into memref.expand_reshape and memref.collapse_reshape as per this RFC.

Depending on where allocation occurs, there is also the possibility of alloc + memref.view + subview 2-D + subview 1-D. This may need to evolve depending on what alias analysis looks like for OP’s transformations.
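
Roughly, I mean something like this (sketch only; the subview result layout is written in a strided form for readability and may not match the exact spelling):

// One flat i8 allocation, then typed views / subviews carved out of it.
%c0  = arith.constant 0 : index
%buf = memref.alloc(%nbytes) : memref<?xi8>
// 2-D typed view over the raw buffer (byte offset 0, dynamic sizes %m x %n).
%mat = memref.view %buf[%c0][%m, %n] : memref<?xi8> to memref<?x?xf32>
// 2-D subview (a tile) of that view; the result carries a strided layout.
%tile = memref.subview %mat[%i, %j][4, 4][1, 1]
    : memref<?x?xf32> to memref<4x4xf32, strided<[?, 1], offset: ?>>
// 1-D typed view over the same buffer, for flat addressing of the same data.
%flat = memref.view %buf[%c0][%nelems] : memref<?xi8> to memref<?xf32>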


@linearhit another thing I am wondering is: since you operate on buffers, what does your analysis look like to allow the fusions to occur (e.g. what about control-flow and alias analysis to prevent operations that operate on the same buffer from bypassing each other)?

Have you thought of (or tried) applying similar transformations in the tensor domain where SSA use-def chains give you many nice guarantees ?

An added bonus I would see from operating in the tensor domain is that you could have more control over the buffering scheme. The layout in memory could even potentially be chosen such that many of the linearizations/delinearizations are statically known to reduce to pointer increments.

For your concern about buffer-related issues: lhlo_fusion happens right after mhlo is transformed to lmhlo; at that point no buffer optimizations or control-flow lowering have been done yet. So we can guarantee that each buffer has only one writer in the control-flow region, which is substantially similar to SSA, and it is OK for now.
We will revisit fusion in the future in order to support ‘shape constraint’ and other features of the shape dialect. I think it is quite possible that fusion will be moved back to hlo by then. There were some historical reasons for having a fusion pass on LHLO; sooner or later we will involve the shape dialect and will reconsider it then.

Do you have any suggestions on which dialect to put linearize/delinearize into?
Is memref::IndexLinearizeOp & memref::IndexDelinearizeOp a good proposal?
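
Syntax-wise I am imagining something along these lines (entirely hypothetical, names and form open for discussion):

// Hypothetical: collapse a multi-dimensional index into a flat offset with
// respect to a basis of sizes, and the inverse operation.
%lin     = memref.index_linearize (%i, %j) by (%d0, %d1) : index
%i2, %j2 = memref.index_delinearize %lin by (%d0, %d1) : index, index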

memref::LinearizeIndexOp and memref::DelinearizeIndexOp sound good to me. But did you conclude on the offset_load/store ops? Like @nicolasvasilache mentions, why do we need these when you can do reshape to 1-D + load? This looks natural and fits into the logical reshape abstractions that already exist. Also, when lowered, wouldn’t you get identical IR? (ptr + offset ultimately)

As I understand it, this reshape is purely logical: it won’t by itself lead to memory traffic, but only gets folded into the access subscripts.
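
Rough sketch of what I mean (collapse op named as per the RFC above, exact syntax approximate):

// %A : memref<?x?xf32> with identity layout; %i, %j are the 2-D indices.
%c1  = arith.constant 1 : index
%d1  = memref.dim %A, %c1 : memref<?x?xf32>
%t   = arith.muli %i, %d1 : index
%lin = arith.addi %t, %j : index
// Logically collapse to 1-D, then a plain load at the linear index.
%flat = memref.collapse_reshape %A [[0, 1]] : memref<?x?xf32> into memref<?xf32>
%v    = memref.load %flat[%lin] : memref<?xf32>
// A memref.offset_load %A[%lin] would, after lowering, produce the same
// load(base_ptr + %lin) as the pair above.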

I’d expect to see a tensor equivalent at some point but memref::LinearizeIndexOp and memref::DelinearizeIndexOp SGTM too.

In the contiguous case (i.e. canonical strides), this is true and seems relatively easy.

Still, I’d expect the abstraction to work with strided memrefs with dynamic strides (whether we want to unpack the values at the LLVM level or before is still TBD).

This is where it gets trickier: memref.expand_reshape and memref.collapse_reshape can only manipulate contiguous dimensions (i.e. there is no way to represent, say, a 4-D non-contiguous subarray as a 1-D strided memref).
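
For example (layout written in a strided form for readability):

// %t selects all 8 rows of %A but only their first 8 columns: element (i, j)
// sits at offset i * 16 + j, so the 64 selected elements are not contiguous.
%A = memref.alloc() : memref<8x16xf32>
%t = memref.subview %A[0, 0][8, 8][1, 1]
    : memref<8x16xf32> to memref<8x8xf32, strided<[16, 1]>>
// No 1-D strided memref type covers exactly these 64 elements, so there is
// no way to collapse %t into a 1-D view.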

For such cases, I expect memref.alloc + memref.view + memref.subview n-D + memref.subview 1-D + memref::LinearizeIndexOp + memref::DelinearizeIndexOp to do the job.

Still, I don’t expect @linearhit to have such cases yet, given the IR I have seen so far.

An alternative is to unpack strides as SSA values and manipulate them more directly with e.g. memref.stride and affine maps such as

affine_map<(i)[M, N, K] -> (i floordiv (N * K), (i mod (N * K)) floordiv K, i mod K)>

but I am not confident that, once unpacked, this complexity will be easy to recover. It seems to me that representing this complexity in a more controllable, structured form is the whole point of the exercise, so it would defeat the purpose.
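
Concretely, once unpacked, the map above is just the usual soup of divisions and remainders (sketch, index arithmetic ops spelled approximately), and recovering the structured form from that soup is the hard part:

// Delinearize a flat index %i w.r.t. an (M, N, K) shape with canonical strides.
%NK = arith.muli %N, %K : index
%d0 = arith.divui %i, %NK : index     // i floordiv (N * K)
%r0 = arith.remui %i, %NK : index     // i mod (N * K)
%d1 = arith.divui %r0, %K : index     // (i mod (N * K)) floordiv K
%d2 = arith.remui %i, %K : index      // i mod K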


“Reshape to 1-D and then load/store” should be OK for my case. Would it be better to have explicit semantics? I am not sure what others think, but I feel ‘reshape’ is somewhat counter-intuitive. It’s definitely acceptable anyway.