Given a model that allocates memory, instantiates constants, computes the model, frees the allocated memory, and returns the results, we are interested in performing strength reduction to move the allocation, instantiation of constants, and freeing of memory out of the critical path. In essence:
```
constData = prepare_model()
while (newData = getData())
  eval_model(newData, constData)
```
where we prepare and finalize once for many evaluations.
There are different ways to implement this: creating pointers to memrefs, creating structs of memrefs, or creating a new alloc primitive that carves data out of an existing chunk of memory at a given offset.
There does not seem to be support for this in MLIR at the present time, and I would venture that this is a common optimization that might be of interest to many. Doing all the strength reduction at the lower dialects (e.g. LLVM) is also an option.
If there is already support for this, please let me know what the current approach is. If there is not, maybe we can start a discussion on the alternative methods.
Thanks for bringing this up. I would very much support growing this functionality.
MLIR is currently great for describing tensor programs within function bodies, where everything is a separate local SSA value or argument. But I think one of the main conceptual bridges that we haven’t crossed yet in MLIR is a more sophisticated notion of a tensor-level program: one that includes “global variables” that can store state like this, or recursively composed types containing tensors that can be exposed to a runtime (and similarly for memref-level programs, which those tensor-level programs get lowered into).
I’ll be talking about a similar set of issues in the ODM this week. I’m really excited about this topic of supporting a richer set of programs with tensors/memrefs!
We have a little bit of experience with this on the TensorFlow side with the design of the tf_saved_model dialect, which supports:
- persistent mutable global variables
- structured inputs/outputs on function signatures, represented by annotating each individual tensor argument with its logical position within the struct (such as being the tensor that one would read by doing “the_structure.foo.bar”)
This is by no means a complete and general thing, but is enough to do the transformation that you would like to do.
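For illustration, an annotated signature in this style might look roughly like the following (the attribute names reflect my reading of the tf_saved_model dialect; treat the exact syntax as approximate):

```mlir
// Sketch only: the argument is annotated with its logical position in
// the input structure, i.e. the tensor one would read via
// "the_structure.foo.bar".
func @predict(%arg0: tensor<1x4xf32> {tf_saved_model.index_path = ["foo", "bar"]})
    attributes {tf_saved_model.exported_names = ["predict"]} {
  return
}
```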
On the IREE side we support compiling such programs. We lower the global tensors to our own lower-level “global” ops that represent them. The structured signature is handled by encoding it into a “reflection metadata” that runtimes can use to map structures onto positional arguments.
It is not clear to me whether you want to do the reduction at the tensor-to-memref lowering level, at which point you can look into the ongoing work on buffer allocation (ping @herhut), or at the memref level.
If the latter, memref-of-memrefs (which is a better-structured equivalent of pointer-to-memref) and raw pointers are a thorny recurrent question. “[RFC] Remove MemRefType element type check? Or add pointer support to ‘std’ dialect?” seems to be the latest relevant discussion that contains references to previous discussions. This is totally feasible, but requires careful design, and nobody has ventured as far as proposing an implementation.
This sounds exactly like what `std.view` does: https://mlir.llvm.org/docs/Dialects/Standard/#stdview-viewop. One can allocate a byte buffer and slice it into smaller memrefs, eventually taking more complex views into slices using `std.subview`. If it is missing some functionality, let’s try and improve the `std` ops instead. @nicolasvasilache would know the most about how these are currently used.
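As a rough sketch (syntax approximate; see the `std.view` documentation linked above), carving two buffers out of one allocation could look like:

```mlir
// Allocate one flat byte buffer, then take two disjoint typed views
// into it at different byte offsets. Offsets and shapes here are
// purely illustrative.
%buf = alloc() : memref<2048xi8>
%c0 = constant 0 : index
%c1024 = constant 1024 : index
%A = view %buf[%c0][] : memref<2048xi8> to memref<16x16xf32>
%B = view %buf[%c1024][] : memref<2048xi8> to memref<256xf32>
```

`std.subview` can then take strided sub-slices of `%A` or `%B` if finer-grained views are needed.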
Also, I believe @bondhugula had some code hoisting allocations out of a loop.
Glad to hear there is ongoing work and sketches of solutions on this matter. We are interested in the functionality, and have no pre-conception of where it should land.
I imagine that one of the design considerations is alias analysis. When a common pool of memory gets handed out for individual tensors, it may confuse dependence analysis. One way to reduce the impact may be to “delay” it for as long as possible within MLIR, by having an abstraction where two tensors coming from the same pool of memory are guaranteed to be distinct until everything is lowered to the lower dialects (e.g. LLVM).
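To make the idea concrete, a hypothetical carving abstraction might look like the following (the `mypool` ops are invented for illustration and do not exist in MLIR; generic op syntax is used):

```mlir
// Hypothetical ops: each carved memref is declared disjoint from the
// others, so dependence analysis can keep treating them as distinct
// until lowering to LLVM.
%pool = "mypool.alloc"() : () -> memref<4096xi8>
%t0 = "mypool.carve"(%pool) {offset = 0 : index}
    : (memref<4096xi8>) -> memref<128xf32>
%t1 = "mypool.carve"(%pool) {offset = 512 : index}
    : (memref<4096xi8>) -> memref<128xf32>
```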
Thanks for the answers and for opening the discussion on this topic.
The plan is for us (I work with Alex) to develop a working solution for memory pooling and attach it to our own dialect (for now). It is currently work in progress. We are open to contributing it to MLIR if there is interest, and if you are interested in what we are doing, we can always discuss.
Since globals were also mentioned, we already have a solution (in the same dialect mentioned in the previous paragraph) for supporting global constant memrefs. These constant memrefs are lowered into LLVM global constants. Our internal PR is here: Emit constant tensors as global constants by doru1004 · Pull Request #66 · onnx/onnx-mlir · GitHub
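The lowering in that PR targets LLVM dialect globals; a minimal sketch of the target form (syntax approximate, names illustrative) is:

```mlir
// Constant tensor data emitted as an LLVM global constant, then
// addressed at use sites.
llvm.mlir.global internal constant @weights(dense<[1.0, 2.0]> : tensor<2xf32>)
    : !llvm.array<2 x f32>
// ...
%addr = llvm.mlir.addressof @weights : !llvm.ptr<array<2 x f32>>
```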
> I imagine that one of the design considerations is alias analysis.
This is another thing that needs careful design, and time that nobody has had so far… It is particularly challenging in the presence of control flow, whether structured, unstructured, or call-like. FWIW, Linalg has some alias analysis that traverses alloc/view/subview chains but doesn’t cross basic-block or function boundaries.
It feels like this will have been decided during buffer allocation; we may want to know about locations being reused, at least at the loop level, for the sake of live-in/live-out optimization, for example.
I would highly encourage you to try and formulate your work in abstract terms with traits and interfaces, e.g. have an allocation-like operation and an alias-like operation, connect that to the memory-effects subsystem, and propose a general solution.
Have you considered presenting at the open design meeting?
I am a bit confused to see “aliasing” and “tensor” considered at the same time. Are you using “tensor” as a generic term rather than referring to the “tensor” type in MLIR? I see tensor as an SSA/immutable value type, which seems incompatible with any notion of aliasing at this level of abstraction.