Structured algebra operations

Hi all,
This is a high-level enquiry about linalg/MLIR feature availability. We are trying to generate BLAS Level 3 functions like strmm (triangular matrix multiply).

The few high level questions I have are:

  • Has something like this been already tried?
  • Do we have support in linalg/mlir for “triangular” loops (i.e., where we don’t iterate over the whole matrix)? By support I mean code generation + optimizations.
  • Do we have support in linalg/mlir for “triangular” packed storage (i.e., instead of storing the whole matrix, we store only half of it and use an affine map to access the elements)?
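For context on the packed-storage bullet: the classic layout for an n×n lower-triangular matrix stores only the n(n+1)/2 elements on and below the diagonal, row by row, with the affine access map (i, j) → i·(i+1)/2 + j. A small Python sketch of that mapping (the function names are illustrative only, not existing MLIR APIs):

```python
# Packed storage for an n x n lower-triangular matrix:
# only the n*(n+1)/2 elements with i >= j are stored, row by row.
# The access "affine map" is (i, j) -> i*(i+1)/2 + j.

def packed_size(n):
    return n * (n + 1) // 2

def packed_index(i, j):
    # valid only on or below the diagonal (i >= j)
    assert i >= j
    return i * (i + 1) // 2 + j

def get(packed, i, j):
    # elements above the diagonal are implicitly zero
    return packed[packed_index(i, j)] if i >= j else 0.0

# Pack the dense 3x3 lower-triangular matrix [[1,0,0],[2,3,0],[4,5,6]]
dense = [[1.0, 0.0, 0.0], [2.0, 3.0, 0.0], [4.0, 5.0, 6.0]]
packed = [dense[i][j] for i in range(3) for j in range(i + 1)]
```

This is exactly the "packed" format that BLAS routines such as stpmv assume, so a triangular type would presumably carry such a map as part of its layout.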

Thanks a lot in advance,

There is nothing concrete yet; at the moment we have quite restricted dense hyper-rectangular types and much more general sparse types. Triangular and ragged shapes are on our wishlist, but no concrete effort has started yet.

Contributions are most welcome! :slight_smile:

Hi Nicolas,
I was hoping for your reply :slight_smile: Yes, we would like to contribute in this sense. Could you help me build a high-level view of what needs to be done?

For instance, from the storage point of view, do we need to introduce properties on tensors? Or maybe create a new non-rectangular tensor type? How would this map onto memrefs?

From the computation point of view, should we create a new linalg operation? E.g., linalg.structured (as opposed to linalg.generic)?

Thanks for any direction/help,

Hi @nicolasvasilache , all,
following up on this, I am in the very early stages and trying to draft a small plan, starting with trmm.

From here it looks like we can reuse many things in linalg: they basically subdivide the triangular matrices into small rectangular blocks and adjust the iteration end points.

A few warm-up questions:

  • I am trying to add a linalg.triangular operation (as opposed to linalg.generic). Is this the right way? Or should I add an attribute to linalg.generic to indicate that those are triangular matrices?

  • My first goal would be to have a --convert-linalg-to-loops that emits the right loop nest in the triangular case. I tentatively added a createLoopRangesTriangular to the LinalgStructuredInterface to hack my way in. Is that OK? Or should I add a new Triangular interface that perhaps inherits from LinalgStructuredInterface (or any other ideas)?
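For reference, the loop nest such a lowering needs to emit for a (left, lower, non-transposed) trmm differs from a plain matmul only in the bound of the reduction loop. A hedged, out-of-place Python sketch of the target semantics (BLAS strmm itself works in place and carries an alpha scalar, omitted here for clarity):

```python
# Out-of-place sketch of trmm semantics: C = A @ B, where A is an
# n x n lower-triangular matrix and B is n x m. The only change versus
# a rectangular matmul is the reduction bound: k runs over 0..i
# instead of 0..n-1, since A[i][k] == 0 for k > i.

def trmm_lower(A, B):
    n, m = len(A), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for k in range(i + 1):  # triangular bound: k <= i
                C[i][j] += A[i][k] * B[k][j]
    return C
```

So the interface change essentially boils down to letting an inner loop's upper bound be an affine function of an outer induction variable, rather than a constant shape dimension.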


cc @chelini

I am out until May 3rd so please expect low to zero bandwidth from me.

The right step forward here, IMO, would be to add support for a new triangular-shaped type.

Attributes could be experimented with to get the first pieces connected but shouldn’t land by themselves as they are quite unlikely to connect all pieces properly.

New ops for this specific purpose are a nonstarter IMO but may be useful for quick early prototyping.

In this context, I played with attributes similar to what is available in the sparse dialect with the sparseTensorEncoding. For example, the pseudo-IR below multiplies a lower-triangular matrix with a general one.

%0 = my_dialect.alloc [10, 10] : !my_dialect.matrix<<["lowerTri"]>, [10, 10], f32>
%1 = my_dialect.fill(%0, %constantX) : !my_dialect.matrix<<["lowerTri"]>, [10, 10], f32>, f32 -> !my_dialect.matrix<<["lowerTri"]>, [10, 10], f32>

%2 = my_dialect.alloc [10, 10] : !my_dialect.matrix<<["general"]>, [10, 10], f32>
%3 = my_dialect.fill(%2, %constantY) : !my_dialect.matrix<<["general"]>, [10, 10], f32>, f32 -> !my_dialect.matrix<<["general"]>, [10, 10], f32>

%4 = my_dialect.mul %1, %3 : !my_dialect.matrix<<["lowerTri"]>, [10, 10], f32>, !my_dialect.matrix<<["general"]>, [10, 10], f32> -> !my_dialect.matrix<<["general"]>, [10, 10], f32>

Now you can either lower mul directly to a triangular loop nest, or convert it to a linalg.generic that keeps a rectangular iteration domain and guards the computation in the body with a conditional. I think the latter is the easiest way to get started, but using attributes makes it difficult to connect with linalg properly. Having a new triangular-shaped type would be nicer.
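To make the second option concrete: this is the same trmm computation as a triangular lowering would produce, but enumerated over the full rectangular iteration domain with a guard in the body, which is the shape a linalg.generic-based lowering would take after conversion to loops. A hedged Python sketch of just the iteration structure:

```python
# Rectangular iteration domain with a guard: all n*m*n iterations are
# enumerated, but the body only contributes when k <= i, i.e. when
# A[i][k] lies on or below the diagonal of the lower-triangular operand.

def trmm_lower_guarded(A, B):
    n, m = len(A), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for k in range(n):       # full rectangular bound
                if k <= i:           # guard restores triangular semantics
                    C[i][j] += A[i][k] * B[k][j]
    return C
```

The result is identical to the triangular-bound version; the cost is that roughly half of the iterations do no useful work, which is why a first-class triangular type (letting the bound itself shrink) is the more attractive end state.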