[RFC] Extend Linalg named operations for arbitrary element types

This is an early-stage RFC whose purpose is to find out whether
support for arbitrary element types in Linalg named operations
is wanted by the community at all (or if the limitation to the currently
supported types is intentional) and to discuss possible directions to
add support for arbitrary types.

Background

Linalg named operations are currently limited to tensors
and memrefs composed of floating point, integer or index elements;
using any other element type triggers an assertion.

Checks for these types are hardcoded, but thanks to the abstractions
used for the definition and implementation of Linalg named operations, only
a few places need to be modified to extend the set of supported element
types. In particular, supporting a new type requires a modification of
the helper methods from RegionBuilderHelper in
mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp, which are invoked by the
region builders generated from the abstract YAML representation for
proper creation of appropriate scalar operations.

However, while this might be a viable approach for a small set of
built-in types, modifying Linalg itself does not seem reasonable for
users of MLIR that wish to add support for their own, externally
defined types.

Implementation goals

A better approach would allow users of MLIR to add support for new types
without modification of the MLIR code base. Ideally, the
implementation supports:

  • Modularity: allow users of MLIR to add support for their own types
    without modifying Linalg itself.

  • Extensibility: allow for the support of any element type for any
    Linalg named operation, as long as the scalar operations to implement the
    Linalg operation with the element type exist (e.g., if an addition
    and a multiplication operation exist for a scalar type, then it
    should be possible to add support for the type for linalg.matmul).

  • Selective extensibility: Since there is a variety of Linalg named
    operations, supporting a new element type for all of them would
    result in a substantial set of required scalar operations. This set
    might not be available for a given type (e.g., the max operation
    missing for complex values). Therefore, it should be possible to add
    support only for a subset of Linalg named operations for a new type.

Proposed direction for implementation

Add a new type trait for each arithmetic operator required by Linalg
and modify the helper methods of RegionBuilderHelper, such that they
create the type-specific operation via a method of the type trait. For
example, RegionBuilderHelper::arithfn__add would check that the type
implements the Addition type trait and call a method that creates
the appropriate operation for the given operands.

The drawback of this implementation is that it requires a substantial
amount of new traits, which all need to be implemented for the
built-in floating point, integer and index types.
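
To make the trait-per-operator direction concrete, here is a minimal sketch in plain Python rather than MLIR C++. All names (AdditionTrait, MultiplicationTrait, FancyType, arithfn_add, arithfn_mul) are hypothetical stand-ins, and the "operations" are just strings; the only point is that a helper checks for the trait and delegates creation to the type.

```python
class AdditionTrait:
    """Hypothetical per-operator trait: the type knows how to build an add."""

    def build_add(self, lhs, rhs):
        raise NotImplementedError


class MultiplicationTrait:
    """Hypothetical per-operator trait: the type knows how to build a mul."""

    def build_mul(self, lhs, rhs):
        raise NotImplementedError


class FancyType(AdditionTrait):
    """An externally defined element type that only supports addition."""

    name = "!my_dialect.fancy"

    def build_add(self, lhs, rhs):
        # A real implementation would create an operation; a string stands in.
        return f"my_dialect.add({lhs}, {rhs})"


def arithfn_add(elem_type, lhs, rhs):
    """Model of a region-builder helper dispatching through the trait."""
    if not isinstance(elem_type, AdditionTrait):
        raise TypeError(f"{elem_type.name} does not implement AdditionTrait")
    return elem_type.build_add(lhs, rhs)


def arithfn_mul(elem_type, lhs, rhs):
    """Same pattern for multiplication; fails for types without the trait."""
    if not isinstance(elem_type, MultiplicationTrait):
        raise TypeError(f"{elem_type.name} does not implement MultiplicationTrait")
    return elem_type.build_mul(lhs, rhs)
```

In this model, FancyType could back an elementwise addition but not a matmul, which matches the "selective extensibility" goal.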

Can you clarify whether you are mainly discussing support for named operations, or whether the generic op is currently limited as well?

Thanks Mehdi for the quick feedback. This is about the limitation of Linalg named operations. To my knowledge, linalg.generic does not restrict the element type.

I edited the title and contents of the post for clarification.

Thanks for the proposal, it sounds like an interesting OpDSL/named op extension.

Using type interfaces sounds like a viable approach. I would probably implement one type interface that can generate addition, multiplication, etc. and return a failure if a certain operation is not supported. Extending the existing casting logic to support conversion between arbitrary types may be hard but is probably not required.

One thing to consider is that OpDSL currently has two lowering paths. One is used to generate a yaml file that defines the named operations (this lowering path uses RegionBuilderHelper). On the other hand, there is also a lowering path in Python itself that directly generates a generic operation. That means we always need to consider both code paths when doing these changes (llvm-project/emitter.py at 3cf86c36112fd1b059c8aead3d04656c542195ce · llvm/llvm-project · GitHub implements the Python RegionBuilderHelper).

Do the custom types you have in mind always consist of two (as in the case of complex) or more built-in types? If this is the case, we may also think about making OpDSL itself extensible in the sense that a user can inject custom types assembled from multiple built-in types. OpDSL could then emit multiple built-in operations for a single custom operation. The difficulty is probably accessing the built-in types within a custom type. For example, if an operation takes a tensor of complex values as an input, we need to know how to access real and imaginary parts.

Using type interfaces sounds like a viable approach. I would
probably implement one type interface that can generate addition,
multiplication, etc. and return a failure if a certain operation is
not supported.

This certainly reduces the number of required traits. However, the
downside is that this mixes potentially unrelated operators in a rather
large trait (there are already 10+ operators used by Linalg named
ops), which is also likely to be extended with the arrival of new
named operations. The latter might turn out to be problematic, as this
burdens either the developer contributing the new named operation or
the maintainers of the scalar types with the modification of all types
shipped with MLIR. A default value for each arithmetic operator,
indicating that it is not supported, takes away the pressure of
immediate implementation, but would require an extra state “not
implemented” in addition to “supported” and “not supported” for clear
semantics. However, my feeling is that “not implemented” should be
indicated with the absence of the implementation of a trait rather
than in an additional layer of abstraction.

Extending the existing casting logic to support conversion between
arbitrary types may be hard but is probably not required.

The casting logic to convert the operands of a scalar expression to a
common type attempts to cast into a single direction defined by the
YAML specification of the operation (e.g., for matmul, it attempts to
cast both operands of the addition to the element type of the output
operand). This might be addressed with a cast trait providing a method
attempting to perform a cast to the desired type and simply failing if
the cast cannot be performed.
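
A rough Python model of such a one-directional cast step, under stated assumptions: CAST_BUILDERS, try_cast and cast_operands_to are hypothetical names, types are plain strings, and the returned "operations" are textual stand-ins. Every operand is cast toward the output element type, and the whole step fails if any single cast is unavailable.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical registry: (source type, target type) -> function building the cast.
CAST_BUILDERS: Dict[Tuple[str, str], Callable[[str], str]] = {
    ("i16", "i32"): lambda v: f"arith.extsi({v})",
    ("f16", "f32"): lambda v: f"arith.extf({v})",
}


def try_cast(value: str, src: str, dst: str) -> Optional[str]:
    """Return the cast value, or None if no cast from src to dst is known."""
    if src == dst:
        return value  # nothing to do
    builder = CAST_BUILDERS.get((src, dst))
    return builder(value) if builder is not None else None


def cast_operands_to(out_type: str,
                     operands: List[Tuple[str, str]]) -> Optional[List[str]]:
    """Cast each (value, type) operand to out_type; fail if any cast fails."""
    result = []
    for value, src in operands:
        cast = try_cast(value, src, out_type)
        if cast is None:
            return None  # the cast cannot be performed
        result.append(cast)
    return result
```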

One thing to consider is that OpDSL currently has two lowering
paths. One is used to generate a yaml file that defines the named
operations (this lowering path uses RegionBuilderHelper). On the
other hand, there is also a lowering path in Python itself that
directly generates a generic operation. That means we always need to
consider both code paths when doing these changes
(llvm-project/emitter.py at 3cf86c36112fd1b059c8aead3d04656c542195ce
· llvm/llvm-project · GitHub implements the Python
RegionBuilderHelper).

I’ll look into this, thanks for pointing this out!

Do the custom types you have in mind always consist of two, as in
case of complex, or more built-in types? If this is the case, we may
also think about making OpDSL itself extensible in the sense that a
user can inject custom types assembled from multiple built-in
types. OpDSL could then emit multiple built-in operations for single
custom operation. The difficulty is probably accessing the built-in
types within a custom type. For example, if an operation takes a
tensor of complex values as an input, we need to know how to access
real and imaginary parts.

The types I have in mind are completely opaque to MLIR and manipulated
only through library calls emitted upon lowering to the LLVM
dialect. Dealing with types only through a well-defined interface
implemented with traits would allow for a very generic solution
supporting such use cases.

Furthermore, I fear that exposing the details required to deal with
compound types composed of built-in types adds unnecessary
complexity. Encapsulating that logic into an operation exposed through
the respective trait looks like the best solution to me.

CC @nicolasvasilache @pifon2a @MaheshRavishankar based on top contributors to mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp

This certainly reduces the number of required traits. However, the
downside is that this mixes potentially unrelated operators in a rather
large trait (there are already 10+ operators used by Linalg named
ops), which is also likely to be extended with the arrival of new
named operations. The latter might turn out to be problematic, as this
burdens either the developer contributing the new named operation or
the maintainers of the scalar types with the modification of all types
shipped with MLIR. A default value for each arithmetic operator,
indicating that it is not supported, takes away the pressure of
immediate implementation, but would require an extra state “not
implemented” in addition to “supported” and “not supported” for clear
semantics. However, my feeling is that “not implemented” should be
indicated with the absence of the implementation of a trait rather
than in an additional layer of abstraction.

I was thinking of replacing the arithfn__add, arithfn__mul, etc in the RegionBuilderHelper by something similar to the following code snippet:

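// Look up the (hypothetical) type interface on the operand's type.
auto arithOpBuilderInterface =
  lhs.getType().dyn_cast<ArithOpBuilderTypeInterface>();
if (!arithOpBuilderInterface) {
  // unsupported type: report the error and bail out
}
FailureOr<Value> result =
  arithOpBuilderInterface.create(builder, "add", lhs, rhs);
if (failed(result)) {
  // unsupported operation: report the error and bail out
}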

An enum instead of passing the operation type by string would trigger an earlier error in the generation process and may be nicer. If a new operation is added, only the types that want to support it need to update their type interface implementation.
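
The difference between string and enum dispatch can be illustrated with a small Python model. ArithFn and FancyTypeInterface are hypothetical names for this sketch, and strings stand in for created operations. With strings, a misspelled operation name looks exactly like an unsupported operation; with an enum, the misspelling is rejected before the interface is ever called.

```python
import enum


class ArithFn(enum.Enum):
    """Hypothetical operator enum; member names are assumptions."""
    ADD = "add"
    MUL = "mul"


class FancyTypeInterface:
    """Stand-in for a type interface implementation that only supports add."""

    def create_by_name(self, op_name, lhs, rhs):
        # String dispatch: a typo is indistinguishable from "unsupported".
        if op_name == "add":
            return f"my_dialect.add({lhs}, {rhs})"
        return None

    def create(self, fn, lhs, rhs):
        # Enum dispatch: callers can only pass valid ArithFn members.
        if fn is ArithFn.ADD:
            return f"my_dialect.add({lhs}, {rhs})"
        return None


iface = FancyTypeInterface()
```

Here `iface.create_by_name("addd", ...)` quietly returns None, whereas constructing `ArithFn("addd")` raises a ValueError at the call site.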

Alternatively, we may generate one interface for every operation/function as you suggest. I think the code would look very similar? The main difference between the two solutions seems to be where the operation/function dispatch happens?

Let me know if I missed an important point here and feel free to post some pseudo code to show your approach.

I’ll look into this, thanks for pointing this out!

It probably requires supporting type interfaces in Python. The documentation indicates they are not yet supported.

The types I have in mind are completely opaque to MLIR and manipulated
only through library calls emitted upon lowering to the LLVM
dialect. Dealing with types only through a well-defined interface
implemented with traits would allow for a very generic solution
supporting such use cases.

Ok, then the type interface approach makes much more sense.

An enum instead of passing the operation type by string would
trigger an earlier error in the generation process and may be
nicer. If a new operation is added, only the types that want to
support it need to update their type interface implementation.

Thanks for pointing me to Type interfaces. I was only aware of bare
type traits, which, AFAIU, cannot be defined externally. My initial
concern with the bare type traits implementation was that each type
would have to implement the traits explicitly via the type definition,
resulting either in a large list of traits when using one trait per
arithmetic operator, or one big trait that mixes operators which are
only related by their use in linalg named operations.

However, having a type interface for Linalg scalar operations (e.g.,
LinalgArithOpBuilderTypeInterface), which can be added externally for
the builtin types, solves all these issues, since it makes clear
how/why the operators are related and keeps the maintenance for the
builtin types local to the Linalg code.

I also agree that using an enum is preferable here. In addition to
that, I’d suggest to have three different outcomes for a call to
LinalgArithOpBuilderTypeInterface::create:

  • A default value “Not implemented”: a sensible implementation of the
    operator might exist, but hasn’t been implemented (e.g., due to a
    recent extension of the set of operators in
    LinalgArithOpBuilderTypeInterface that hasn’t been taken into
    account yet for the type implementing the interface)

  • A value “Not supported”, indicating that there is no sensible
    implementation of the operator for the type

  • An mlir::Value representing the outcome of the operation if the
    operator is supported and implemented
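
The three outcomes could be modeled roughly as follows. This is a Python sketch, not the proposed C++ interface: ArithOp, Status, Built, LinalgArithOpBuilder and ComplexBuilder are hypothetical names, and a string stands in for the mlir::Value. The base class returns "not implemented" by default, so newly added operators need no immediate change in existing types.

```python
import enum
from dataclasses import dataclass
from typing import Union


class ArithOp(enum.Enum):
    ADD = enum.auto()
    MUL = enum.auto()
    MAX = enum.auto()


class Status(enum.Enum):
    NOT_IMPLEMENTED = enum.auto()  # could exist, nobody wrote it yet
    NOT_SUPPORTED = enum.auto()    # no sensible implementation for the type


@dataclass(frozen=True)
class Built:
    """Wraps the created value (a string stands in for mlir::Value)."""
    value: str


class LinalgArithOpBuilder:
    """Hypothetical interface; the default outcome is NOT_IMPLEMENTED."""

    def create(self, op: ArithOp, lhs: str, rhs: str) -> Union[Built, Status]:
        return Status.NOT_IMPLEMENTED


class ComplexBuilder(LinalgArithOpBuilder):
    """Example implementation for a complex-like element type."""

    def create(self, op, lhs, rhs):
        if op is ArithOp.ADD:
            return Built(f"complex.add({lhs}, {rhs})")
        if op is ArithOp.MAX:
            return Status.NOT_SUPPORTED  # no ordering on complex values
        return super().create(op, lhs, rhs)  # MUL: not written yet
```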

Alternatively, we may generate one interface for every
operation/function as you suggest. I think the code would look
very similar? The main difference between the two solutions
seems to be where the operation/function dispatch happens?

With type interfaces, I indeed prefer a single interface. As pointed
out above, I was reasoning in terms of bare type traits when I suggested
using separate traits.

It probably requires supporting type interfaces in Python. The documentation
indicates they are not yet supported.

Thanks for the heads-up! Hopefully that’s not a showstopper for now.

Thanks for pointing me to Type interfaces.

I don’t have extensive experience with them myself but I think they would be the method of choice here :).

I’d suggest to have three different outcomes for a call to
LinalgArithOpBuilderTypeInterface::create

Yes, having three return values sounds fine. I used FailureOr because it is convenient. We could also have additional methods on the interface to check if an operation is not supported / not implemented. That is a design detail though.

Thanks for the heads-up! Hopefully that’s not a showstopper for now.

I don’t think so. I guess a good start is to go step by step and maybe start off with an interface first, using it on the C++/yaml side, etc.

Sounds good. I’ll experiment a bit with Type Interfaces and then put a patch together.

Created and submitted the patch. The review is at ⚙ D118022 [mlir][linalg] Add support for arbitrary element types for named operations.

Thanks for the RFC and proposed PR.

After going over the proposed impl, I have some concerns over the sheer complexity involved for something that seems quite simple on the surface, so maybe I am missing something more fundamental.

First, any new contribution related to named ops should start being included in a Frontend subdirectory.
In practice this is really useful to improve the programming model abstraction (and orthogonally to match patterns).
It is however less useful when coming from a higher-level programming model such as XLA or TOSA.

Second, it seems that what you really want is a frontend-oriented LinalgNamedOpInterface or LinalgFrontEndOpInterface which exposes attributes (e.g. add).
The attributes would capture the operation name you want at the instance level.

You would have an IR resembling:

linalg.matmul add="my_dialect.my_fancy_add" mul="my_dialect.my_fancy_mul"
  ins(%A: tensor<!my_fancy_type> ...) outs(...)

This should make the mechanism extensible without a single line of C++ once the basic flow is set up.
You should be able to use the generic op creation from a state + operation name to build your ops.

Further issues with the current proposed PR are that it moves the traditional builder logic to a subset through an enum. This is not extensible in general.
In the very general extensibility case with control flow etc., probably the best solution would be just a function symbol. But we are not there yet.

Thanks @nicolasvasilache for sharing your thoughts on the RFC.

After going over the proposed impl, I have some concerns over the
sheer complexity involved for something that seems quite simple on the
surface, so maybe I am missing something more fundamental.

First, any new contribution related to named ops should start being
included in a Frontend subdirectory. In practice this is really
useful to improve the programming model abstraction (and orthogonally
to match patterns). It is however less useful when coming from a
higher-level programming model such as XLA or TOSA.

I assume this should be a subdirectory of
mlir/{include/mlir,lib}/Dialect/Linalg.

Second, it seems that what you really want is a frontend-oriented
LinalgNamedOpInterface or LinalgFrontEndOpInterface which exposes
attributes (e.g. add). The attributes would capture the operation
name you want at the instance level.

You would have an IR resembling:

linalg.matmul add="my_dialect.my_fancy_add" mul="my_dialect.my_fancy_mul"
  ins(%A: tensor<!my_fancy_type> ...) outs(...)

This looks like an interesting approach, which would also support some
of our odd use cases, where scalar operations are applied to operands
with different types. At first, I was a bit concerned about the
verbosity in the textual representation of the IR, but this should not
be an issue with attribute value aliases.

Can you elaborate a bit on what you mean with exposed attributes?
This sounds as if one could define an OpInterface with attributes,
which can be specified in the IR for the operation implementing the
interface. I searched through the documentation and grepped a bit
through the sources, but couldn’t find anything in that direction.

Also, it is yet unclear to me how operations should be instantiated
from the attributes. AFAIU, the only option to specify an operation is
to store its name in a string attribute. What is the idiomatic way to
create an operation from a string?

So in summary, the implementation of the solution consists mainly of:

  • Adding a new OpInterface named LinalgNamedOpInterface or
    LinalgFrontEndOpInterface with exposed attributes for all scalar
    operations (i.e., add, mul, sub, etc.) to a new file in
    mlir/include/mlir/Dialect/Linalg/Frontend.

  • Implementing the OpInterface for all Linalg named operations, e.g.,
    by adding appropriate output to linalg-ods-yaml-gen.

  • Modify the helper functions in RegionBuilderHelper, such that
    either the custom operation specified in the respective attribute is
    instantiated or, if the attribute has not been specified and the
    operands are built-in types, the default operation for that type is
    created.
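
The selection logic from the last bullet can be sketched in Python as follows. DEFAULT_OPS and resolve_scalar_op are hypothetical names, attributes are modeled as a plain dict, and only the operation name is resolved, not the operation itself.

```python
# Default scalar operation per (function, built-in element type).
# The table is illustrative and far from complete.
DEFAULT_OPS = {
    ("add", "f32"): "arith.addf",
    ("add", "i32"): "arith.addi",
    ("mul", "f32"): "arith.mulf",
    ("mul", "i32"): "arith.muli",
}


def resolve_scalar_op(fn: str, elem_type: str, attrs: dict) -> str:
    """Pick the operation name for `fn`: attribute override first, then default."""
    custom = attrs.get(fn)
    if custom is not None:
        return custom  # the user named an operation explicitly
    try:
        return DEFAULT_OPS[(fn, elem_type)]
    except KeyError:
        raise ValueError(
            f"no '{fn}' attribute given and no default for type {elem_type}")
```

For example, `resolve_scalar_op("add", "!my_fancy_type", {"add": "my_dialect.my_fancy_add"})` yields the custom name, while the same call with an empty attribute dict raises an error because no default exists for the custom type.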

Further issues with the current proposed PR are that it moves the
traditional builder logic to a subset through an enum. This is not
extensible in general. In the very general extensibility case with
control flow etc., probably the best solution would be just a function
symbol. But we are not there yet.

I am not sure I got this last part correctly. Could you give a short
example of this use case?

All in all, this sounds good and I am willing to work on the proposed
solution. I’d just like to get the ideas straight before starting the
implementation.

Thanks,
Andi

Basically the interface would have methods such as getAddOpName that query the corresponding attribute on the op.
The verifier would ensure that these are present.
The tablegen definition of the op would need to specify this attribute (could be a unique dictionary attr).
Note that you want to avoid putting all attributes on everything so the interface should specify which subset of ops it expects (e.g. NamedAddMulOpInterface or NamedOpInterface<“add_op”, “mul_op”> where add_op/mul_op could well be “arith.max” / “arith.addi”).

You can builder.create(OperationState) and explicitly pass the name in OperationState.
This is how local ops that aren’t registered with a dialect can be created locally.

Yes, and you prob want a “default” that can be elided to avoid increasing verbosity in the common case.
Parsing and printing may be more involved here but it would be a good thing to better separate the auto-generated named ops from the load-bearing generic.

I was just thinking that if you want to configure a much more advanced region than a simple fma + a few unary ops, the general case will likely be:

func @some_impl(%a: !my_fancy_type) {
  %b = another_op(%a, %a)
  %c = call @some_other_function_(%b) : (!my_fancy_type) -> (!my_fancy_type)
  return %b: !my_fancy_type
}

linalg.my_fancy_op impl="some_impl fun" (%O: memref<?x!my_fancy_type>)

whenever you need to lower to loops you can just inline @some_impl where it is needed.
This would be the more general setup, for which we don’t want special attribute names.

I also wanted to give some background on how we may want to evolve OpDSL.

OpDSL already supports attributes to define strides and dilations:

 %1 = linalg.conv_2d_nhwc_hwcf {
    dilations = dense<1> : tensor<2xi64>,
    strides = dense<1> : tensor<2xi64>}

We may now want to extend the attribute mechanism to functions to reduce the number of operators. For example, we currently have different pooling operators for max, min, unsigned pooling and we would like to have only one with a configurable reduction function. Attributes would allow us to get there:

def pooling_nhwc_sum(
    I=TensorDef(T, S.N, S.OH * S.SH + S.KH * S.DH, S.OW * S.SW + S.KW * S.DW, S.C),
    K=TensorDef(T, S.KH, S.KW, index_dims=[D.kh, D.kw]),
    O=TensorDef(T, S.N, S.OH, S.OW, S.C, output=True),
    fun = ArithFnAttrDef(default=ArithFn.add)):
  O[D.n, D.oh, D.ow, D.c] = Reduce<fun>[D.kh, D.kw](I[D.n, D.oh + D.kh, D.ow + D.kw, D.c])

On the C++ side the reduce fun needs to be set to an enum value (the enum should be tablegen generated):

linalg.pooling_nhwc_sum {fun=linalg.add} ...

These changes alone do not yet support arbitrary types. Yet they may open up opportunities. For example, if the operand types are defined by an unknown_dialect, we could try to create unknown_dialect.add, we could use a type interface to create the add, etc.
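
As a plain-Python illustration of "one pooling operator with a configurable reduction function" (not OpDSL, and reduced to one dimension to stay short), consider the following sketch. The function name pooling_1d and its parameters are hypothetical.

```python
import operator
from functools import reduce


def pooling_1d(inp, kernel_size, stride=1, fun=operator.add):
    """Reduce each window of `inp` with the binary function `fun`.

    `fun` plays the role of the configurable reduction attribute:
    operator.add gives sum pooling, max gives max pooling, and so on.
    """
    out_len = (len(inp) - kernel_size) // stride + 1
    return [
        reduce(fun, inp[o * stride : o * stride + kernel_size])
        for o in range(out_len)
    ]
```

Calling `pooling_1d(data, 2)` performs sum pooling, while `pooling_1d(data, 2, fun=max)` or `fun=min` reuses the same definition for max and min pooling.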

Thanks for all the feedback and context. A new revision is available at ⚙ D120027 [mlir][linalg] Support arbitrary element types in named operations via attributes.

@nicolasvasilache I tried to stay as close to your suggestions as possible. However, besides the fact that the new interfaces live in Frontend, they are quite deeply linked with the existing code. LMK if that’s not what you had in mind.

hi,

In parallel, I am working on extending OpDSL with attributes to control which functions are used when instantiating a named operation. There is already a revision up to control the casting functions: ⚙ D119718 [mlir][OpDSL] Add type function attributes. And there will be a follow-up to customize binary and unary operations.

We definitely need to synchronize here since there may be attribute naming conflicts. I will post here once I have the revision for unary and binary ops ready.

For illustration, the revision above lets you control if the op shall use a signed or unsigned cast:

  %0 = linalg.matmul {cast = #linalg<"type_fn cast_unsigned">}
                     ins(%A, %B: tensor<16x8xi16>, tensor<8x32xi64>)
                          outs(%C: tensor<16x32xi32>) -> tensor<16x32xi32>

or

  %0 = linalg.matmul {cast = #linalg<"type_fn cast">}
                     ins(%A, %B: tensor<16x8xi16>, tensor<8x32xi64>)
                          outs(%C: tensor<16x32xi32>) -> tensor<16x32xi32>
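
To see why the choice matters, here is a small Python sketch of the two casts on raw bit patterns. It assumes the usual integer semantics (sign extension for the signed cast, zero extension for the unsigned cast, truncation when narrowing); the helper names are made up for this example.

```python
def cast_signed(bits: int, src_width: int, dst_width: int) -> int:
    """Treat `bits` as a signed src_width-bit value; return dst_width bits."""
    bits &= (1 << src_width) - 1
    if bits >> (src_width - 1):   # sign bit set: value is negative
        bits -= 1 << src_width
    return bits & ((1 << dst_width) - 1)


def cast_unsigned(bits: int, src_width: int, dst_width: int) -> int:
    """Treat `bits` as an unsigned src_width-bit value; return dst_width bits."""
    bits &= (1 << src_width) - 1
    return bits & ((1 << dst_width) - 1)
```

Widening the i16 pattern 0xFFFF to i32 gives 0xFFFFFFFF (that is, -1) with the signed cast and 0x0000FFFF (65535) with the unsigned cast. Narrowing from i64 to i32 keeps the low 32 bits in both cases.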

Hi,
thanks for letting me know. Casting would indeed have been the next bit to address, so I am glad to see this is being worked on.

Can you elaborate a bit on what you mean by:

And there will be a follow-up to customize binary and unary operations.

This sounds as if you were planning to provide functionality that is similar to D120027. However, your post suggests that there might only be a clash in the attribute namespace, but no overlap in functionality.

Yes I think there is only a clash in the attribute namespace.

Let me give an example. I plan to add a unary and a binary elementwise operation:

@linalg_structured_op
def elemwise_unary (
    I=TensorDef(T1),
    O=TensorDef(U, output=True),
    fun=UnaryFnAttrDef(default=UnaryFn.exp),
    cast=TypeFnAttrDef(default=TypeFn.cast)):
  O[None] = fun(cast(U, I[None]))

and

@linalg_structured_op
def elemwise_binary (
    lhs=TensorDef(T1),
    rhs=TensorDef(T2),
    O=TensorDef(U, output=True),
    fun=BinaryFnAttrDef(default=BinaryFn.add),
    cast=TypeFnAttrDef(default=TypeFn.cast)):
  O[None] = fun(cast(U, lhs[None]), cast(U, rhs[None]))

The fun attribute can then be set when building an operation or through attributes when parsing the operation:

// elementwise multiplication
%res0 = linalg.elemwise_binary {
    cast =  #linalg<"type_fn cast">,
    fun = #linalg<"binary_fn mul">} 
ins(%0, %1 : tensor<4x8xf32>, tensor<4x8xf32>) outs(%3 : tensor<4x8xf32>)
// elementwise addition
%res1 = linalg.elemwise_binary {
    cast =  #linalg<"type_fn cast">,
    fun = #linalg<"binary_fn add">}
ins(%0, %1 : tensor<4x8xf32>, tensor<4x8xf32>) outs(%3 : tensor<4x8xf32>)

meaning the first operation would multiply the elements while the second operation adds them. The split in unary and binary operations is not strictly needed but from my perspective makes sense.
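
The semantics of the configurable binary op can be mimicked in plain Python. This is only a model: BinaryFn mirrors the enum name used above, but the implementation table, the list-based "tensors" and the use of Python callables for cast are assumptions of this sketch.

```python
import enum
import operator


class BinaryFn(enum.Enum):
    add = "add"
    mul = "mul"


# Scalar implementation for each enum case (illustrative only).
_BINARY_IMPL = {BinaryFn.add: operator.add, BinaryFn.mul: operator.mul}


def elemwise_binary(lhs, rhs, fun=BinaryFn.add, cast=float):
    """Apply `fun` elementwise after casting both operands with `cast`."""
    if len(lhs) != len(rhs):
        raise ValueError("operands must have the same length")
    scalar_fn = _BINARY_IMPL[fun]
    return [scalar_fn(cast(a), cast(b)) for a, b in zip(lhs, rhs)]
```

The same definition then covers both cases: `elemwise_binary(a, b)` adds by default, and `elemwise_binary(a, b, fun=BinaryFn.mul)` multiplies.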

I would thus say there is no overlap in functionality. I am mainly working on making the OpDSL operations more configurable via attributes. I still use the region builder to construct the ops using a slightly different signature and the enum. I think the question is now how to combine these two things.

I hope this clarifies a bit what I am working on. If all goes well, I should be able to upload at least a WIP commit by tonight or tomorrow morning.

Thanks for the clarifications. While the functionality for linalg.elemwise_unary and linalg.elemwise_binary indeed seems different (specification of a custom function that does not have a fixed meaning beyond its default value), the cast part you mentioned in the previous post is very close (choose a variant of a function with a specific meaning).

Regardless of that, it might make sense to unify the notations for our approaches in order to avoid mixing up different notations. Looking into that, I’m not yet sure where the #linalg<> part is processed. Any pointers are welcome!

Thanks!

The TypeFnEnum and TypeFnAttr are defined in LinalgBase.td (⚙ D119718 [mlir][OpDSL] Add type function attributes.) in the revision I have sent out. Printer and Parser for these enums are auto generated by tablegen. I only added the definition of enum and attribute:

// Define a TypeFn enum matching the OpDSL TypeFn class.
def TypeFn : I32EnumAttr<"TypeFn", "", [
  I32EnumAttrCase<"cast", 0>,
  I32EnumAttrCase<"cast_unsigned", 1>
]> {
  let genSpecializedAttr = 0;
  let cppNamespace = "::mlir::linalg";
}

def TypeFnAttr : EnumAttr<Linalg_Dialect, TypeFn, "type_fn">;