Using GPU type with Standard Ops

A new type, GPU_MMAMatrix, has recently been added to the GPU dialect by @navdeepkk to represent Tensor Core-style operands.
This is a 2D type with an opaque layout. It can be loaded from and stored to a memref using dedicated GPU ops, and it can be consumed by a matmul-accumulate operation that targets dedicated hardware.
To use this type in any realistic scenario, we also need to be able to apply element-wise operations to it.

Here is what the code would look like for a simple matmul op fused with an addf op:

a = load from memref -> opaque matrix
b = load from memref -> opaque matrix
c = load from memref -> opaque matrix
d = load from memref -> opaque matrix
c = matmul_acc a, b, c
c = addf c, d
store c, opaque matrix -> memref

Since these are element-wise operations, the layout doesn’t matter as long as all operands and the result share it, so the operation is well defined without knowing what the layout is.
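
To make the layout argument concrete, here is a minimal Python sketch (not MLIR). It assumes an opaque layout can be modeled as a hidden permutation of the matrix elements in flat storage. The helpers `to_layout`, `from_layout` and `addf` are hypothetical names used only for this illustration.

```python
import random

def to_layout(matrix, perm):
    """Store a 2D matrix in flat storage, placing logical element i at perm[i]."""
    flat = [x for row in matrix for x in row]
    storage = [None] * len(flat)
    for logical, physical in enumerate(perm):
        storage[physical] = flat[logical]
    return storage

def from_layout(storage, perm, cols):
    """Recover the logical 2D matrix from flat storage with the same perm."""
    flat = [storage[physical] for physical in perm]
    return [flat[i:i + cols] for i in range(0, len(flat), cols)]

def addf(lhs, rhs):
    """Element-wise add on raw storage; it never looks at the layout."""
    return [x + y for x, y in zip(lhs, rhs)]

rows, cols = 4, 4
c = [[float(r * cols + k) for k in range(cols)] for r in range(rows)]
d = [[float(10 * (r + k)) for k in range(cols)] for r in range(rows)]

# The "opaque" layout: an arbitrary permutation shared by both operands.
perm = list(range(rows * cols))
random.Random(0).shuffle(perm)

result = from_layout(addf(to_layout(c, perm), to_layout(d, perm)), perm, cols)
expected = [[c[r][k] + d[r][k] for k in range(cols)] for r in range(rows)]
assert result == expected
```

The add produces the right answer for any permutation, but only because both operands use the same one. That is the constraint an element-wise op on an opaque type would have to verify.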

Is there a way to make some of the Standard dialect’s element-wise ops accept this new GPU type without having the Standard dialect depend on the GPU dialect?

Note that the concept of a Tensor Core is not specific to GPUs, and we may also want to make this type more generic. I’m not sure whether this was considered when working on x86 AMX or other kinds of dedicated matrix hardware.

I would love to see the MLIR type system opened up more, which I think we are now in a position to do with type interfaces. In the current state, many of the operations in the std dialect have to be duplicated all over the place to operate on other shaped types (which can’t truly be ShapedType because it’s a closed class hierarchy). Right now, standard ops only operate on (a subset of) builtin types, but perhaps we could consider turning ShapedType into an interface (I think there was a previous discussion about this, but I can’t find it) and making them operate on that. This would likely require a larger conversation and an RFC, though. It has other benefits as well, like letting ops on types from other dialects hook into the verification provided for builtin types. Right now, adding verification for these is a pain and a significant disincentive, leading to either under-verifying or overloading standard types (at some point in IREE’s past we overloaded memref because defining our own type would have come with so much overhead to retain feature parity).
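
As a rough illustration of the interface idea, here is a Python sketch of the dependency structure. It is not MLIR's C++ API. The names `ShapedTypeInterface`, `verify_elementwise` and `MMAMatrixType` are hypothetical, and the fields on `MMAMatrixType` only loosely model the GPU type. The point is that the "std" side is written against the interface alone, while the "gpu" side opts in by implementing it.

```python
from dataclasses import dataclass
from typing import Protocol, Tuple, runtime_checkable

@runtime_checkable
class ShapedTypeInterface(Protocol):
    """Hypothetical interface: all an element-wise op needs from a type."""
    def get_shape(self) -> Tuple[int, ...]: ...
    def get_element_type(self) -> str: ...

# "std" side: depends only on the interface, not on any dialect's types.
def verify_elementwise(operand_types, result_type):
    """Accept only when every type implements the interface and all agree."""
    types = list(operand_types) + [result_type]
    if not all(isinstance(t, ShapedTypeInterface) for t in types):
        return False
    first = types[0]
    return all(
        type(t) is type(first)
        and t.get_shape() == first.get_shape()
        and t.get_element_type() == first.get_element_type()
        for t in types
    )

# "gpu" side: defines its own type and implements the interface.
@dataclass(frozen=True)
class MMAMatrixType:
    shape: Tuple[int, int]
    element_type: str
    operand: str  # illustrative field, e.g. which matmul operand this is

    def get_shape(self) -> Tuple[int, ...]:
        return self.shape

    def get_element_type(self) -> str:
        return self.element_type

m = MMAMatrixType((16, 16), "f16", "COp")
assert verify_elementwise([m, m], m)
```

In this sketch the shared verifier rejects mismatched shapes and types that don't implement the interface, which is the kind of reuse of builtin-type verification described above.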