A new type, GPU_MMAMatrix, was recently added to the GPU dialect by @navdeepkk to represent TensorCore-style operands.
It is a 2D type with an opaque layout; it can be loaded from and stored to memrefs using dedicated GPU ops, and it feeds a matmul-accumulate operation executed on target-specific matrix hardware.
To use it in any real-life scenario, we also need to be able to apply element-wise operations on this type.
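For reference, the type carries a shape, an element type, and a tag saying which matmul operand it represents. If I read the current implementation correctly, it prints like this:

```mlir
!gpu.mma_matrix<16x16xf16, "AOp">   // LHS operand of the matmul
!gpu.mma_matrix<16x16xf16, "BOp">   // RHS operand of the matmul
!gpu.mma_matrix<16x16xf16, "COp">   // accumulator / result
```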
Here is what the code would look like for a simple matmul op fused with an addf op:
a = load from memref -> opaque matrix
b = load from memref -> opaque matrix
c = matmul_acc a, b, c
c = addf c, d
store c, opaque matrix -> memref
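Concretely, with the `gpu.subgroup_mma_*` ops already in the dialect, that sequence would look roughly like the sketch below. The `addf` on the MMA type is the missing piece this post is asking about and does not verify today; the shapes, `leadDimension` values, and SSA names are just placeholder assumptions:

```mlir
// Load the matmul operands from memory into opaque MMA matrices.
%a = gpu.subgroup_mma_load_matrix %srcA[%c0, %c0] {leadDimension = 16 : index}
       : memref<16x16xf16> -> !gpu.mma_matrix<16x16xf16, "AOp">
%b = gpu.subgroup_mma_load_matrix %srcB[%c0, %c0] {leadDimension = 16 : index}
       : memref<16x16xf16> -> !gpu.mma_matrix<16x16xf16, "BOp">
%acc = gpu.subgroup_mma_load_matrix %srcC[%c0, %c0] {leadDimension = 16 : index}
       : memref<16x16xf16> -> !gpu.mma_matrix<16x16xf16, "COp">
// Matmul-accumulate on the dedicated matrix hardware.
%c = gpu.subgroup_mma_compute %a, %b, %acc
       : !gpu.mma_matrix<16x16xf16, "AOp">, !gpu.mma_matrix<16x16xf16, "BOp">
       -> !gpu.mma_matrix<16x16xf16, "COp">
// Hypothetical: element-wise addf directly on the opaque type (%d would be
// another "COp" matrix, e.g. loaded the same way). This is what is missing.
%e = addf %c, %d : !gpu.mma_matrix<16x16xf16, "COp">
// Store the fused result back to memory.
gpu.subgroup_mma_store_matrix %e, %dst[%c0, %c0] {leadDimension = 16 : index}
       : !gpu.mma_matrix<16x16xf16, "COp">, memref<16x16xf16>
```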
Since these are element-wise operations, the layout doesn't matter: they are well defined without knowing it.
Is there a way to make some of the Standard dialect's element-wise ops accept this new GPU type without making the Standard dialect depend on the GPU dialect?
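One direction I could imagine (purely a sketch; nothing below exists today and all the names are made up) is a dialect-neutral type interface that the element-wise type constraints would check for, and that the GPU dialect would implement on its MMA matrix type:

```tablegen
// Hypothetical TableGen sketch: a type interface declared next to the
// element-wise traits, so the Standard dialect never names any GPU type.
def ElementwiseMappableTypeInterface : TypeInterface<"ElementwiseMappableType"> {
  let description = [{
    A type implementing this interface declares that element-wise ops
    (addf, mulf, ...) are well defined on its values without the op
    needing to know the type's layout.
  }];
}
```

The GPU dialect would then attach this interface to GPU_MMAMatrix, and the element-wise ops' operand type predicate would accept any type implementing it, so the dependency edge points from GPU to the shared interface rather than from Standard to GPU.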
Note that the concept of TensorCore is not specific to GPUs, and we may also want to make this type more generic. I'm not sure if this is something that was considered when working on x86 AMX or other kinds of dedicated matrix hardware.