Injecting assembly into MLIR

Hi all,
To get to the bottom of GEMM performance quickly, I would like to inject some assembly directly into MLIR. I imagine some opaque operation that calls into a function and passes it the pointers to the memory regions. Is there such an option in MLIR?
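To make the idea concrete, here is a minimal sketch of the kind of callee I have in mind. Everything in it is an assumption for illustration: the name `gemm_microkernel_4x4`, the packed layouts of `a` and `b`, and the 4x4 tile size are hypothetical, not an existing MLIR or library API. On AArch64 with GCC or Clang the inner update is a single `fmla` issued through inline assembly; elsewhere it falls back to plain C so the sketch still builds.

```c
#include <stddef.h>

#if defined(__aarch64__) && (defined(__GNUC__) || defined(__clang__))
#include <arm_neon.h>
#define HAVE_NEON_ASM 1
#else
#define HAVE_NEON_ASM 0
#endif

/* Hypothetical micro-kernel: C[4x4] += A[4xk] * B[kx4].
 *   a: packed panel, a[p * 4 + i] holds A(i, p)
 *   b: packed panel, b[p * 4 + j] holds B(p, j)
 *   c: row-major 4x4 tile with leading dimension ldc
 * The caller only passes pointers and sizes, so from the compiler's
 * point of view this is an opaque call. */
void gemm_microkernel_4x4(const float *a, const float *b, float *c,
                          size_t k, size_t ldc) {
#if HAVE_NEON_ASM
  float32x4_t acc[4];
  /* Load the four rows of the C tile into vector accumulators. */
  for (int i = 0; i < 4; ++i) acc[i] = vld1q_f32(c + i * ldc);
  for (size_t p = 0; p < k; ++p) {
    float32x4_t brow = vld1q_f32(b + p * 4); /* B(p, 0..3) */
    for (int i = 0; i < 4; ++i) {
      float32x4_t aval = vdupq_n_f32(a[p * 4 + i]); /* broadcast A(i, p) */
      /* acc[i] += aval * brow, lane by lane, as one fused multiply-add. */
      __asm__("fmla %0.4s, %1.4s, %2.4s"
              : "+w"(acc[i])
              : "w"(aval), "w"(brow));
    }
  }
  /* Write the accumulators back to the C tile. */
  for (int i = 0; i < 4; ++i) vst1q_f32(c + i * ldc, acc[i]);
#else
  /* Portable fallback with the same semantics. */
  for (size_t p = 0; p < k; ++p)
    for (int i = 0; i < 4; ++i)
      for (int j = 0; j < 4; ++j)
        c[i * ldc + j] += a[p * 4 + i] * b[p * 4 + j];
#endif
}
```

What I am after is a way to reach this kind of hand-written body from MLIR-generated code, either as an external call or inlined.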

Thanks,
Giuseppe

cc: @lorenzo_chelini @nicolasvasilache @stevenvar

I added an InlineAsmOp a while back.
It is used e.g. here: https://sourcegraph.com/github.com/google/iree/-/blob/iree/compiler/Codegen/LLVMCPU/VectorContractToAArch64InlineAsmOp.cpp?L106

But I think it is definitely a footgun and I do not have expertise using it myself.
Examples would be most welcome though :slight_smile:

Hi @nicolasvasilache ,
Very cool, thanks!

I totally agree about it being a footgun, but I am looking for a quick way to prototype the inner kernel for GEMM (we can abstract it into high-level transformations at a later stage).

Thank you once more,
Giuseppe

Quick heads-up in case this is relevant: I’ve been looking a little deeper into vector.shape_cast and vector.transpose.
There are some inefficiencies that I am looking into ironing out.

Hi Nicolas,
Thanks for the heads-up! I would also be curious to understand why it runs slower without the transposition :slight_smile:

Anyway I am mostly focused on the inner kernel for now (and probably for the next month).

Early experiments today showed that I could get to 80% of peak (about the same performance as ACL without prefetching) if I can improve the inner kernel.

I will write a more detailed post about this next week.

Have a nice weekend,
Giuseppe