In our downstream compiler, we previously implemented custom C intrinsics for the ARM Scalable Matrix Extension (SME). Since ARM posted the ACLE draft for SME (https://github.com/ARM-software/acle/pull/188/files), we have started refactoring our implementation to follow the ACLE specification instead. The first patch is now up on Phabricator: D127910, "[Clang][AArch64] Add SME C intrinsics for load and store". I would appreciate it if you could review it and give us some feedback. If this is the right direction, we would love to contribute more to the effort.
Thanks,
Sagar