EuroLLVM 2023 roundtable - targeting CPUs from ML frameworks

banach-space · May 19, 2023, 4:26pm

Hi everyone,

Firstly, I wanted to say thank you to everyone who participated in our roundtable - I learnt a great deal from you!

Given the interest, I feel that discussions like this between frontend and backend developers are very valuable, and we should consider organising them more often. For those of you who couldn’t join, below is a very brief summary. All in all, the discussion was split into two larger parts:

Scalable vectorisation

This was somewhat complementary to https://discourse.llvm.org/t/rfc-scalable-vectorisation-in-linalg/. We focused on the differences between SVE and RVV to better inform the design. @zhanghb97 kindly explained the RVV design and different approaches to dealing with loop remainders when vectorising:

via masking (SVE + RVV),
with scalar loop remainders (SVE + RVV),
via “effective vector length” (RVV).
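To make the three strategies a bit more concrete, here is a minimal Python model of a trivial `x + 1` loop. It is only a sketch: the function names are made up for illustration, each "vector instruction" is modelled as an inner loop over lanes, `vl`/`vlmax` stand in for the runtime vector length, and the RVV variant assumes `vsetvli` simply returns min(remaining, VLMAX), which is a simplification of what the RVV spec allows.

```python
from typing import List


def add_one_masked(xs: List[int], vl: int) -> List[int]:
    """Masking (SVE + RVV): every iteration issues a full-width vector op,
    and a per-lane predicate disables lanes past the end of the data."""
    out = list(xs)
    n = len(xs)
    for base in range(0, n, vl):
        mask = [base + lane < n for lane in range(vl)]  # cf. SVE `whilelt`
        for lane in range(vl):
            if mask[lane]:
                out[base + lane] += 1
    return out


def add_one_scalar_remainder(xs: List[int], vl: int) -> List[int]:
    """Scalar remainder (SVE + RVV): the vector body only handles whole
    vectors; the leftover n % vl elements run in a scalar epilogue."""
    out = list(xs)
    n = len(xs)
    vec_end = n - n % vl
    for base in range(0, vec_end, vl):
        for lane in range(vl):  # one unpredicated vector op
            out[base + lane] += 1
    for i in range(vec_end, n):  # scalar epilogue loop
        out[i] += 1
    return out


def add_one_evl(xs: List[int], vlmax: int) -> List[int]:
    """Effective vector length (RVV): each iteration requests
    min(remaining, vlmax) elements (simplified model of `vsetvli`),
    so the final iteration just runs with a shorter vector."""
    out = list(xs)
    n = len(xs)
    i = 0
    while i < n:
        evl = min(n - i, vlmax)
        for lane in range(evl):
            out[i + lane] += 1
        i += evl
    return out
```

All three compute the same result and differ only in how the tail is handled: masking keeps a single loop but leaves lanes idle in the last iteration, the scalar remainder keeps the vector body unpredicated at the cost of a second loop, and EVL lets the hardware shrink the final iteration.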

We’ll definitely need to pay closer attention to these different strategies when implementing scalable vectorisation in MLIR.

Scalable Matrix Extension (SME)

In this part we focused on Arm’s SME and how to best support it in MLIR. We spent a good amount of time discussing various strategies for lowering from linalg.matmul to SME’s outer-product instructions (e.g. smopa). Not easy!
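For readers less familiar with the outer-product formulation: a matmul C = A·B can be rewritten as a sum over k of rank-1 updates, C += A[:, k] ⊗ B[k, :]. That is the form `vector.outerproduct` (with an accumulator) expresses and, roughly, what SME's outer-product instructions compute on a ZA tile. Below is a plain Python sketch of that loop order; the function name is hypothetical, and it glosses over tiling, masking, and the widening "sum of outer products" behaviour of instructions like smopa.

```python
from typing import List

Matrix = List[List[float]]


def matmul_outer_products(a: Matrix, b: Matrix) -> Matrix:
    """Compute C = A @ B as a sum of rank-1 (outer-product) updates."""
    m, depth = len(a), len(a[0])
    n = len(b[0])
    if len(b) != depth:
        raise ValueError("inner dimensions must agree")
    c = [[0.0] * n for _ in range(m)]  # accumulator (think: one ZA tile)
    for k in range(depth):
        col = [a[i][k] for i in range(m)]  # A[:, k]
        row = b[k]                         # B[k, :]
        # One outer product, accumulated into the whole of C.
        for i in range(m):
            for j in range(n):
                c[i][j] += col[i] * row[j]
    return c
```

For example, `matmul_outer_products([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19.0, 22.0], [43.0, 50.0]]`, the same as an ordinary row-times-column matmul; only the loop order (k outermost) differs.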

I claimed that it wasn’t yet possible to lower from linalg.matmul to vector.outerproduct, but fortunately I was wrong (thank you for challenging me and kudos to @qcolombet and @matthias-springer for pointing out relevant examples in MLIR and IREE). Apologies for the confusion! I’ve posted a patch so that there’s an in-tree example for this:

D150457 [mlir][linalg] Add a test for linalg.matmul --> vector.outerproduct.

We also discussed what the backend would or should do when an invalid tile number is specified (this is very specific to SME, so please ask if you’d like more context). I tried this example:

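define void @za_write_vg2_horiz_h(i32 %slice, <vscale x 8 x i16> %zn1, <vscale x 8 x i16> %zn2) {
  ; %slice.6 must be defined before use, otherwise llc fails to parse the file.
  %slice.6 = add i32 %slice, 6
  ; Tile 2 is out of range for 16-bit elements (only ZA0.H and ZA1.H exist).
  call void @llvm.aarch64.sme.write.hor.vg2.nxv8i16(i32 2, i32 %slice.6, <vscale x 8 x i16> %zn1, <vscale x 8 x i16> %zn2)
  ret void
}
declare void @llvm.aarch64.sme.write.hor.vg2.nxv8i16(i32, i32, <vscale x 8 x i16>, <vscale x 8 x i16>)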

For i16 elements ZA is split into only two tiles (ZA0.H and ZA1.H), so only 0 and 1 are valid tile numbers. This is what happens with top-of-tree (ToT) LLVM:

llc -mtriple=aarch64-linux-gnu -mattr=+sme2 -verify-machineinstrs file.ll
LLVM ERROR: Cannot select: intrinsic %llvm.aarch64.sme.write.hor.vg2

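So, as we expected, the backend doesn't diagnose the out-of-range tile number gracefully; instruction selection simply fails. It’s entirely up to the frontend to generate correct code.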
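Since the backend doesn't catch this, a frontend needs its own check before emitting the intrinsic. Here is a minimal sketch, assuming the SME layout in which ZA is split into one tile per byte of element size (one .B tile, two .H, four .S, eight .D, sixteen .Q). The helper names are hypothetical and not part of LLVM or MLIR.

```python
# Hypothetical frontend-side validation; not an existing LLVM or MLIR API.
ZA_ELEMENT_BITS = (8, 16, 32, 64, 128)


def num_za_tiles(element_bits: int) -> int:
    """One ZA tile per byte of element size: 1 x .B, 2 x .H, 4 x .S,
    8 x .D, 16 x .Q."""
    if element_bits not in ZA_ELEMENT_BITS:
        raise ValueError(f"unsupported element size: {element_bits} bits")
    return element_bits // 8


def check_tile(element_bits: int, tile: int) -> None:
    """Raise before emitting an SME intrinsic with an out-of-range tile."""
    count = num_za_tiles(element_bits)
    if not 0 <= tile < count:
        raise ValueError(
            f"invalid ZA tile {tile} for {element_bits}-bit elements "
            f"(valid: 0..{count - 1})"
        )
```

With this, `check_tile(16, 1)` passes, while `check_tile(16, 2)` (the tile used in the IR example) raises a `ValueError` with a readable message instead of reaching `llc`.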

Next steps

I have sent updates to both RFCs discussed here:

We should definitely continue this discussion in the future. My main priority atm is good support for SME in MLIR, and perhaps it would be good to organise a community sync to brainstorm this a bit more.

As always, please let me know if I missed or misinterpreted anything.

Thank you,
-Andrzej
