Hi everyone,
Firstly, I wanted to say thank you to everyone who participated in our roundtable - I learnt a great deal from you!
Given the interest, I feel that this kind of discussion between frontend and backend developers is very valuable. We should consider organising these more often.
For those of you who couldn’t join, below is a very brief summary. All in all, the discussion was split into two larger parts:
Scalable vectorisation
This was somewhat complementary to https://discourse.llvm.org/t/rfc-scalable-vectorisation-in-linalg/. We focused on the differences between SVE and RVV to better inform the design. @zhanghb97 kindly explained the RVV design and different approaches to dealing with loop remainders when vectorising:
- via masking (SVE + RVV),
- with scalar loop remainders (SVE + RVV),
- via “effective vector length” (RVV).
We’ll definitely need to pay closer attention to these different strategies when implementing scalable vectorisation in MLIR.
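To make the three remainder strategies above a bit more concrete, here is a toy scalar model in Python. It is only a sketch: `vl` stands in for the hardware vector length in elements, the function names are made up for illustration, and nothing here is real SVE or RVV code. The masked variant loosely mirrors predicate generation in the style of SVE's `whilelt`, and the effective-vector-length variant loosely mirrors requesting a length in the style of RVV's `vsetvli`.

```python
# Toy model of loop-remainder handling for c[i] = a[i] + b[i].
# All names are hypothetical; `vl` is the assumed vector length in elements.

def add_masked(a, b, vl):
    """Every iteration is full width; a mask switches off out-of-range lanes."""
    n = len(a)
    c = [0] * n
    for base in range(0, n, vl):
        # Lane is active only while its index is still inside the array.
        mask = [base + lane < n for lane in range(vl)]
        for lane in range(vl):
            if mask[lane]:
                c[base + lane] = a[base + lane] + b[base + lane]
    return c


def add_scalar_remainder(a, b, vl):
    """Full-width vector body, then a scalar loop for the leftover elements."""
    n = len(a)
    c = [0] * n
    main_end = n - n % vl  # largest multiple of vl that fits in n
    for base in range(0, main_end, vl):
        for lane in range(vl):
            c[base + lane] = a[base + lane] + b[base + lane]
    for i in range(main_end, n):  # scalar remainder loop
        c[i] = a[i] + b[i]
    return c


def add_effective_vl(a, b, vl):
    """Each iteration asks for min(vl, remaining) lanes; no mask, no epilogue."""
    n = len(a)
    c = [0] * n
    base = 0
    while base < n:
        evl = min(vl, n - base)  # effective vector length for this iteration
        for lane in range(evl):
            c[base + lane] = a[base + lane] + b[base + lane]
        base += evl
    return c
```

All three produce the same result; they differ only in how the final, partial iteration is handled, which is exactly the part that needs care in the MLIR design.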
Scalable Matrix Extension (SME)
In this part we focused on Arm’s SME and how to best support it in MLIR. We spent a good amount of time discussing various strategies for lowering from linalg.matmul to SME’s outer-product instructions (e.g. smopa). Not easy!
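For readers less familiar with why outer products are the natural target here: a matmul can be rewritten as a sum of rank-1 outer products, one per step along the reduction dimension. The sketch below shows that identity in plain Python. It is purely illustrative (the helper names are hypothetical and it says nothing about how MLIR or SME actually tile the computation), but the accumulate-an-outer-product loop is the shape that an outer-product-based lowering is aiming for.

```python
# Matmul expressed as an accumulation of outer products.
# Hypothetical helper names; plain lists stand in for tiles/vectors.

def outer(col, row):
    """Outer product of a column vector and a row vector."""
    return [[x * y for y in row] for x in col]


def matmul_via_outer_products(a, b):
    """Compute a @ b by summing outer(a[:, k], b[k, :]) over k."""
    m, k_dim = len(a), len(a[0])
    n = len(b[0])
    acc = [[0] * n for _ in range(m)]  # plays the role of the accumulator tile
    for k in range(k_dim):
        col = [a[i][k] for i in range(m)]  # k-th column of a
        row = b[k]                         # k-th row of b
        rank1 = outer(col, row)
        for i in range(m):
            for j in range(n):
                acc[i][j] += rank1[i][j]
    return acc
```

For example, `matmul_via_outer_products([[1, 2], [3, 4]], [[5, 6], [7, 8]])` gives the same result as the usual row-by-column definition.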
I claimed that it wasn’t yet possible to lower from linalg.matmul to vector.outerproduct, but fortunately I was wrong (thank you for challenging me and kudos to @qcolombet and @matthias-springer for pointing out relevant examples in MLIR and IREE). Apologies for the confusion! I’ve posted a patch so that there’s an in-tree example for this:
We also discussed what the backend would/should do when an invalid tile number is specified (this is very specific to SME, so please ask if you’d like more context). I tried this example:
define void @za_write_vg2_horiz_h(i32 %slice, <vscale x 8 x i16> %zn1, <vscale x 8 x i16> %zn2) {
call void @llvm.aarch64.sme.write.hor.vg2.nxv8i16(i32 2, i32 %slice, <vscale x 8 x i16> %zn1, <vscale x 8 x i16> %zn2)
ret void
}
declare void @llvm.aarch64.sme.write.hor.vg2.nxv8i16(i32, i32, <vscale x 8 x i16>, <vscale x 8 x i16>)
For i16 there are only 2 SME tiles, so only 0 and 1 are valid tile numbers. This is what happens with ToT LLVM:
llc -mtriple=aarch64-linux-gnu -mattr=+sme2 -verify-machineinstrs file.ll
LLVM ERROR: Cannot select: intrinsic %llvm.aarch64.sme.write.hor.vg2
So as we expected, it’s entirely up to the frontend to generate correct code.
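Since the frontend has to get this right, a check along the following lines could catch bad tile numbers before they reach the backend. This is a minimal sketch in Python with hypothetical helper names, not an existing MLIR or LLVM API. It relies on the architectural rule that the number of ZA tiles equals the element size in bytes (1 tile for 8-bit, 2 for 16-bit, 4 for 32-bit, 8 for 64-bit, 16 for 128-bit elements).

```python
# Frontend-side sanity check for SME tile numbers.
# Hypothetical helpers; the tile counts follow the ZA tile layout
# (ZA0.B, ZA0-1.H, ZA0-3.S, ZA0-7.D, ZA0-15.Q).

def num_za_tiles(element_bits):
    """Number of ZA tiles available for the given element width in bits."""
    if element_bits not in (8, 16, 32, 64, 128):
        raise ValueError(f"unsupported element width: {element_bits}")
    return element_bits // 8


def is_valid_tile(tile, element_bits):
    """True if `tile` is a legal tile number for this element width."""
    return 0 <= tile < num_za_tiles(element_bits)
```

With this, the example above would be rejected early: `is_valid_tile(2, 16)` is false because only tiles 0 and 1 exist for i16.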
Next steps
I have sent updates to both RFCs discussed here:
- https://discourse.llvm.org/t/rfc-scalable-vectorisation-in-linalg/
- https://discourse.llvm.org/t/rfc-creating-a-armsme-dialect/
We should definitely continue this discussion in the future. My main priority atm is good support for SME in MLIR and perhaps it would be good to organise some community sync to brainstorm about this a bit more.
As always, please let me know if I missed or misinterpreted anything.
Thank you,
-Andrzej