EuroLLVM 2023 roundtable - targeting CPUs from ML frameworks

Hi everyone,

I would like to submit a Roundtable proposal for EuroLLVM '23. This is just a small heads-up to see whether there would be any interest and whether there’s something specific that you’d like to see discussed.

Title: Targeting CPUs from ML frameworks/compilers

Some terminology to avoid confusion (these terms can mean different things depending on context):

  • frontend - consumes high level representation and generates LLVM IR,
  • backend - consumes LLVM IR and generates machine code for the specified CPU.

There are quite a few of us using MLIR and MLIR-based compilers (e.g. IREE/OpenXLA) that target various CPU backends in LLVM. There are other frontends outside of LLVM as well. In fact, Elen Kalda will be giving a presentation based on her experience with TVM. Is there anything that we can do to integrate frontend and backend technologies better?

I am particularly interested in vectorisation and various CPU extensions, e.g. Arm’s Scalable Matrix Extension [1] as well as scalable vectors in general. I work for Arm, hence referring to SME, but this roundtable is meant to cover all CPUs that folks care about. Can you think of any other topics that you’d like to discuss?

Ideally, we would gather together backend and frontend developers and discuss what the ideal interface should be or what might be the barriers to enabling new CPU extensions. How does it sound?

Hope to see you in Glasgow!


[1] The Scalable Matrix Extension (SME)


Hi everyone,

Firstly, I wanted to say thank you to everyone who participated in our roundtable - I learnt a great deal from you!

Given the interest, I feel that this kind of discussion between frontend and backend developers is very valuable. We should consider organising this more often. For those of you who couldn’t join, below is a very brief summary. All in all, the discussion was split into two larger parts:

Scalable vectorisation

This was somewhat complementary to We focused on the differences between SVE and RVV to better inform the design. @zhanghb97 kindly explained the RVV design and different approaches to dealing with loop remainders when vectorising:

  • via masking (SVE + RVV),
  • with scalar loop remainders (SVE + RVV),
  • via “effective vector length” (RVV).

We’ll definitely need to pay closer attention to these different strategies when implementing scalable vectorisation in MLIR.
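
To make the three remainder strategies above a bit more concrete, here is a minimal sketch in plain Python. It only models the control flow of an element-wise add; `vl` stands in for the hardware vector length, and all function names are hypothetical. In particular, `min(vlmax, n - i)` is a simplified stand-in for what RVV's `vsetvli` would return, not an exact model of it.

```python
def add_masked(a, b, vl):
    """Masking: every iteration is a full-width step, inactive lanes are masked off."""
    n = len(a)
    out = [0] * n
    for i in range(0, n, vl):
        # Per-lane predicate, true only for lanes that are still in bounds.
        mask = [i + lane < n for lane in range(vl)]
        for lane in range(vl):
            if mask[lane]:
                out[i + lane] = a[i + lane] + b[i + lane]
    return out


def add_scalar_tail(a, b, vl):
    """Scalar remainder: unmasked vector body, then a scalar loop for the leftovers."""
    n = len(a)
    out = [0] * n
    main = n - n % vl  # largest multiple of vl that fits in n
    for i in range(0, main, vl):
        for lane in range(vl):
            out[i + lane] = a[i + lane] + b[i + lane]
    for i in range(main, n):  # scalar epilogue
        out[i] = a[i] + b[i]
    return out


def add_evl(a, b, vlmax):
    """Effective vector length: each iteration asks how many elements to process."""
    n = len(a)
    out = [0] * n
    i = 0
    while i < n:
        evl = min(vlmax, n - i)  # simplified stand-in for vsetvli (assumption)
        for lane in range(evl):
            out[i + lane] = a[i + lane] + b[i + lane]
        i += evl
    return out
```

All three produce the same result for any `vl`; the difference is in where the remainder handling lives (a predicate, a separate loop, or the per-iteration length), which is what a scalable vectoriser in MLIR would need to choose between.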

Scalable Matrix Extension (SME)

In this part we focused on Arm’s SME and how to best support it in MLIR. We spent a good amount of time discussing various strategies for lowering from linalg.matmul to SME’s outer-product instructions (e.g. smopa). Not easy!
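
For anyone who wasn't in the room, the reason outer products are a natural fit for matmul is that C = A * B can be written as a sum of rank-1 updates, one per step along the reduction dimension. Below is a minimal plain-Python sketch of just that arithmetic. The function name is hypothetical, and this says nothing about the actual MLIR lowering or about real SME tile sizes.

```python
def matmul_outer_products(A, B):
    """Compute A (m x k) times B (k x n) as a sum of k outer products."""
    m, k_dim, n = len(A), len(B), len(B[0])
    # The accumulator plays the role of an accumulator tile in this sketch.
    acc = [[0] * n for _ in range(m)]
    for k in range(k_dim):
        col = [A[i][k] for i in range(m)]  # k-th column of A
        row = B[k]                         # k-th row of B
        # Rank-1 update: acc += outer(col, row)
        for i in range(m):
            for j in range(n):
                acc[i][j] += col[i] * row[j]
    return acc
```

For example, `matmul_outer_products([[1, 2], [3, 4]], [[5, 6], [7, 8]])` accumulates two outer products and returns `[[19, 22], [43, 50]]`.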

I claimed that it wasn’t yet possible to lower from linalg.matmul to vector.outerproduct, but fortunately I was wrong (thank you for challenging me and kudos to @qcolombet and @matthias-springer for pointing out relevant examples in MLIR and IREE). Apologies for the confusion! I’ve posted a patch so that there’s an in-tree example for this:

We also discussed what the backend would/should do when an invalid tile number is specified (this is very specific to SME, so please ask if you’d like more context). I tried this example:

define void @za_write_vg2_horiz_h(i32 %slice, <vscale x 8 x i16> %zn1, <vscale x 8 x i16> %zn2) {
  ; Tile number 2 is out of range for i16 (only tiles 0 and 1 exist).
  call void @llvm.aarch64.sme.write.hor.vg2.nxv8i16(i32 2, i32 %slice, <vscale x 8 x i16> %zn1, <vscale x 8 x i16> %zn2)
  ret void
}

declare void @llvm.aarch64.sme.write.hor.vg2.nxv8i16(i32, i32, <vscale x 8 x i16>, <vscale x 8 x i16>)

For i16 there are only 2 SME tiles, so only 0 and 1 are valid tile numbers. This is what happens with ToT LLVM:

llc -mtriple=aarch64-linux-gnu -mattr=+sme2 -verify-machineinstrs file.ll
LLVM ERROR: Cannot select: intrinsic %llvm.aarch64.sme.write.hor.vg2

So as we expected, it’s entirely up to the frontend to generate correct code.
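
Since the check has to live in the frontend, here is a rough sketch of what such a guard could look like. It relies on the ZA tile layout in the SME architecture, where the number of tiles equals the element size in bytes (1 for 8-bit, 2 for 16-bit, 4 for 32-bit, 8 for 64-bit, 16 for 128-bit). The helper names are hypothetical and not taken from LLVM or MLIR.

```python
def num_za_tiles(elem_bits):
    """Number of ZA tiles available for a given element width in bits."""
    if elem_bits not in (8, 16, 32, 64, 128):
        raise ValueError(f"unsupported element width: {elem_bits}")
    return elem_bits // 8


def is_valid_tile(elem_bits, tile):
    """True if `tile` is a legal tile number for elements of `elem_bits` bits."""
    return 0 <= tile < num_za_tiles(elem_bits)
```

With this, `is_valid_tile(16, 2)` is `False`, which matches the i16 example above: a frontend could reject that tile number before it ever reaches instruction selection.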

Next steps

I have sent updates to both RFCs discussed here:

We should definitely continue this discussion in the future. My main priority atm is good support for SME in MLIR and perhaps it would be good to organise some community sync to brainstorm about this a bit more.

As always, please let me know if I missed or misinterpreted anything.

Thank you,