[RFC] Creating an ArmSME Dialect

If the SME dialect introduces an sme.funcOp to model SME functions and their attributes, you could also add sme.callOps to model transitions between the different SME and non-SME modes.

Now that I’ve learned a bit more about the calling convention for SME, I think decorating a func.func with attributes { passthrough = ["aarch64_pstate_sm_enabled"] } should be good enough?
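For reference, a minimal sketch of what that might look like (purely illustrative; the kernel body is a placeholder, and this assumes the passthrough attribute is propagated when lowering to the LLVM dialect):

```mlir
// Placeholder streaming kernel. The interesting part is the attribute, which
// (assuming it survives lowering to llvm.func) becomes the LLVM IR function
// attribute "aarch64_pstate_sm_enabled".
func.func @streaming_kernel(%a: vector<[4]xf32>, %b: vector<[4]xf32>) -> vector<[4]xf32>
    attributes { passthrough = ["aarch64_pstate_sm_enabled"] } {
  %0 = arith.addf %a, %b : vector<[4]xf32>
  return %0 : vector<[4]xf32>
}
```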

But then your customers must know the exact name of the attribute. sme.funcOp might be easier.

That makes sense.

Hi @frank_gao , thanks for the updates!

I am travelling and won’t be able to post until I am back (next week, either during or after EuroLLVM), but …

… just wanted to let you know that we are drafting an RFC specifically about that aspect of SME support. It’s almost ready - hope to publish it next week :crossed_fingers: (sorry about the delay). Once that’s available, your feedback will be greatly appreciated!

-Andrzej


I said before that I like sme.funcOp over writing the annotations myself.

Could you also consider, for the RFC, adding new SME call ops? I could have an SME call op for going from the normal state to streaming SVE, or for calling an sme.funcOp from within an sme.funcOp. I believe there was a discussion about these state transitions.

I think that, in theory, an sme.call would not be necessary as long as functions themselves are declared as sme.func, and the convention would probably follow this article very closely:
https://llvm.org/docs/AArch64SME.html#compiler-inserted-streaming-mode-changes

The only complication I see is how to determine whether another function can be treated as streaming-mode compatible. This is where an sme.call may help: within an sme.func, an sme.call would call a function that is deemed compatible, whereas a func.call would call a non-compatible function and would require a mode change.
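A rough sketch of that distinction, using entirely hypothetical op names and syntax (none of this exists today):

```mlir
// Hypothetical ops, for illustration only.
sme.func @streaming_caller(%x: f32) {
  // The callee is also an sme.func, i.e. known to be streaming-compatible:
  // no mode change needs to be inserted around this call.
  sme.call @streaming_callee(%x) : (f32) -> ()

  // The callee is a plain func.func, i.e. not known to be streaming-compatible:
  // lowering would wrap this call in a mode change (smstop/smstart).
  func.call @normal_callee(%x) : (f32) -> ()

  sme.return
}
```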

More info on the related work on IREE side right now (from Intel / @rengolin): https://groups.google.com/g/iree-discuss/c/SNkFQvtr0Uw/m/PkG4Ga1aBAAJ?utm_medium=email&utm_source=footer
(FYI)

If this is enough of a concern, we can also spill the entirety of the ZA tile upon entry to an sme.func from another streaming function, and reload it before returning. Since this is not necessary all the time, we may need to encapsulate it into something like sme.save and sme.restore.

If you add sme.save and sme.restore ops, you could even have a pass that optimizes which save and restore operations are actually necessary.

It may give better results in the end than assuming that all sme.funcOps have the LLVM attributes.
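For illustration, a sketch of what that could look like around a call (again, hypothetical op and type names only):

```mlir
// Hypothetical ops: spill the live ZA tile before a call that may clobber it,
// and reload it afterwards. A later pass could prove that the callee leaves
// ZA intact and fold the save/restore pair away.
%za = sme.save : !sme.za_state
func.call @maybe_clobbers_za() : () -> ()
sme.restore %za : !sme.za_state
```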

The RFC: [RFC] Supporting Armv9 Scalable Matrix Extension (SME) Streaming SVE (SSVE) mode in MLIR / IREE · Issue #13556 · openxla/iree · GitHub


Hi Frank,

Sorry about the delay in responding.

This discussion has already helped us a lot to better understand various SME-related challenges, thank you!

I am by no means against an “Arm SME” dialect, but my overall feeling is that this RFC/proposal is trying to address too many questions at once. It’s already quite long, and we keep adding new ideas/challenges :slight_smile: I suggest that we split this discussion into two:

  • targeting SME from MLIR (e.g. how to get from Linalg to SME),
  • “Arm SME” dialect design/proposal (this should be an implementation detail informed by the above).

The former will require much more than just an “Arm SME” dialect. For example, the following discuss some key aspects of the SME ISA, but don’t require a dialect:

Both these proposals get us closer to better understanding how and what code would be generated for SME (and also for SSVE, which is equally important for us).

For the tile allocation, I am really hoping that we can leverage the existing abstractions in MLIR, e.g. Linalg and Vector dialects, tiling and vectorisation logic, etc. IMHO, we should explore this more.

In general, I feel that we are making good progress, but it would help if we switched to more focused discussions. If we wish to continue brainstorming about the overall design, then could we either create a new thread or rename this one (as e.g. “Supporting SME in MLIR”)?

-Andrzej

Btw, we discussed SME extensively last week during EuroLLVM and we agreed that this would be fine for our first prototype, but we should be open/prepared to extend this in the future:

I’ll try to post more notes in the roundtable thread: https://discourse.llvm.org/t/eurollvm-2023-roundtable-targeting-cpus-from-ml-frameworks/ (sorry, haven’t had the time yet).


Thanks for the confirmation! Here’s the first implementation of the dialect itself:
https://reviews.llvm.org/D152080

I’m still working on the transformation pass that turns this into LLVM intrinsics…

Definitely!

There is no agreement to introduce one.

We are working on an implementation that would leverage the Vector dialect instead. We should be able to share an example soon.

Should we have a quick discussion about this in one of the upcoming MLIR ODM meetings? I have the impression that we are all working in the same direction but we are not on the same page. So far we have focused the discussion on introducing or not introducing a dialect but we haven’t elaborated on why it’s needed/what we plan to do with it. I think a high-bandwidth discussion on this matter should help us decide if we should introduce a dialect right now or wait, build more expertise and introduce it later.

If this makes sense to you, please let me know what you think and your availability for the upcoming MLIR ODM meetings, esp. @banach-space and @frank_gao.

Thanks!


Thanks for suggesting this - makes a lot of sense! I am available on June 15th and 29th. I could compile a few introductory slides on SME and the patches upstreamed so far, if that helps.

-Andrzej

Great idea, thanks for the suggestion.

I am also available on these dates.

Great! I added an entry to the public agenda: [Public] MLIR Open Meeting Agenda - Google Docs

We should check with @mehdi_amini now, I guess.

Here’s the patch that I was referring to (kudos to @c-rhodes for working on it):

Our ultimate goal with that is to lower a linalg.fill of 0 into %arg0 to SME’s zero via the Vector dialect:

  • linalg.fill → vector.transfer_write → SME LLVM IR intrinsics

The first step requires [RFC] Scalable Vectorisation in Linalg, so that patch only implements the lowering from Vector.
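To give a rough idea of the Vector-level input we lower from (a simplified sketch, not the actual test case from the patch):

```mlir
// Zero-fill a dynamically-sized buffer via a 2-D scalable vector. Writing an
// all-zero vector<[4]x[4]xf32> is the pattern that can be mapped onto SME’s
// ZA-zeroing intrinsic.
func.func @fill_zero(%out: memref<?x?xf32>) {
  %c0 = arith.constant 0 : index
  %zero = arith.constant dense<0.0> : vector<[4]x[4]xf32>
  vector.transfer_write %zero, %out[%c0, %c0] : vector<[4]x[4]xf32>, memref<?x?xf32>
  return
}
```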

So far we haven’t had to introduce any dedicated SME ops, and that’s the approach that I’d prefer us to take. This way we can leverage the Linalg and Vector dialects and all the great work that has gone into them over the years.

Let’s use the upcoming MLIR ODM meeting on SME to decide whether we should lean more towards re-using the Vector dialect or towards introducing the ArmSME dialect. We won’t be merging this patch in the meantime (unless there’s clear support for this approach).

@frank_gao, would June 22nd also work? I managed to clear my calendar (and June 15th is not available).

-Andrzej

That should work too.