[RFC] Add GEN dialect for Intel GPUs

I assume you meant mlir-spirv-cpu-runner. (I don’t see any spirv execution test cases under test/mlir-cpu-runner/.) But I’m trying to understand who generates the SPIRV binary on this JIT execution path. mlir-spirv-cpu-runner goes via the MLIR GPU dialect AFAICS (populateGpu*ToSPIRVConversionPatterns - @etiotto - correct me if this isn’t accurate), and in the other post, it was clarified that the path would be via LLVM IR → SPIRV and not via MLIR’s SPIRV dialect → LLVM.

@etiotto My question was how exactly and where you generate the SPIRV binary? It’s clear that’s the idea, but will tools/mlir-spirv-cpu-runner generate it, or will there be a new pass (like the gpu-to-cubin/rocdl for NV/AMD GPUs) that would do that? Does the Intel SPIRV backend provide an API to generate the binary given LLVM IR, and won’t a new pass be needed to get from the GPU dialect to the LLVM IR with SPIRV calls (when not using the MLIR SPIRV dialect)?

No, the current path we have upstream uses LLVM IR for the host and a SPIR-V binary for the GPU.

See this test: gpu-addf32-to-spirv.mlir. The host side is lowered and translated to LLVM IR, while the device side gets serialized to a SPIR-V binary.
The key is that only the GPU module is SPIR-V.
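As a rough sketch, the shape of such a module looks like this (the kernel body, names, and launch configuration below are illustrative, not copied from the actual test):

```mlir
// Device side: a GPU module that is lowered through the SPIR-V dialect
// and serialized into a SPIR-V binary blob.
gpu.module @kernels {
  gpu.func @add_kernel(%a: memref<8xf32>, %b: memref<8xf32>) kernel {
    %tid = gpu.thread_id x
    %x = memref.load %a[%tid] : memref<8xf32>
    %y = memref.load %b[%tid] : memref<8xf32>
    %sum = arith.addf %x, %y : f32
    memref.store %sum, %b[%tid] : memref<8xf32>
    gpu.return
  }
}

// Host side: lowered to the LLVM dialect and translated to LLVM IR.
func.func @main(%a: memref<8xf32>, %b: memref<8xf32>) {
  %c1 = arith.constant 1 : index
  %c8 = arith.constant 8 : index
  gpu.launch_func @kernels::@add_kernel
      blocks in (%c1, %c1, %c1) threads in (%c8, %c1, %c1)
      args(%a : memref<8xf32>, %b : memref<8xf32>)
  return
}
```

Only the body of gpu.module goes through SPIR-V serialization; everything outside it follows the host path.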


I have the same observation as the one I made here; since SPIRV is the intended target, I would think that the answer to this question would include positioning with respect to SPIRV.
You are making a comparison to the NVVM dialect; however, that dialect is positioned as an alternative to SPIRV, and I don’t quite picture the role of this dialect right now.

I would be concerned if you’re introducing a path that bypasses the SPIRV dialect and goes to LLVM to target SPIRV instead: this would fragment the MLIR infrastructure more than it would serve it, IMO.


The test you link to is still similar to the NVGPU/AMDGPU pipeline, where the device binary is generated in an MLIR pass before the host-side code is translated/compiled via LLVM IR. But it’s going through the MLIR SPIRV dialect, unlike what is being proposed here (skipping the MLIR SPIRV dialect).
(Side note: the test specifies the SPIRV target options twice: once on the command line and again on the module attribute. I think one of them is redundant.)


You’re correct; my point is that testing would still be possible. For example, the SPIR-V target attribute could be modified to have an option that serializes the module through the LLVM-SPIR-V translator. In the end, for testing we only need a SPIR-V binary; how we get there is not as relevant from a testing point of view.

However, I do agree with the concerns of going through LLVM → SPIR-V instead of directly going through SPIR-V. I think better rationale for this decision should be provided.

At least for the various ID/dimension accessors and barriers, corresponding ops already exist in the GPU dialect, and they already have a portable SPIR-V lowering upstream without involving platform-specific intrinsics, which works on Intel GPUs as well.
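For example, a kernel can stick to the portable GPU dialect ops, which the upstream -convert-gpu-to-spirv pass maps to SPIR-V builtins without any vendor intrinsics (a sketch; the op names are from the upstream GPU dialect):

```mlir
gpu.module @kernels {
  gpu.func @portable_kernel() kernel {
    // Portable accessors, lowered to SPIR-V builtin variables such as
    // LocalInvocationId / WorkgroupId / WorkgroupSize.
    %tid = gpu.thread_id x
    %bid = gpu.block_id x
    %dim = gpu.block_dim x
    // Portable workgroup barrier, lowered to spirv.ControlBarrier.
    gpu.barrier
    gpu.return
  }
}
```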


Hi Mehdi, thank you for the comments. Let me try to clarify the problem the GEN dialect is attempting to address. There are open source projects (e.g. Triton) that lower their internal MLIR dialects to the LLVM dialect + NVVM/ROCDL (for Nvidia/AMD GPUs respectively). To provide support for Intel GPUs, it makes sense to follow a similar approach and also generate LLVM IR when the target is an Intel GPU. This approach has several benefits:

  • allows reusing existing conversion patterns to the LLVM dialect, adapting these conversions in a similar fashion for Nvidia/AMD and Intel GPUs via the NVVM/ROCDL and GEN dialects respectively
  • allows leveraging the LLVM high level optimizer (opt) on the generated LLVM IR prior to converting to a SPIRV binary
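To illustrate the first point: a conversion pattern that today emits an NVVM op could emit a GEN counterpart under the same structure. The GEN op spelling below is hypothetical, chosen only to mirror the real NVVM op; the actual names proposed in the RFC may differ:

```mlir
// Existing NVVM lowering target (real op):
//   %tid = nvvm.read.ptx.sreg.tid.x : i32
// Hypothetical GEN lowering target with the same shape:
//   %tid = gen.workitem.id.x : i32
// Either way, the surrounding code stays in the LLVM dialect:
llvm.func @kernel(%out: !llvm.ptr) {
  %tid = nvvm.read.ptx.sreg.tid.x : i32
  llvm.return
}
```

The point is that only the leaf intrinsic ops change per vendor, so the shared GPU-to-LLVM conversion infrastructure can be reused as-is.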

Lowering to the SPIRV dialect is one of the possible paths. Projects that already lower to the LLVM dialect (+ vendor extensions) need a similarly lightweight solution to extend support to Intel GPUs. I think extending the MLIR ecosystem to support the scenarios I described is useful because it provides more implementation choices.

Hi Fabian, I responded to a similar comment from Mehdi here: [RFC] Add GEN dialect for Intel GPUs - #27 by etiotto. Does that answer look reasonable to you? Please let me know if the rationale I provided makes sense or if you have further questions/comments.


You’re making a comparison to NVVM and AMDGPU; that only makes sense to me if there is a native backend in LLVM for this.

I think we need a larger discussion about the best interest of the MLIR project itself, the long-term direction, and the native SPIRV support we intend to build (that should likely involve all the SPIRV stakeholders upstream).
I’m not convinced at this point that what you’re proposing is aligned with that, but I haven’t heard from all the SPIRV folks so far either!


OK, it is a good idea to review this proposal in more detail and with a broader audience. As you suggested, Mehdi, I have added a topic to the agenda here ([Public] MLIR Open Meeting Agenda - Google Docs).


I’m curious about how the use of the LLVM-to-SPIRV translator would work, if that’s what you are planning to use.

My understanding (which may be out of date) is that the translator is not in tree, so there isn’t always a version that can take the latest LLVM. Is that right? How would it work when there is a breaking change in LLVM?

Is that always possible? I also thought the translator only supports a subset of LLVM IR, and that applying opt could generate LLVM IR that the translator cannot handle. Would that also limit the kind of IR MLIR can generate through this path?

This can be discussed in the open meeting if the answer is too long.


Thanks for the proposal. I think there are motivation/plan details that we need to add color to.

But overall, to me, if the GPU driver accepts SPIR-V as the format and we are building a stack mostly with MLIR, it’s better to lower directly to the SPIR-V dialect. It structurally fits: we have pretty much all high-level dialects connected with / lowered to the SPIR-V dialect, and the SPIR-V dialect is meant to support everything the SPIR-V spec aims to support, so if something is missing we can/should totally add/fix it. So I agree with @mehdi_amini here.

Going through LLVM proper seems quite a detour to me: instead of directly lowering to the final stage, we are trying to relay/retain certain information starting from the LLVM dialect, then through the whole LLVM proper stack and all the passes there, and then come back with an out-of-tree translator. That’s a lot of plumbing and layers/projects to work through; in addition, we also have other problems like versioning, as pointed out by Thomas. I’m not sure what benefits it brings that outweigh all this.

One might expect to reuse LLVM optimization passes, but I’m really curious to learn what those are. Based on my understanding, we can already do a lot with the current progressive high- to low-level lowering to get great performance (via going down to SPIR-V too). Not for all cases, for sure, but I think we just need to build out the missing parts to have a more coherent stack. So I’m curious to see more details here.

Happy to chat more in the open design meeting! :slight_smile:


Thanks Thomas for your comments.

My understanding (that may be out of date) is that the translator is not in tree, therefore there isn’t always a version that can take latest LLVM. Is that right? How would it work when there is a breaking change in LLVM?

From the Khronos SPIRV-LLVM translator [web page]: “Code on the main branch in this repository is intended to be compatible with the main branch of the llvm project”. So I think in practice the translator is going to be in sync with LLVM within a matter of days.

I also thought the converter only support a subset of LLVM IR and that applying opt could generate LLVM IR that the converter cannot support. Would that also limit the kind of IR MLIR can generate through this path?

Our experience with this has been quite positive. We have used the Khronos translator to generate a SPIRV binary from LLVM IR generated by the Intel port of the Triton compiler. In that project the LLVM IR produced by the compiler is optimized (at optimization level O3), and we have been able to compile and run correctly over 7000 tests, as well as most TorchDynamo benchmarks.

Note that using the Khronos translator is one of the possible paths available. There is also a [SPIRV backend](https://github.com/llvm/llvm-project/tree/main/llvm/lib/Target/SPIRV) in LLVM which could be used to compile the input LLVM IR to a SPIRV binary.
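For reference, a command-line sketch of the two paths from an LLVM IR module to a SPIR-V binary; the file names are illustrative, and the exact llc flags may vary across LLVM versions:

```shell
# Path 1: the out-of-tree Khronos SPIRV-LLVM translator.
llvm-spirv kernel.bc -o kernel.spv

# Path 2: the in-tree LLVM SPIR-V backend, selected via the target triple.
llc -mtriple=spirv64-unknown-unknown -filetype=obj kernel.bc -o kernel.spv
```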

Notes from the MLIR open meeting today. Please add anything I forgot, or correct me if I’m wrong.

Key points:

  • There are two SPIRV lowering paths from MLIR:
    • SPIRV dialect → SPIRV binary
    • LLVM dialect → LLVM IR → SPIRV backend → SPIRV binary
  • There are different trade-offs, including completeness of SPIRV support on each side, fragmentation of the MLIR lowering, and LLVM target support.
  • The discussion here isn’t about the SPIRV dialect, but about lowering paths through our multiple projects and how to avoid fragmentation and duplication of efforts.
  • In the end we need to make a pragmatic choice, but that choice needs to be grounded in mutual agreement of what the consequences are and how we’ll deal with it. It needs to be a conscious decision.

Lowering MLIR to SPIRV dialect:

  • Generating SPIRV in MLIR bypasses the LLVM code generation and avoids down-then-up conversion.
  • More importantly, (as raised by @antiagainst) not all SPIRV logic can be represented in LLVM IR (strict mode where there are no pointers).
  • But lowering isn’t complete and Intel has identified missing support (which presumably could be added).
  • This represents the ethos of MLIR of progressive lowering, but there is no requirement that any particular serialization format must be in MLIR.
  • SPIRV dialect in MLIR was added much earlier than the LLVM backend, and there were discussions on the LLVM backend upstreaming about the possibility of this very discussion.

Lowering MLIR to LLVM dialect, then SPIRV backend:

  • Lowers to LLVM dialect calls, then to LLVM IR, then raised again to SPIRV by the LLVM backend, and then to a SPIRV binary. This gets picked up by the proprietary toolchain.
  • Existing GPU dialects (NV/AMD) and some external projects (Triton) use LLVM dialect for their lowering, but not SPIRV backend.
    • Nvidia lowers MLIR to LLVM dialect → LLVM IR → PTX backend → Proprietary compiler
    • AMD lowers MLIR to LLVM dialect → LLVM IR → AMD backend → Binary
  • Intel wants to lower to LLVM dialect → LLVM IR → SPIRV backend
  • GEN-to-LLVM is based on the Triton work, which uses a similar model and is required to add Intel GPU support to Triton.

Discussion points:

  1. If we add a dialect that lowers GPU code to LLVM/SPIRV instead of SPIRV dialect, we’re restricting the usability of the dialect in-tree, and duplicating the effort to support SPIRV from an MLIR point of view.
  2. If we only have SPIRV support from MLIR, then other efforts like OpenCL and SYCL can’t make use of SPIRV (through LLVM).
  3. The addition of the GEN dialect is not, in itself, a strong endorsement against the SPIRV dialect, just yet another data point on the existing framework of lowering MLIR GPU code to LLVM dialect.
  4. Intel GPU doesn’t have a backend like NV/AMD, but it doesn’t need to, because the existing SPIRV backend works like NV’s PTX (feeding into a proprietary stack). In a way, the SPIRV backend is Intel GPU’s backend.

The real questions we need to answer:

  1. What are the main reasons why Intel has gone through LLVM to get SPIRV and not straight into SPIRV dialect? Triton support and existing pipelines (NV/AMD) are the key ones, but it’d be good to have a short list of “this-than-that” items.
  2. What is the goal of MLIR GPU lowering? Do we want all GPUs (and other devices) to have to go through the SPIRV dialect? If not, what are the “accepted” alternative ways, and for what reasons?
  3. Does using the SPIRV backend in LLVM directly correlate with not using the MLIR SPIRV dialect? This may sound odd, but the reason why GEN lowers to LLVM IR is not (just?) because the dialect is lacking support, but because of other factors in the stack (Triton, existing work, etc.). [see question 1]
  4. What is the value of an incomplete dialect in tree that agrees with our vision, versus a complete lowering that isn’t quite in the same direction? Bear in mind we’re not talking about something completely wrong, but about a design that follows the only known pattern in MLIR GPU lowering, with the additional gotcha of using a backend for a format that MLIR already has (but done differently).

The questions we do NOT want to answer now:

  1. What is the role of the SPIRV dialect in MLIR? If at all, this should be discussed in a separate thread.
  2. Should we try to add an Intel GPU backend to LLVM to “make the SPIRV issue go away”? This isn’t in the cards because the SPIRV backend is already good enough and it would be a waste of effort right now.

So, actions right now are:

  • Pause for a breath, have a good night’s sleep, and only reply to this post in a day or so. :smile:
  • Read this summary, make sure I didn’t miss anything, and add comments below if I did.
  • Agree on the key points above: the real effects of our decisions on how we lower things, what the alternatives are, why Intel has chosen some and not others, and the impact of those choices on the MLIR project going forward (this should probably be a separate RFC).
  • Make a decision to accept/reject the current GEN dialect proposal with clear recipes on what to do next.

I just want to thank everyone for staying two and a half hours on the call today. The commitment of the community to core issues is commendable and very welcome!


Thanks Renato for summarizing the key points raised during the MLIR design meeting. Let me add my thanks as well. I appreciate the feedback received during the lively discussion today and your willingness to consider the merits of this proposal.

Thanks all for the good discussions. I put the summary for the XeGPU portion in the XeGPU RFC.


Thanks Renato, excellent summary :slight_smile:
And thanks all for the long discussion.

This touches on the important question for me. I would frame it slightly differently, trying to capture some nuance and elaborate a bit:

The MLIR GPU lowering was designed with the goal of supporting LLVM (for the case where vendors have a native backend in-tree), but also of having a direct dialect lowering path to SPIRV, and possibly other future virtual ISAs (maybe Apple could open their equivalent of SPIRV!) that don’t need the register allocator and MIR/MC layers provided by the LLVM backend infrastructure.

From this point of view, considering MLIR is built incrementally as the needs arise, supporting GEN is one of the needs that usually triggers a significant push and progress in this kind of direction: we need these use cases to motivate and drive these components. This is why starting to build integration of a SPIRV target through LLVM is a departure from the architecture that was envisioned and deserves a “project direction” consideration.

It is not a problem to change our minds and update past plans; I’m perfectly fine with a new mindset with respect to SPIRV (the LLVM SPIRV target didn’t exist back then, for example). Instead, what matters to me about the choice we make on the direction is this:

This also reminds me of an early proposal for a SPIRV target in LLVM that didn’t use the LLVM backend infrastructure. Even though that “converter code” (which is still maintained by Khronos, maybe?) was fully implemented, LLVM didn’t take it in-tree, and it’s only years later that a target implemented using the “regular” intended LLVM target mechanism landed. The “immediately available path” isn’t always the one the community goes with, in favor of the long-term direction.

In this case, I don’t have a definitive opinion on what is the best way to lower to SPIRV in the long term, but I strongly believe that if we move on with this RFC (and similarly with the extensions for other future SPIRV vendors), then we are de facto going to consolidate on building on the LLVM SPIRV path and not build the MLIR-native one.
I don’t see a complete and robust MLIR-native path getting off the ground in a dynamic where we create the conditions for vendors to have a better time building on and improving the LLVM SPIRV path.
It may be on this point that the authors of the RFC and I have different assessments (but also certainly different incentives and goals in the discussion here).

Right: the SPIRV dialect is “never” gonna go away; this is not the point of the discussion. But the idea that we would use it for code generation in general in MLIR-based compilers would basically be abandoned.
It is likely that we would continue to partially support some limited forms of lowering to SPIRV for the very limited cases of “old versions of SPIRV” or “Vulkan graphics profile not supported by LLVM”, but that likely wouldn’t get the attention and support needed to make a robust and complete compiler flow for the full extent of the compute path, as intended to be used by DL compilers for example. That will stay some sort of second-class citizen in the stack here. (And to reiterate: this may be an acceptable direction as well.)


Even second class citizens have some value at various points in time: I have witnessed many, many times where the stricter approach required by the SPIRV dialect lowerings forced us to address major flaws in the higher level code generation approach which a lowering to LLVM partially papers over (and then suffers from reliability or layering problems). That cuts both ways and isn’t enough on its own to justify an investment in perpetuity, but in my view, having it has definitely paid for itself so far. Certainly not enough to have any patent of exclusivity or anything but not without value on its own.

I’m all for being brutally pragmatic here (disclosure: I originally funded and continue to provide for a fair amount of eng support for the direct SPIR-V path): we’re certainly not going to deny one of the primary GPU vendors use of their preferred LLVM backend and interop because of a “namespace collision” and presence of a higher level approach elsewhere in the project. Especially when it is following the pattern of two others that we are strongly invested in. All the better that it is standards adjacent and might have value beyond that.

But I’d also like to not couple that decision at this point in time with a decision to not keep the direct MLIR based SPIR-V lowerings as an invested thing we have for the ML/DL code generation algorithms we have in tree. Ie: If there are real frictions or things being held back there, let’s discuss those on their own. Again, in my view over the years, these have almost always been a leading indicator of a design bug, and looking at them through the simplified paradigm of the SPIR-V lowerings has been the only concrete way we’ve had to resolve the discussions. I don’t want us to immediately say yes to this RFC and start just ignoring or barreling over those points in code review because folks think it implies more than it does (And this doesn’t even get into the discussion about targets and circumstances that can only be reached by a more restrictive profile or less layering – which is entirely orthogonal to this RFC).

I’m perfectly open to setting the existing support adrift eventually and in the right way if things go that way, but just don’t want us making sudden, unplanned moves there. We’ll just need to see where the future takes us. I’ve got opinions but no crystal ball on that count.


I just remembered something I mentioned (maybe to answer Whitney in the meeting?) about the NVVM/ROCDL dialects’ existence and why they are different in this discussion. Since the question came up in the meeting, I felt this previous message of mine was maybe too terse: one of the reasons we need these dialects to target the native LLVM backends is that one could use MLIR to take SPIRV as an input and then convert to LLVM/NVVM to target an Nvidia GPU (respectively ROCDL/AMDGPU), thus implementing a SPIRV compiler (instead of targeting SPIRV). These exist completely independently from the “how we get to SPIRV” question.

This does not change anything about the merits of GEN in itself; it just explains why there is no direct correspondence between GEN and NVVM/ROCDL at the moment.


Thanks Stella, this is what I’m asking for here. Working on that kind of integration is hard and takes a lot of effort and time. It rarely begins from a clean starting point, but Intel is willing to do the legwork to make that a better end point. Doing that work downstream would make it really hard for us to cope with both upstream changes and redesigns, and would very likely derail the effort.

This is our conclusion in the meeting as well. These are separate things that need to be decided for separate reasons (even if at the same time).

Thanks for the clarification. I also see it that way. But to Stella’s point, I don’t think we can have all stacks identical in the first instance (not that this was your point, just clarifying as well), because of the different starting points.

But even later, we need to converge on the same story / design, not necessarily the same implementation details. I’m being vague because I have no idea what this final story will be, or what path we’ll choose to get there.