Breaking change to gpu.func to spv.func lowering

The current lowering from gpu.func to spv.func automatically adds the spv.interface_var_abi that are later used by the LowerABIAttrsPass to set the necessary binding and descriptor sets. It encoded a default ABI of setting all descriptor_sets to 0 and binding being the argument index. Instead of hard-coding the ABI, it is better to allow the client of the pass to set the ABI using the spv.interface_var_abi attribute before invoking the gpu.func to spv.func conversion pattern.

The patch implements this change. Its a breaking change for users that are not in tree. Posting it here for wider visibility. Will wait for a few days to commit that one.

I don’t understand enough here but this seems like you’re creeping up some low-level details of the SPIR-V modeling into the GPU dialect, and these details seems fairly opaque to me. I would expect that when you have the gpu dialect, there are enough informations that are structured to map everything as expected.

For instance these attributes in the test mlir/test/mlir-vulkan-runner/time.mlir seem completely redundant.

Can you provide more information on all this?

Sure! Vulkan uses a binding model for resources. Each resource is represented as a descriptor and bound to a “slot” in a descriptor set. Those resources are accessed in the kernel via these (set, binding) numbers. Specifically, a buffer in SPIR-V is a global variable with a (set, binding) decoration:

spv.globalVariable bind(0, 1) : !spv.ptr<!spv.struct<...>>

These (set, binding) assignment is a contract between the runtime and the kernel for resources; they must match. Note that the available (set, binding) pairs can be quite limited: Vulkan spec itself only guarantees at least 4 sets and each with 4 storage buffers. (See “Table 46. Required Limits” of the spec.) This is especially true for mobile world where a large portion of it sticks with the required minimum. So how to assign this should be carefully planned according to the target hardware. Naively just assigning the numbers sequentially may not work; this is better to be specified by the runtime. For example, IREE would like to control how buffers are organized and thus specify this. We’d like to accommodate such use cases by making it more general. Of course, I think it also makes sense to have a “default” way that one just get the previous sequential assignment scheme.

Yes, I think will need a utility function to add default ABI if needed. Will update the patch with that. THen it is not a breaking change :slight_smile: (hopefully)

@antiagainst provided more details (thanks!), but the spv.interface_var_abi attribute is not bubbling up to the GPU dialect itself. The attribute is expected to be put there before lowering the gpu.func to spv.func. So the expectation is that before lowering gpu.func to spv.func the client needs to add more information w.r.t to the ABI needed for the lowering, info that is not expressible in GPU dialect itself. So its only relevant to those conversion frameworks that are interested in lowering to SPIR-V.

I am not sure, everything you wrote after this sentence isn’t reassuring to me.

The gpu dialect is at the moment self-contained and well defined: in the example I linked above there is an unambiguous mapping from the gpu.kernel_launch to the gpu kernel arguments. Any attributes that is added there seems unnecessary from the gpu dialect point of view.
I understand that there is a need for an ABI agreement during lowering, but since this is an ABI contract between the caller and the callee, it isn’t clear to me why this needs to be expressed as attributes.

In the example in your revision (mlir/test/mlir-vulkan-runner/time.mlir), can I just change the attributes to and shuffle the bindings assignments on the arguments (without changing the IR) and have the runner test that would still just works?

I dont see this as related to GPU dialect at all. This is past GPU dialect in the lowering.

I am not sure I fully follow what the question is here. Binding/descriptor sets are ABI between the SPIR-V kernel and the vulkan driver. A change in the ABI requires the runners “host-side” to be changed as well. The need for the spv.interface_var_abi attribute is because the ABI for the Vulkan/SPIR-V and CUDA are very different and what the GPU dialect implements is very similar to CUDA. In CUDA the ABI is defined by kernel arguments. In Vulkan/SPIR-V it is through binding and descriptor set. It is a fairly arbitrary choice to always set (binding 0, descriptor_set 0) for argument 0, (binding 1, descriptor set 0) for argument 1, etc. It was implemented as an initial solution but, in general, the mapping from binding, descriptor that corresponds to an argument (which becomes a spv.GlobalVariable eventually) should be not hard-coded within the pattern that lowers from gpu.func to spv.func. Instead it should be driven by the client that is lowering both the gpu.func and the gpu.launch_func, for them to be consistent (similar to how the arguments of the same type need to passed in the same order to a CUDA kernel when launching it). This change is to move away from the lowering hard-coding the descriptor set bindings.

The SPIR-V lowering utilities aims to support multiple incoming paths instead of just the GPU dialect. The interface_var_abi attribute provides a way for anybody that want to use the kernel CodeGen part and integrate with his/her own runtime and have ABI consistency. The GPU dialect is self-contained yes but it’s more for CUDA; its host side modelling and how it handles ABI does not work natively for Vulkan and SPIR-V. As previously explained, the set/binding slots can be a scarce resource that one must carefully plan according to target hardware. In order to do that, cases can exist that multiple SPIR-V global variables are given the same set and binding number but with different types, which effectively means multiple resources from SPIR-V’s perspective are backed by the same underlying buffer in Vulkan runtime. These are things out of SPIR-V lowering’s concern and control and should not be arbitrarily dictated by the SPIR-V lowering.

Isn’t the example I mentioned expressed in the GPU dialect?

My point is that there is already a relationship between a launch and a kernel in the GPU dialect. And I see what you’re adding a side-mechanism that overrides it, but only on one side of the contract: basically in the GPU dialect I should be allowed to re-order the arguments of the kernel and the launch, I expect your feature to not change this.

My point is that the IR contains both the host and the device: the ABI expressed here should be handled correctly then.

If I take LLVM as an example, any ABI tweak is expressed on the function (not as opaque random attributes though) and the codegen for both the call and the callee are expected to obey by this ABI.

Maybe a cause of confusion here is that the lowering from GPU dialects device side maps well to SPIR-V dialect and code-generation. The host side doesnt map to vulkan.

To repeat a point above, in mlir-vulkan-runner for example (which starts with GPU dialect), when it is lowering the gpu.func to spv.func it will have to set additional ABI related information on the gpu.func before lowering to spv.func, and the host side launch needs to make sure it is using the same binding and descriptor sets. This is because the GPU dialects ABI is not sufficient to express Vulkan/SPIR-V ABI. If you have a suggestion to make that better, that would be great.

So maybe the cause of confusion here is that there is no native representation of the “vulkan” side to lower the GPU dialects host-side functions (like say a Vulkan dialect). Then you could probably see the whole picture w.r.t to ABI. As an example, in IREE, the HAL (Hardware Abstraction Layer) for Vulkan defines an interface here and IREEs codegen ensures that the contract expressed there gets propagated to the generated spv.module here

My point is that there is already a relationship between a launch and a kernel in the GPU dialect. And I see what you’re adding a side-mechanism that overrides it, but only on one side of the contract:

Note that spv.interface_var_abi is not a side-mechanism to override a “proper” GPU ABI modelling; it is providing necessary information for SPIR-V lowering. As said in the above, the GPU dialect does not model the Vulkan/SPIR-V’s ABI at all at the moment; what it has is more akin to CUDA. The information of which set and binding number to assign is missing in the current GPU dialect and its host modelling. It “works” previously because we weren’t pushing forward on this front and decided just to use a convention for starter, but now we need to handle the ABI correctly to consider target constraints. To have this handled properly in the GPU dialect, a proper modelling of the host side should exist. Currently we only have a launch op; I don’t quite see the benefits of pushing forward immediately there.

It does not map perfectly, but there is a lowering right now or the runner wouldn’t work: you would break this.
I would also expect that the work on the interaction between SPIR-V and the GPU dialect would go into reducing the impedance mismatch and finding a more principled solution: I’m afraid that the route you’re proposing here is not.

Right: so either the attributes you are added are well integrated and the host side will just be able to take them into account, or their aren’t. That was the sense of my question earlier.

I actually would like that the work on SPIR-V/Vulkan does not diverge from the GPU dialect too much.
I think there needs to be more investment into making sure that whatever exist for SPIR-V connects well with the higher level of the stack (i.e. the GPU dialect) and not in very ad-hoc ways like here.

I actually would like that the work on SPIR-V/Vulkan does not diverge from the GPU dialect too much.

IMHO the abstractions are there to model some existing APIs; not the other way around. The existing GPU dialect is one way of modelling things (and we can certainly iterate there); I also see great benefits to make the SPIR-V utilities general to serve other users even if they don’t follow the GPU dialect’s way of doing things. :slight_smile:

I think there needs to be more investment into making sure that whatever exist for SPIR-V connects well with the higher level of the stack (i.e. the GPU dialect) and not in very ad-hoc ways like here.

I guess it’s a do-learn-iterate loop; we discover some of the issues, find a solution, and then improve around it. I’m actually in talk with Stephan to collaborate on host side modelling. :slight_smile: It’s tricky because of the differences between APIs but hopefully we can have a better approach. With that said, the work in GPU dialect does not invalidate the abstractions in SPIR-V layer as they are different levels.

Fly on the wall perspective from lurking on this thread to see if it resolves: it seems that there are two actual levels of abstraction at play here and one dialect/set-of-ops (GPU) that may be creeping too far beyond the original intent. Something should model the actual lower level ABI to the extent that it is valuable to interact with that level (which, the assertion here - which I agree with - is that it is). I think the point is completely valid if that something is not the literal gpu.func/gpu.kernel_launch ops. If there were a lower-level of abstraction, specific to the ABI which we assert is a more direct match to how Vulkan reasons about these things, it seems that it would be possible to do a naive lowering from the gpu.func/gpu.kernel_launch level of abstraction to that by assuming a 1:1 mapping (which, iiuc, is the simplifying assumption that the vulkan-runner is making and that the more complicated use cases can no longer assume). I may be getting some of the terminology wrong.

If you were to name that lower level of abstraction and structure it how it naturally fits, what would it be called/look like (leaving aside things like what dialect/code base it lives in)?

There is a contradiction in your sentence: the fundamental definition word “abstraction” is referring to me to not map to an existing API but in fact… abstracting away from it!

I see the problem with the work on SPIR-V in MLIR right now as lacking abstractions in fact.

I am very wary of taking shortcuts here. The “serve other users” motivation should not trump the design of the project and the general ecosystem fit. I have the feeling that you’re focused on mapping 1-1 the Vulkan/SPIR-V model without consideration with the overall fit and I am not OK with this approach right now.
In general “magic attributes” that are not first class but are “load bearing” (cannot be ignored for correctness) is a problem and indicates to me areas that require more thinking.

If the GPU dialect has fundamental limitation, then please work to improve this instead of just taken this for granted and bypassing the layers.

Fundamentally here:

   gpu.func @kernel_add(%arg0 : memref<8xf32>,
                                         %arg1 : memref<8xf32>)
   gpu.launch_func @kernel_add(%arg0, %arg1)

Swapping arguments should be fine when one control both the call site and the callee. The attributes that are added are side-data that are not fitting there.

   gpu.func @kernel_add(%arg1 : memref<8xf32>, %arg0 : memref<8xf32>)
   gpu.launch_func @kernel_add(%arg1, %arg0)

This is the key point to me, and I’d personally prefer if we could direct ourselves at this technical critique and avoid making sweeping statements about SPIR-V, the ecosystem, etc.

Am I wrong in the following statement? IREE, for reasons which for the purpose of the discussion can be taken to be “target specific”, wants to have a launch ABI that operates on a different level of granularity versus a 1:1 mapping of buffer-level arguments. They think there are good reasons to do this that may or may not be general to anything else (or even a good idea), and are trying to figure out how to design that with respect to the current layering. We could call this “packed arguments” or whatever, but at this level of discussion, it represents one of a hundred practical considerations that come up when designing interfaces at this level. There’s some chance that the next person who comes along and notices a performance or scalability issue at this junction may want to also consider a different way of packing launch arguments, which may or may not be related.

(When I first saw this thread a few days ago, like Mehdi, I also noted the “magic attribute” solution, which should almost always generate a design query, but I am not deep enough in this particular part of the space to have a specific critique and stayed silent)

I don’t know what the right design answer here is, but, imo, the aim is a valid thing to want to try and has very little to do with Vulkan/SPIR-V specifically (afaik, even programming a higher level of abstraction like CUDA can have benefits from breaking the correspondence, although there tends to not be such performance cliffs involved that prompt people to reach for such solutions for those classes of devices).

But just adding an attribute that changes the entire interpretation of the current functions/launches seems like a bit too much of a shortcut to codify upstream. I’d also like to avoid this thread heading into the space where we just fork and put something in locally (beyond for a quick poc or the like).

I’m wondering if there are any others who can help brainstorm a more appropriate layering? @_sean_silva, @herhut, @ftynse - who I know work in this area and could form opinions.

Haven’t read this thread fully in depth, but from looking at the code, it seems to me like the desire here is just to be able to customize the ABI used to lower gpu.func? The way to do this proposed in this patch is for some earlier pass to annotate an attribute which then the gpu.func -> spv.func lowering knows about and respects.

It seems better to just have createGPUFuncToSPVFuncPass(...) to take an argument, which is some sort of “target-specific” C++ interface that knows the ABI. This C++ interface could be shared across all conversions lowering to spv.func. Perhaps the interface would have a virtual method getABIInfo(ArrayRef<Type>) -> vector<SpirvABIInfo>, which is called with the list of types on the entry block argument or something like that.

This is the point that I am trying to put across. In the GPU dialect it is totally fine to swap the arguments around. Nothing related to SPIR-V stops/prevents that. But when lowering GPU dialect to SPIR-V you have the “freeze” the ABI. That is because there is no direct modeling of Vulkan in MLIR core. So a vulkan runner when consuming input from GPU dialect, when lowering to SPIR-V has to have frozen the ABI. The attributes are mainly for convenience. You can do all the signature conversion required to move from gpu.func signature to the signature that SPIR-V spec expects here (which it did at one point of time, and that was not very easy maneuver). The attributes make the lowering much more simple and the ABI expressed by attributes are “implemented” later before serialization.

This is not the only instance of this. The loop.parallel to gpu.launch lowering also uses a “mapping” attribute to codify information about how to lower the loop to gpu.launch. So if this is an issue, then happy to explore other solutions. Personally I dont see an issue with attributes. They are a way to codify fine-grained information needed to lower from one dialect to another.

I am not really sure why this better. That information is explicitly encoded in IR as an attribute or is implemented using a C++ class seems to be the same. I see an issue of having such a pass. There might be many gpu.func that need to be lowered to spv.func. Passing separate ABI information for each conversion using a C++ interface does not seem to be very usable. WIth an attribute, every gpu.func can make its own decision.

This thread has veered off from its original intent. I would really like to get more input from people who are lowering to SPIR-V dialect (like PLaidML folks who also use GPU dialect) as to input/experience with current implementation. We have been evolving this with IREE as a driver. The vulkan runner uses it as well. The attributes have really nothing to do with GPU dialect itself, and its host-device ABI.

And if you didn’t notice at the time, I raised similar concerns, but I was OK with this for the sole purpose of testing and experimenting convenience.

Right: so the freezing is in the lowering itself, not before. And that means that the input to mlir-vulkan-runner in the GPU dialect, being before the lowering to SPIR-V should not have such frozen ABI, otherwise it does not exercise the lowering.

Yup, it’s basically the same. Either:

  1. you run some function f on each gpu.func and then materialize its return value in an attribute that is later read by GPUFuncToSPVFuncPass, or
  2. you just have GPUFuncToSPVFuncPass call f itself and directly use the information.

It seems strictly simpler to do the latter, unless there is some complex non-local factor determining the annotation which requires global considerations (but the use of the term “ABI” suggests that that’s not the case). If there is some pressing testing use case where you would like to embed the specific lowering into IR, then could have a function f which considers some test-only attribute.