GLSL to SPIR-V dialect frontend project, still open?

Hello,

I am interested in one of the projects listed in the open projects page of MLIR. In particular, the GLSL to SPIR-V dialect frontend project. I was wondering whether I can still give that one a try or not?

I recently started learning about MLIR and am currently working through the Toy language tutorial, so I would still need a bit more time to actually start.

Looking at the MLIR repo, it seems this project hasn’t started yet, but I might be wrong.

If the project is still open, I will work on an initial concrete plan for what needs to be done and hopefully refine it later with @antiagainst.

Any pointers for starting points, especially ones that won’t be obvious to a beginner like me, would be much appreciated :slight_smile:.

Hey @bob.belcher, it’s great to know that you are interested in the open project. I don’t think anybody is working on it at the moment, so you are certainly very welcome to pick it up!

Out of curiosity, are you particularly interested in this project, or are you open to other ones? I might have some other ideas for you depending on your interests. You mentioned that you are new to MLIR and working through the tutorials there; that’s actually a great resource to get started. On the other hand, how familiar are you with GLSL/SPIR-V/GPU? :slight_smile:

Hey @antiagainst,

Thanks for the reply.

I am certainly open to other options. My main goal is to enhance my understanding of GPU (and parallel in general) architectures and optimizations for them.

In a not so distant past (about 2.5 years ago), I worked on 2 somewhat relevant research projects while I was a student. In the first one, the research team I was working with was developing a compiler framework for facilitating building HPC DSLs called AnyDSL. On top of the framework, they developed a language called Impala that is aiming to be a high-level performant language for domains like image processing and ray tracing. My tiny part was to port Nvidia’s CUB library to Impala. While I was there, I implemented most of CUB’s warp and block primitives and some of its device ones. That gave me some familiarity with the PTX ISA and with parallel algorithms.

The other project I had the privilege to learn through was a project to extend the LLVM IR with native instructions to express fork-join parallelism (different model from GPU parallelism but parallelism nonetheless :smiley: ). I did some experiments to build OpenMP and OpenCL backends to which the extended LLVM IR is translated. The results were good but limited to a small scope of what OpenMP and OpenCL have to offer. You can find my dev branch here: llvm-pir/lib/Transforms/PIR at dev-ergawy · Parallel-IR/llvm-pir · GitHub.

A less related effort, but one that’s also related to compilers and programming languages in general: for the past few months, I have been working on a side project through the TAPL book and implementing the languages and type systems discussed in the book in C++. You can find my repo here: GitHub - ergawy/types-and-programming-languages: C++ Implementations of programming languages and type systems studied in "Types and Programming Languages" by Benjamin C. Pierce... Of course, if I start on an MLIR project, this one will have to be pushed to the side for a while :smiley:.

Based on that, I am looking forward to your reply with other possible suggestions.

Fantastic! Thanks for all the information; that’s lots of fun projects. :slight_smile:

Another project I can think of is to convert SPIR-V to WebGPU’s WGSL. This will allow us to compile and run workloads in browsers with GPU acceleration in the future, which will be pretty cool. (For ML use cases, the components for the upper layers are mostly there for connecting the dots from high-level programming models like TensorFlow all the way down to SPIR-V, but certainly there is lots of work to be done there too.) One thing to point out, though, is that WGSL and WebGPU are still somewhat moving targets at the moment.

I suggested this because I see you are interested in GLSL, so you might also be interested in this. These two involve more traditional compiler frontend work though (well, SPIR-V to WGSL is kind of the reverse, given that you are generating source code from IR), so they might not be perfect for your goal of GPU optimization, but they should certainly enhance your understanding of GPU architectures, I think. So if you are interested, please feel free to pick either up and I’m quite happy to help. Others in the community might have other ideas and cool projects. :wink:

BTW, for SPIR-V, what we have at the moment is mainly for the compute pipeline (for ML purposes). The graphics pipeline is just starting to see support and @hazem is helping there. I guess you are more interested in the compute side, but GLSL is a programming language for both; just wanted to point that out.

Those are standalone big chunks of work. If you are also interested in other smaller tasks, I’m happy to provide some. :slight_smile:

The SPIR-V to WGSL sounds interesting; some shiny new technologies to play with. I am in :). I don’t mind them being moving targets as long as there is something interesting to learn and do.

Over the coming days, I will go through the obvious sources (SPIR-V, WGSL, and WebGPU specs and their online material) to learn as much as I can. As I mentioned before, if you have non-obvious suggestions for an outsider, please share them with me.

Once I have a more concrete idea of what’s to be done, I will write down a draft of the project plan. There is a GSoC project about SPIR-V to LLVM IR conversion (GSoC proposal: SPIR-V to LLVM IR dialect conversion in MLIR) which I think is closely related. So, I guess, the stages of development there might serve as a good starting point for an initial plan for this project.

One remark to verify my understanding. These are some of the directions in which the project should be heading:

a) I think things can be broken down into 2 big chunks:

  1. Define a MLIR dialect for WGSL.
  2. Implement conversion from SPIR-V dialect to WGSL dialect (maybe in the other direction as well: WGSL → SPIR-V?).

SPIR-V dialect → WGSL dialect → WGSL module

b) There might be some sort of a WGSL LLVM backend (existing or to be developed) to which the SPIR-V dialect should be emitted. I guess this would only be possible if SPIR-V goes through LLVM IR and then the latter gets emitted to WGSL through that backend.

SPIR-V dialect → LLVM IR → WGSL module

c) We might also directly emit WGSL from the SPIR-V dialect. That might actually be the same as option (a); however, I am not certain.

SPIR-V dialect → WGSL module

Does any of those paths make sense?

Hey @bob.belcher, actually WGSL is designed to be trivially convertible to SPIR-V, so I don’t think we need a separate dialect for it. From chatting with my colleagues who are more familiar with the WGSL side, there is actually a SPIR-V binary to WGSL conversion project in progress: tint - Git at Google. We have serialization for the SPIR-V dialect; we can serialize the SPIR-V dialect IR into a SPIR-V blob and then use that. I think what’s left then becomes: 1) Build up the SPIR-V dialect to be able to emit SPIR-V that can be consumed in the WebGPU execution environment. This involves setting the proper conversion targets, handling missing ops/patterns in the pipeline, and addressing other unknown issues. 2) Figure out a way to run the generated WebGPU-flavored SPIR-V. This is basically the WebGPU API glue. We have various runners in tree, and IREE also serves as an example that might be useful as a reference here. So this might not be like what you described above, where we need to define a bunch of dialects and convert between them; it’s mostly extending the SPIR-V dialect and connecting the dots between different components. Let me know whether this is still something that is interesting to you. I’m happy to provide more details if so.

BTW if WebGPU piqued your interest in general I’m also happy to connect you with WGSL/Tint folks and you can learn more. (they are not on this forum at the moment.)

Thanks a lot for the clarification and further suggestions. The whole landscape of MLIR and its shared boundaries with WebGPU/WGSL (among other things) is quite interesting. I am glad I stumbled upon that world :slight_smile:.

If it’s ok with you, I would prefer filling in the gaps in the SPIR-V dialect and mapping it to the WebGPU API. I just started learning about MLIR and I would like to work through adding something to it. Even though I won’t add a new piece to the picture, I will still learn about technologies I am not familiar with.

I just finished reading the SPIR-V white paper and am currently getting familiar with the WebGPU spec. So the picture is getting a bit clearer.

Any more details are more than welcome.

Sorry for the late reply; I was OOO since last Friday.

That’s certainly fine for me! Thanks again for your interest!

All the MLIR documentation is hosted at https://mlir.llvm.org/. There is lots of good documentation there covering different areas of MLIR, and it is certainly worth a read. You can start with the overall design docs and the getting-started ones; docs covering technical details of various aspects will be more useful later, when developing for MLIR.

SPIR-V side, a few useful resources:

  • The SPIR-V registry: Khronos SPIR-V Registry - The Khronos Group Inc. This is where you find all things related to SPIR-V, including its spec, extensions, and associated repos. Of them, the spec is worth reading, especially the sections that come before the descriptions of specific instructions. It should give you an overall understanding.
  • SPIR-V dialect’s doc: SPIR-V Dialect - MLIR. This is where you find anything related to MLIR/SPIR-V (without going into code :stuck_out_tongue_winking_eye: ).
  • Vulkan’s environment doc: Vulkan® 1.3.273 - A Specification (with all registered extensions). SPIR-V is just a common binary IL vehicle for sending shader/kernel to the driver. It has basic semantics requirements but depending on the API, there might be more restrictions. This is Vulkan’s additional requirements. Similarly OpenCL also has its requirements.

WebGPU side, in addition to the WebGPU and WGSL spec linked in the above, a few useful resources that I’m aware of:

Hope this helps. :slight_smile:

Awesome, thanks for the links. Will take a dive into them and come back as soon as I have a better idea on what needs to be done (or with more questions :slight_smile: ).

Hi @antiagainst,

Thanks again for the links in your latest reply.

  • I went through the SPIR-V spec, understood the logical and binary structure of a module, different intended use-cases and how extensions are used to support them, and read a random selection of instructions to get familiar with how their binary format and semantics are described.
  • In addition, I read the spv dialect’s doc and understood that the everything-is-an-instruction approach in SPIR-V’s binary format is good from an execution environment’s perspective but not that good from a compiler’s analysis and transformation perspective. This in turn is one driver for the design of the spv dialect (e.g. decorations are inlined as attributes in the instructions generating the result ids they decorate).
  • I learned a little bit about TableGen but didn’t go into much detail on how ODS and DRR can be used (deferred that until later on when I actually need to write some code). And against your request I had a quick look at the code :stuck_out_tongue:.
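
For readers following along, the binary structure mentioned in the first bullet can be sketched in a few lines of C++. This is only an illustration and not code from MLIR: the names `SpirvInstruction` and `splitInstructions` are made up here. It relies on two facts from the SPIR-V spec: a module starts with a five-word header whose first word is the magic number 0x07230203, and the first word of every instruction packs the total word count in the high 16 bits and the opcode in the low 16 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One decoded instruction: its opcode and the operand words that follow the
// leading (word count, opcode) word. Hypothetical type, for illustration only.
struct SpirvInstruction {
  uint16_t opcode;
  std::vector<uint32_t> operands;
};

const uint32_t kSpirvMagic = 0x07230203u;
// Header words: magic, version, generator, id bound, reserved schema.
const std::size_t kHeaderWords = 5;

// Splits a SPIR-V word stream into instructions. Returns an empty vector if
// the header is missing or an instruction claims an impossible length.
std::vector<SpirvInstruction>
splitInstructions(const std::vector<uint32_t> &words) {
  std::vector<SpirvInstruction> instructions;
  if (words.size() < kHeaderWords || words[0] != kSpirvMagic)
    return instructions;

  std::size_t index = kHeaderWords;
  while (index < words.size()) {
    // High 16 bits: word count including this word. Low 16 bits: opcode.
    const uint32_t wordCount = words[index] >> 16;
    const uint16_t opcode = static_cast<uint16_t>(words[index] & 0xFFFFu);
    if (wordCount == 0 || wordCount > words.size() - index) {
      instructions.clear();
      return instructions;
    }
    assert(index + wordCount <= words.size());

    SpirvInstruction inst;
    inst.opcode = opcode;
    inst.operands.assign(words.begin() + index + 1,
                         words.begin() + index + wordCount);
    instructions.push_back(inst);
    index += wordCount;
  }
  return instructions;
}
```

Because every instruction carries its own length, a consumer can skip instructions it does not understand, which is part of why the flat format suits execution environments better than compiler analyses.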

Now, I guess it makes sense for me to try implementing a new op, for example. The spv dialect document mentions an automation script that should make this easy, which I think should be the first point of contact for adding a new op.

However, before that, do you think it would be worth it to extend the automation script to generate a report of what is missing? I mean querying the ODS files (probably through llvm-tblgen + (for example) -print-enum -class=SPV_Op) and querying the machine-readable SPIR-V grammar to generate a list of missing ops/enums/types. If you think this would make future additions to the dialect easier, then I would be happy to start working on that extension, and afterwards we can pick something missing to add.

Awesome, @bob.belcher!

Yeah just getting basic familiarity and later diving deep when really needed is certainly the preferable way to go.

That is a good starting point. You can find the links and how to use the script from the dialect’s doc.

Thus far ops are basically defined on a per-need basis, but that’s a useful feature to have! It will allow us to get an understanding of missing ops, etc. Please feel free to do it if you are interested. I’d suggest we just start with regular expressions; we don’t need a full-blown TableGen parser to parse the ODS .td files. That’s what we’ve been doing in the existing scripts; it is straightforward and works reasonably well thus far.
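
To make the regular-expression idea concrete, here is a rough C++ sketch of such a report. It is not the in-tree automation script; `collectMatches` and `findMissingOps` are hypothetical names. It assumes grammar entries of the form `"opname" : "OpCopyMemory"` and ODS definitions of the form `def SPV_CopyMemoryOp : SPV_Op<"CopyMemory", []>`, as in the diff quoted later in this thread, so anything that departs from those shapes would be missed.

```cpp
#include <cassert>
#include <regex>
#include <set>
#include <string>
#include <vector>

// Collects the first capture group of every match of `pattern` in `text`.
std::set<std::string> collectMatches(const std::string &text,
                                     const std::regex &pattern) {
  std::set<std::string> names;
  for (std::sregex_iterator it(text.begin(), text.end(), pattern), end;
       it != end; ++it)
    names.insert((*it)[1].str());
  return names;
}

// Returns the grammar opnames (e.g. "OpPtrDiff") that have no op definition
// in the given ODS text. Both regexes are assumptions about the file formats.
std::vector<std::string> findMissingOps(const std::string &grammarJson,
                                        const std::string &odsText) {
  // Assumed grammar shape: "opname" : "OpCopyMemory"
  const std::regex grammarOp(R"re("opname"\s*:\s*"(Op\w+)")re");
  // Assumed ODS shape: def SPV_CopyMemoryOp : SPV_Op<"CopyMemory", []> {
  const std::regex odsOp(R"re(def\s+SPV_\w+\s*:\s*SPV_\w*Op<"(\w+)")re");

  const std::set<std::string> defined = collectMatches(odsText, odsOp);
  std::vector<std::string> missing;
  for (const std::string &opname : collectMatches(grammarJson, grammarOp)) {
    assert(opname.size() > 2); // Guaranteed by the "Op\w+" pattern.
    // The dialect spells ops without the "Op" prefix used by the grammar.
    if (defined.count(opname.substr(2)) == 0)
      missing.push_back(opname);
  }
  return missing; // Sorted, since std::set iterates in order.
}
```

The trade-off is the one described above: a regex scan is easy to write and maintain but only sees definitions that follow the expected textual pattern, whereas a real TableGen parser would be exact at a much higher cost.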

Thanks again for the interest! :slight_smile:

Hi @antiagainst,

As you probably noticed, I opened a small patch to extend the automation script by printing a report of the current level of capability coverage. As mentioned on the review, please feel free to decline it if it doesn’t add that much value :slight_smile:.

Moving on to the more interesting stuff :slight_smile:. Looking at the missing ops, it seems that there is no support for the Addresses capability. If I am not mistaken, the ops enabled by this capability should be important to enable more advanced support for both compute and graphics pipelines. I would like to give starting that effort a try. These are the ops I am referring to (copied straight out from the generated report :stuck_out_tongue:):

  • OpTypeForwardPointer
  • OpCopyMemorySized
  • OpPtrAccessChain
  • OpInBoundsPtrAccessChain
  • OpConvertPtrToU
  • OpConvertUToPtr
  • OpSizeOf
  • OpPtrDiff

Do you think that would be reasonable or are there any impediments that would block starting on this at the moment?

The patch looks awesome! Thanks for taking it on! :smiley:

Addresses and the capabilities/ops depending on it are typically used by OpenCL, but we are seeing them getting supported on the Vulkan side too. Having these would allow us to take a step towards supporting OpenCL and Vulkan physical storage buffers, so it sounds good to me! Please go for it if you are interested. One comment is that you might want to start with ops first to get familiar with the stack. Types are a bit more involved to define. OpTypeForwardPointer specifically is for modelling structs containing pointers to themselves, etc. A principled modelling in MLIR will need more thought and discussion.

Thanks for your reply and for the advice.

Then I will carefully read the spec to understand the semantics of those ops and try to find/come up with a shader that uses (one or some of) them in order to use as a guiding example.

I just noticed that OpCopyMemory is not supported, and it seems like a good predecessor to OpCopyMemorySized. So I started working on it instead; please let me know if, for example, someone else has already started working on that op.

Not that I’m aware of. Please feel free to take it. Thanks! :slight_smile:

Hi @antiagainst,

On top of this review: https://reviews.llvm.org/D82384, I am trying to allow the CopyMemory op to accept up to 2 memory access operands to follow the spec more closely (SPIR-V Specification). I have an initial patch that parses 2 sets of memory operands fine (not committed since it doesn’t fully work):

diff --git a/mlir/include/mlir/Dialect/SPIRV/SPIRVOps.td b/mlir/include/mlir/Dialect/SPIRV/SPIRVOps.td
index c92af561faf..45e40993ade 100644
--- a/mlir/include/mlir/Dialect/SPIRV/SPIRVOps.td
+++ b/mlir/include/mlir/Dialect/SPIRV/SPIRVOps.td
@@ -215,7 +215,9 @@ def SPV_CopyMemoryOp : SPV_Op<"CopyMemory", []> {
     SPV_AnyPtr:$target,
     SPV_AnyPtr:$source,
     OptionalAttr<SPV_MemoryAccessAttr>:$memory_access,
-    OptionalAttr<I32Attr>:$alignment
+    OptionalAttr<I32Attr>:$alignment,
+    OptionalAttr<SPV_MemoryAccessAttr>:$source_memory_access,
+    OptionalAttr<I32Attr>:$source_alignment
   );

   let results = (outs);
diff --git a/mlir/lib/Dialect/SPIRV/SPIRVOps.cpp b/mlir/lib/Dialect/SPIRV/SPIRVOps.cpp
index 58c96ea7a01..690b28f9bd1 100644
--- a/mlir/lib/Dialect/SPIRV/SPIRVOps.cpp
+++ b/mlir/lib/Dialect/SPIRV/SPIRVOps.cpp
@@ -29,6 +29,7 @@ using namespace mlir;

 // TODO(antiagainst): generate these strings using ODS.
 static constexpr const char kAlignmentAttrName[] = "alignment";
+static constexpr const char kSourceAlignmentAttrName[] = "source_alignment";
 static constexpr const char kBranchWeightAttrName[] = "branch_weights";
 static constexpr const char kCallee[] = "callee";
 static constexpr const char kClusterSize[] = "cluster_size";
@@ -183,6 +184,35 @@ static ParseResult parseMemoryAccessAttributes(OpAsmParser &parser,
   return parser.parseRSquare();
 }

+// TODO Clean this up and merge it with parseMemoryAccessAttributes into 1
+// function.
+static ParseResult parseMemoryAccessAttributes2(OpAsmParser &parser,
+                                                OperationState &state) {
+  // Parse an optional list of attributes starting with '['
+  if (parser.parseOptionalLSquare()) {
+    // Nothing to do
+    return success();
+  }
+
+  spirv::MemoryAccess memoryAccessAttr;
+  if (parseEnumStrAttr(memoryAccessAttr, parser, state,
+                       "source_memory_access")) {
+    return failure();
+  }
+
+  if (spirv::bitEnumContains(memoryAccessAttr, spirv::MemoryAccess::Aligned)) {
+    // Parse integer attribute for alignment.
+    Attribute alignmentAttr;
+    Type i32Type = parser.getBuilder().getIntegerType(32);
+    if (parser.parseComma() ||
+        parser.parseAttribute(alignmentAttr, i32Type, kSourceAlignmentAttrName,
+                              state.attributes)) {
+      return failure();
+    }
+  }
+  return parser.parseRSquare();
+}
+
 template <typename MemoryOpTy>
 static void
 printMemoryAccessAttribute(MemoryOpTy memoryOp, OpAsmPrinter &printer,
@@ -192,10 +222,12 @@ printMemoryAccessAttribute(MemoryOpTy memoryOp, OpAsmPrinter &printer,
     elidedAttrs.push_back(spirv::attributeName<spirv::MemoryAccess>());
     printer << " [\"" << stringifyMemoryAccess(*memAccess) << "\"";

-    // Print integer alignment attribute.
-    if (auto alignment = memoryOp.alignment()) {
-      elidedAttrs.push_back(kAlignmentAttrName);
-      printer << ", " << alignment;
+    if (spirv::bitEnumContains(*memAccess, spirv::MemoryAccess::Aligned)) {
+      // Print integer alignment attribute.
+      if (auto alignment = memoryOp.alignment()) {
+        elidedAttrs.push_back(kAlignmentAttrName);
+        printer << ", " << alignment;
+      }
     }
     printer << "]";
   }
@@ -2862,9 +2894,24 @@ static ParseResult parseCopyMemoryOp(OpAsmParser &parser,
       parser.parseOperand(targetPtrInfo) || parser.parseComma() ||
       parseEnumStrAttr(sourceStorageClass, parser) ||
       parser.parseOperand(sourcePtrInfo) ||
-      parseMemoryAccessAttributes(parser, state) ||
-      parser.parseOptionalAttrDict(state.attributes) || parser.parseColon() ||
-      parser.parseType(elementType)) {
+      parseMemoryAccessAttributes(parser, state)) {
+    return failure();
+  }
+
+  if (parser.parseOptionalComma()) {
+    // No comma, hence, no 2nd memory access attributes.
+  } else {
+    // Parse 2nd memory access attributes.
+    if (parseMemoryAccessAttributes2(parser, state)) {
+      return failure();
+    }
+  }
+
+  if (parser.parseColon() || parser.parseType(elementType)) {
+    return failure();
+  }
+
+  if (parser.parseOptionalAttrDict(state.attributes)) {
     return failure();
   }

diff --git a/mlir/test/Dialect/SPIRV/Serialization/memory-ops.mlir b/mlir/test/Dialect/SPIRV/Serialization/memory-ops.mlir
index 25b54c05539..3a21da85f06 100644
--- a/mlir/test/Dialect/SPIRV/Serialization/memory-ops.mlir
+++ b/mlir/test/Dialect/SPIRV/Serialization/memory-ops.mlir
@@ -93,6 +93,9 @@ spv.module Logical GLSL450 requires #spv.vce<v1.0, [Shader], []> {
     // CHECK: spv.CopyMemory "Function" %{{.*}}, "Function" %{{.*}} ["Volatile"] : f32
     spv.CopyMemory "Function" %0, "Function" %1 ["Volatile"] : f32

+    // CHECK: spv.CopyMemory "Function" %{{.*}}, "Function" %{{.*}} ["Volatile"] : f32
+    spv.CopyMemory "Function" %0, "Function" %1 ["Volatile"], ["Volatile"] : f32
+
     spv.Return
   }
 }

With that, the first leg of the round-trip (serialization) works just fine as far as I can tell. The default deserialization code, however, stands in the way. For example, for the test line in the diff above (spv.CopyMemory "Function" %0, "Function" %1 ["Volatile"], ["Volatile"] : f32), what gets deserialized is the following:

spv.CopyMemory "Function" %0, "Function" %1 ["Volatile"] {alignment = 1 : i32} : f32

So what is supposed to be interpreted as the second memory access attribute is instead interpreted as the alignment for the first memory access attribute. I believe the issue is in this section of the deserialization code:

Deserializer::processOp<spirv::CopyMemoryOp>(ArrayRef<uint32_t> words) {
  ...
  if (wordIndex < words.size()) {
    attributes.push_back(opBuilder.getNamedAttr("memory_access", opBuilder.getI32IntegerAttr(words[wordIndex++])));
  }
  if (wordIndex < words.size()) {
    attributes.push_back(opBuilder.getNamedAttr("alignment", opBuilder.getI32IntegerAttr(words[wordIndex++])));
  }
  if (wordIndex < words.size()) {
    attributes.push_back(opBuilder.getNamedAttr("source_memory_access", opBuilder.getI32IntegerAttr(words[wordIndex++])));
  }
  if (wordIndex < words.size()) {
    attributes.push_back(opBuilder.getNamedAttr("source_alignment", opBuilder.getI32IntegerAttr(words[wordIndex++])));
  }
  ...
}

Just wanted to know whether you would suggest:

  1. Somehow customize the deserialization logic to make it smarter?
  2. Model the second memory access operand in another way. Maybe my lack of familiarity with ODS prevents me from finding an easier solution to the problem.
  3. Something else entirely :smiley:.

I ended up customizing the deserialization logic. It would be great if you can have a look at my dev branch here: https://github.com/KareemErgawy/llvm-project/commit/cbf3e0bd6f6203936d5e47a524ddf800ba885e2c.

If you find this to be a good approach, I can start a new patch since you already accepted the current one.

Hey @bob.belcher, I think your approach in the above is reasonable and aligned with what I would do!

A bit more context here: the (de)serialization autogen is there for ease of introducing new ops. It works well when the newly introduced ops are “standard”, but OpCopyMemory has two memory operands, so it’s indeed a bit special, and it makes sense to take control there and write the logic manually. We won’t have a lot of such ops, but here you are hitting one. :slight_smile: (The (de)serialization autogen logic is nice, but it also creates code bloat that slows down compilation time. We have a few patches to address that, but I think we can do more on that front.)
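
As a rough illustration of what "writing the logic manually" involves, the sketch below decodes the trailing operand words of OpCopyMemory in plain C++. It is not the code from the linked commit and does not use the MLIR deserializer API; `MemoryOperands`, `CopyMemoryOperands`, `readGroup`, and `decodeCopyMemoryOperands` are hypothetical names. The one fact it relies on comes from the SPIR-V spec: in the Memory Operands bitmask, Aligned is bit 0x2, and an alignment literal follows the mask only when that bit is set. In the failing test above, both masks are Volatile (0x1), so the trailing words are 1, 1, and a decoder that always treats the second word as an alignment reports alignment = 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// "Aligned" bit of the SPIR-V Memory Operands mask.
const uint32_t kMemoryAccessAligned = 0x2u;

// One group of memory operands: a mask plus an optional alignment literal.
struct MemoryOperands {
  bool present = false;      // Whether this group appeared at all.
  uint32_t access = 0;       // Memory Operands bitmask.
  bool hasAlignment = false; // True only when the Aligned bit is set.
  uint32_t alignment = 0;
};

struct CopyMemoryOperands {
  MemoryOperands first;  // First group in the instruction.
  MemoryOperands second; // Optional second group.
};

// Reads one group starting at words[index] and advances index. Returns false
// if the Aligned bit is set but no alignment literal follows.
bool readGroup(const std::vector<uint32_t> &words, std::size_t &index,
               MemoryOperands &group) {
  assert(index < words.size() && "caller must check that a word is left");
  group.present = true;
  group.access = words[index++];
  if (group.access & kMemoryAccessAligned) {
    if (index >= words.size())
      return false;
    group.hasAlignment = true;
    group.alignment = words[index++];
  }
  return true;
}

// `words` holds only the words that follow the target and source pointer ids.
// Returns false on malformed input (missing alignment or leftover words).
bool decodeCopyMemoryOperands(const std::vector<uint32_t> &words,
                              CopyMemoryOperands &result) {
  result = CopyMemoryOperands();
  std::size_t index = 0;
  if (index < words.size() && !readGroup(words, index, result.first))
    return false;
  if (index < words.size() && !readGroup(words, index, result.second))
    return false;
  return index == words.size();
}
```

With this scheme the words 1, 1 decode as two Volatile groups without alignments, while 2, 4, 1 decode as an Aligned group with alignment 4 followed by a Volatile group. A generated decoder that consumes words purely by position cannot make that distinction, which is presumably why this op needs hand-written logic.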
