GSoC proposal: SPIR-V to LLVM IR dialect conversion in MLIR

Barriers are usually modeled as convergent because on a GPU all the SIMD lanes need to hit the barrier together. This is a restriction that most likely doesn’t need to be there on CPU.
If I understand correctly, the real challenge of convergent operations is that they imply synchronization between work items belonging to the same warp. If we ignore this part, I don’t think there are other problems.
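To make this concrete, here is a rough LLVM IR sketch of what "modeled as convergent" means in practice (the function names are placeholders, not taken from any existing lowering):

```llvm
; A placeholder workgroup barrier, declared `convergent` as a GPU lowering
; typically requires. The attribute forbids transformations that would change
; the set of threads reaching a given call site; a CPU-only lowering could
; plausibly drop it.
declare void @workgroup_barrier() convergent

define void @kernel() {
entry:
  ; every invocation of the workgroup is expected to reach this call together
  call void @workgroup_barrier()
  ret void
}
```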

Makes sense to go for something simple. Does that mean warp-level instructions and workgroup synchronization are out of scope?

That’s a very good point! Yes, we should look there for inspiration. :slight_smile:

Interesting! Could you give more pointers regarding these implementations? I’m also curious to hear your thoughts on fibers/coroutines. I know that SwiftShader actually uses Marl and cooperative scheduling for this. We should sync on this.

I think the minimum required maxComputeWorkGroupInvocations is 128? But I get your point here.

I’m leaning towards 2, which is more aligned with the intention of the proposal and is better for the ecosystem. But yes, lots of this is out of the scope of the GSoC project per se.

We have a general rule that if there is no clear solution that everybody agrees upon, or the solution is too complex to implement, it should not be considered in scope. But it can be a stretch goal. Note that the whole mlir-spirv-runner part, which is for actually running the code, is listed as a stretch goal in the original proposal. I think the rule applies here. The “scalar” part of the conversion already contains lots of work, I think.

Yes, we should definitely sync up. Intel’s OpenCL CPU implementation was based on this paper:
http://www.intel-vci.uni-saarland.de/uploads/tx_sibibtex/10_01.pdf
I had worked on a similar solution for graphics support based on Mesa, but I don’t think it got open sourced.
It’s interesting that SwiftShader went a different route; I wonder how they deal with large workgroups.

Please note that spv.ControlBarrier is just an example; there are a lot of other SPIR-V instructions which can’t be mapped to LLVM operations, for example the Image Instructions from the SPIR-V spec.

This is the approach taken by the SPIR-V LLVM Translator. It tries to map the semantics of SPIR-V instructions to OpenCL builtins. But 1) the representation of OpenCL builtins in LLVM IR is not specified and may differ between back-ends, and 2) the semantics of OpenCL differ across versions, e.g. OpenCL 1.2 vs. OpenCL 2.0.
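To illustrate point 1), here are two plausible LLVM IR spellings of the same SPIR-V OpControlBarrier; the exact mangled names, signatures, and operand values below are illustrative rather than normative, which is really the problem being described:

```llvm
; (a) call to a mangled OpenCL C builtin, roughly barrier(cl_mem_fence_flags)
declare void @_Z7barrierj(i32)

; (b) call to a "SPIR-V friendly" wrapper function; the exact name and
;     signature vary between producers and consumers
declare void @__spirv_ControlBarrier(i32, i32, i32)

define void @two_spellings() {
  ; operand values are illustrative (fence flags / scopes / memory semantics)
  call void @_Z7barrierj(i32 1)
  call void @__spirv_ControlBarrier(i32 2, i32 2, i32 264)
  ret void
}
```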

2 looks more generic (and therefore preferable) to me. If I get it right, the result of the conversion will contain SPIR-V-specific intrinsics which can be lowered to other target-specific intrinsics (e.g. NVVM) or to calls to library functions implementing the semantics of SPIR-V instructions for a particular target.
@MaheshRavishankar, by “SPIR-V target”, did you mean a full-blown LLVM back-end or just adding SPIR-V-specific intrinsics to LLVM core? Not sure if the latter is possible without the former.
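Either way, a rough sketch of the "generic intrinsic first, target-specific lowering later" flow might look like this (the @spirv_control_barrier name is made up for illustration; only @llvm.nvvm.barrier0 is an existing NVVM intrinsic):

```llvm
; "Before": the conversion keeps the barrier abstract. @spirv_control_barrier
; is a placeholder standing in for a would-be SPIR-V-specific intrinsic; its
; operands (scope, scope, memory semantics) are illustrative.
declare void @spirv_control_barrier(i32, i32, i32)

define void @kernel_generic() {
  call void @spirv_control_barrier(i32 2, i32 2, i32 264)
  ret void
}

; "After", when lowering for an NVIDIA target: the abstract call is replaced
; by the existing NVVM barrier intrinsic (or, on other targets, by a call
; into a runtime library that implements the SPIR-V semantics).
declare void @llvm.nvvm.barrier0()

define void @kernel_nvptx() {
  call void @llvm.nvvm.barrier0()
  ret void
}
```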

Actually, I believe that the attribute does not say that the threads are already convergent, just that the instruction is sensitive to the convergence aspect (so the optimizer can’t make this instruction be executed by more or fewer threads than it otherwise would be).
It works for barriers because this is essentially the property you want: you want all threads that would have hit the barrier in the original code to hit the same barrier instruction after optimizations (before convergent, the “noduplicate” attribute was introduced in LLVM especially for OpenCL barriers).
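A small LLVM IR sketch of that property (placeholder names, not tied to any particular lowering):

```llvm
; `convergent` says: any transformation that would make @workgroup_barrier
; execute with a different set of threads than in the original program
; (e.g. sinking or duplicating it across the divergent branch below) is
; not allowed.
declare void @workgroup_barrier() convergent
declare void @do_work()

define void @example(i1 %divergent_cond) {
entry:
  call void @workgroup_barrier()   ; reached by all threads that reach entry
  br i1 %divergent_cond, label %then, label %exit

then:
  call void @do_work()
  br label %exit

exit:
  ret void
}
```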