This part (“preventing us from doing higher-level transformation”) isn’t entirely clear to me; an example here would help. In the past, for lack of an nvgpu
dialect, we’ve added wmma-level ops to the gpu dialect (even though these were specific to NVIDIA GPUs). As you know, these ops use memrefs and GPU-dialect-specific types, and it has so far been considered okay to add certain hardware-specific ops to the gpu
dialect itself: the key is that these ops still work on neutral (MLIR builtin) types even though their “actions” are GPU-specific (NVIDIA or AMD). Examples for general reference:
%C = gpu.subgroup_mma_load_matrix %22[%c0, %c0] {leadDimension = 16 : index} : memref<16x16xf32> -> !gpu.mma_matrix<16x16xf32, "COp">
...
%R = gpu.subgroup_mma_compute %A, %B, %C : !gpu.mma_matrix<16x16xf16, "AOp">, !gpu.mma_matrix<16x16xf16, "BOp"> -> !gpu.mma_matrix<16x16xf32, "COp">
PTX in the name appears out of place for a dialect like this. It looks like you want a specialized GPU dialect for NVIDIA GPUs: would nvgpu be a better name instead?
It’ll be good to have more discussion here before we create it. I don’t think nvptx
is the right name here (it being the name of the final LLVM backend) – there’s a big jump in abstraction from GPU → nvvm → LLVM → NVPTX.