Profiling MLIR Vulkan Runner

I would like to profile the MLIR Vulkan Runner. The tool I am looking at is NVIDIA Nsight Graphics, since it can profile Vulkan API calls.
However, when I run Nsight to profile the vulkan-runner directly:
./ngfx --activity="GPU Trace" --platform="Linux (x86_64)" --dir="my working dir" --exe="mlir-vulkan-runner" -args="input.mlir;--shared-libs=lib/,lib/;--entry-point-result=void" --wait-seconds=1
it fails to attach to the process:

Launching process...
Preparing to launch...
Launched process: mlir-vulkan-runner (pid: 1930121)
Attempting to automatically connect...
Searching for attachable processes on localhost:49152-49216...
Operation timeout, consider retrying with timeouts disabled.
Launching process... Failed

I guess the reason is that mlir-vulkan-runner is essentially not a Vulkan binary.
Is there a way to profile just the Vulkan binary, or perhaps the output of mlir-vulkan-runner -print-ir-after-all?

The input file I give to mlir-vulkan-runner is similar to this:

module attributes {
  spv.target_env = #spv.target_env<
    #spv.vce<v1.0, [Shader], [SPV_KHR_storage_buffer_storage_class]>, {}>
} {
  gpu.module @kernels {
    gpu.func @kernel_add(%arg0 : memref<8xf32>, %arg1 : memref<8xf32>, %arg2 : memref<8xf32>)
      kernel attributes { spv.entry_point_abi = {local_size = dense<[1, 1, 1]>: vector<3xi32> }} {
      %0 = "gpu.block_id"() {dimension = "x"} : () -> index
      %1 = memref.load %arg0[%0] : memref<8xf32>
      %2 = memref.load %arg1[%0] : memref<8xf32>
      %3 = addf %1, %2 : f32
      memref.store %3, %arg2[%0] : memref<8xf32>
      gpu.return
    }
  }

  func @main() {
    %arg0 = memref.alloc() : memref<8xf32>
    %arg1 = memref.alloc() : memref<8xf32>
    %arg2 = memref.alloc() : memref<8xf32>
    %0 = constant 0 : i32
    %1 = constant 1 : i32
    %2 = constant 2 : i32
    %value0 = constant 0.0 : f32
    %value1 = constant 1.1 : f32
    %value2 = constant 2.2 : f32
    %arg3 = memref.cast %arg0 : memref<8xf32> to memref<?xf32>
    %arg4 = memref.cast %arg1 : memref<8xf32> to memref<?xf32>
    %arg5 = memref.cast %arg2 : memref<8xf32> to memref<?xf32>
    call @fillResource1DFloat(%arg3, %value1) : (memref<?xf32>, f32) -> ()
    call @fillResource1DFloat(%arg4, %value2) : (memref<?xf32>, f32) -> ()
    call @fillResource1DFloat(%arg5, %value0) : (memref<?xf32>, f32) -> ()

    %cst1 = constant 1 : index
    %cst8 = constant 8 : index
    gpu.launch_func @kernels::@kernel_add
        blocks in (%cst8, %cst1, %cst1) threads in (%cst1, %cst1, %cst1)
        args(%arg0 : memref<8xf32>, %arg1 : memref<8xf32>, %arg2 : memref<8xf32>)
    %arg6 = memref.cast %arg5 : memref<?xf32> to memref<*xf32>
    call @print_memref_f32(%arg6) : (memref<*xf32>) -> ()
    return
  }
  func private @fillResource1DFloat(%0 : memref<?xf32>, %1 : f32)
  func private @print_memref_f32(%ptr : memref<*xf32>)
}
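
For reference, here is a rough host-side sketch in plain Python of what this input appears to compute, assuming the kernel writes the sum of the first two buffers into the third. It can be handy for checking the values printed by print_memref_f32 before worrying about profiling. The helper names (fill, kernel_add, launch) are made up for illustration and are not part of MLIR; Python floats are doubles, so the last digits can differ slightly from the f32 result.

```python
# Host-side reference for the kernel_add example (plain Python, no GPU).
NUM_ELEMENTS = 8  # matches memref<8xf32>


def fill(value, n=NUM_ELEMENTS):
    """Hypothetical stand-in for fillResource1DFloat: n copies of value."""
    return [value] * n


def kernel_add(block_id_x, a, b, out):
    """Work done by one workgroup: out[x] = a[x] + b[x] for its block id x."""
    out[block_id_x] = a[block_id_x] + b[block_id_x]


def launch():
    a = fill(1.1)
    b = fill(2.2)
    out = fill(0.0)
    # gpu.launch_func uses 8 x 1 x 1 blocks with 1 x 1 x 1 threads each,
    # so each block handles exactly one element.
    for block_id_x in range(NUM_ELEMENTS):
        kernel_add(block_id_x, a, b, out)
    return out


if __name__ == "__main__":
    print(launch())
```

Every element of the result should be approximately 3.3.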

@antiagainst is the expert on this (but is out on leave and may not be responsive)

Also cc @scotttodd, who, IIRC, had to do something to get the graphics tools working with these workloads.

Thanks for the ping, Stella :)

We have this document in IREE describing how to use a few tools, including NVIDIA Nsight Graphics, for profiling Vulkan compute code. That mostly applies to the MLIR Vulkan Runner too, and I’ll quote the introduction from there:

Vulkan supports both graphics and compute, but most tools in the Vulkan ecosystem focus on graphics. As a result, some Vulkan profiling tools expect commands to correspond to a sequence of frames presented to displays via framebuffers. This means additional steps for IREE and other Vulkan applications that solely rely on headless compute. For graphics-focused tools, we need to wrap IREE’s logic inside a dummy rendering loop in order to provide the necessary markers for these tools to perform capture and analysis.
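
To make the "dummy rendering loop" idea a bit more concrete, here is a minimal structural sketch in Python. It makes no Vulkan calls: run_in_frames, begin_frame, end_frame, and workload are all hypothetical names. In a real application the frame boundaries would be something like acquiring and presenting a swapchain image (vkAcquireNextImageKHR / vkQueuePresentKHR), and the workload would be the compute dispatch. This is not IREE's actual implementation, only the shape of the approach.

```python
def run_in_frames(workload, num_frames, begin_frame, end_frame):
    """Run workload once per frame, bracketed by frame-boundary callbacks.

    A frame-oriented profiler looks for the begin/end (present) markers,
    so a headless compute workload is placed between them.
    """
    results = []
    for frame in range(num_frames):
        begin_frame(frame)          # hypothetical: acquire a swapchain image
        results.append(workload())  # the headless compute work
        end_frame(frame)            # hypothetical: present the image
    return results


def demo(num_frames=2):
    """Record the order of events so the frame structure is visible."""
    events = []

    def workload():
        events.append("compute")
        return len(events)

    run_in_frames(
        workload,
        num_frames,
        begin_frame=lambda i: events.append(f"begin {i}"),
        end_frame=lambda i: events.append(f"present {i}"),
    )
    return events


if __name__ == "__main__":
    print(demo())
```

The point of the wrapper is only that each compute submission falls between two frame markers, so a capture triggered on "one frame" contains the work of interest.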

I thought I saw mention of “profiling headless applications” using NVIDIA’s tools in release notes, but all I see at the moment are forum posts from 2019-2020 asking for it to be implemented… YMMV.