While the existing tests under mlir/test/Integration/GPU/CUDA/ use mlir-cpu-runner to JIT-compile and execute MLIR on GPUs, I was trying to see how that same execution could be reproduced by compiling down to an object file/executable and then executing it. This works for CPU execution, for example, where opt and llc can be used and the output can then be assembled and executed. However, there appear to be a few issues when trying the same thing for GPUs, and I'm listing the steps I followed below:
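(For reference, the CPU-side flow I'm comparing against looks roughly like this; the file names are illustrative, and the MLIR lowering passes that produce the LLVM-dialect input depend on what's in the program:)

```shell
# Sketch of the CPU-only compile-to-binary flow (illustrative file names):
mlir-translate -mlir-to-llvmir lowered.mlir -o out.ll        # LLVM-dialect MLIR -> LLVM IR
opt -O3 -S out.ll -o out.opt.ll                              # mid-level optimizations
llc -O3 -filetype=obj out.opt.ll -o out.o                    # native object for the host
clang++ out.o ../../../build/lib/libmlir_runner_utils.so -o out  # link against the MLIR runtime
./out
```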
- Let’s take any one of the example test cases, say test/Integration/GPU/CUDA/all-reduce-op.mlir. Set up the way that test is for JIT execution, this works and executes correctly:
$ mlir-opt ../../test/Integration/GPU/CUDA/all-reduce-op.mlir -gpu-kernel-outlining -pass-pipeline="gpu.module(strip-debuginfo,convert-gpu-to-nvvm,gpu-to-cubin)" -gpu-to-llvm | ../../../build/bin/mlir-cpu-runner -O3 -entry-point-result=void --shared-libs=../../../build/lib/libmlir_runner_utils.so --shared-libs=../../../build/lib/libmlir_cuda_runtime.so --shared-libs=../../../build/lib/libmlir_c_runner_utils.so
Unranked Memref base@ = 0x556873ff3ab0 rank = 3 offset = 0 sizes = [2, 4, 13] strides = [52, 13, 1] data =
[[[5356, 5356, 5356, 5356, 5356, 5356, 5356, 5356, 5356, 5356, 5356, 5356, 5356],
...
- In order to compile the same thing down to a binary and execute it, I tried:
$ mlir-opt ../../test/Integration/GPU/CUDA/all-reduce-op.mlir -gpu-kernel-outlining -pass-pipeline="gpu.module(strip-debuginfo,convert-gpu-to-nvvm,gpu-to-cubin)" -gpu-to-llvm | mlir-translate -mlir-to-llvmir | opt -O3 -S | llc -O3 -march=nvptx64 -o test.ptx
which generates the PTX. However, trying to assemble it:
$ ptxas test.ptx
ptxas test.ptx, line 444; fatal : Parsing error near '-': syntax error
ptxas fatal : Ptx assembly aborted due to errors
At line 444 we have:
.section .debug_pubnames
{
.b32 LpubNames_end0-LpubNames_start0 // Length of Public Names Info
The PTX header shows:
//
// Generated by LLVM NVPTX Back-End
//
.version 3.2
.target sm_20, debug
.address_size 64
...
Wildly guessing that this was a syntax mismatch (hyphens vs. underscores), I changed the - in the label names to _, and this resolves the parsing error, but:
$ ptxas test.ptx
ptxas test.ptx, line 445; error : Feature 'Defining labels in .section' requires PTX ISA .version 7.0 or later
ptxas test.ptx, line 457; error : Feature 'Defining labels in .section' requires PTX ISA .version 7.0 or later
ptxas test.ptx, line 462; error : Feature 'Defining labels in .section' requires PTX ISA .version 7.0 or later
ptxas test.ptx, line 468; error : Feature 'Defining labels in .section' requires PTX ISA .version 7.0 or later
To get PTX ISA 7.0, which I'd be able to run anyway, I then used:
... | llc -O3 -march=nvptx64 -mcpu=sm_80 | sed -e 's/-Lpu/_Lpu/g' | ptxas - --gpu-name=sm_80 --compile-only -o test
And this finally works (I guess one could have stripped the debug info as well to avoid all of this):
$ file test
test: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
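On the stripping idea: an untested guess, but running MLIR's strip-debuginfo pass on the whole module after -gpu-to-llvm (the test pipeline only runs it inside gpu.module) should keep the .debug_* sections out of the emitted PTX entirely and make both workarounds unnecessary:

```shell
# Untested sketch: strip debug info at the MLIR level so llc never emits the
# .debug_* sections that ptxas chokes on. -strip-debuginfo is the same pass the
# pipeline already runs inside gpu.module; here it is also applied to the host module.
mlir-opt ../../test/Integration/GPU/CUDA/all-reduce-op.mlir \
  -gpu-kernel-outlining \
  -pass-pipeline="gpu.module(strip-debuginfo,convert-gpu-to-nvvm,gpu-to-cubin)" \
  -gpu-to-llvm -strip-debuginfo \
  | mlir-translate -mlir-to-llvmir | opt -O3 -S \
  | llc -O3 -march=nvptx64 -mcpu=sm_80 \
  | ptxas - --gpu-name=sm_80 --compile-only -o test
```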
It looks like the printed PTX assembly is not accepted by ptxas's parser? Should this be reported to LLVM or to NVIDIA (albeit a minor issue)? The debug info labels apparently don't have the right format, and they are also being emitted for lower PTX ISA versions that don't support them.
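To see the sed substitution from the workaround in isolation (the input line is copied from the generated PTX):

```shell
# The label arithmetic ptxas rejects, and what the workaround turns it into:
echo ".b32 LpubNames_end0-LpubNames_start0" | sed -e 's/-Lpu/_Lpu/g'
# prints: .b32 LpubNames_end0_LpubNames_start0
```

Note that this joins the two labels into a single identifier, so it only silences the parser; the original length computation (end minus start) is lost.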
- More importantly, does anyone know the right way to link the above object with the MLIR runtime shared libraries? For CPUs, one would just use clang++, g++, or ld on it, but here we get an error like the one below, which is expected since we are cross-compiling:
$ clang++ test ../../../build/lib/libmlir_cuda_runtime.so
/usr/bin/ld: test: Relocations in generic ELF (EM: 190)
/usr/bin/ld: test: Relocations in generic ELF (EM: 190)
/usr/bin/ld: test: Relocations in generic ELF (EM: 190)
/usr/bin/ld: test: Relocations in generic ELF (EM: 190)
/usr/bin/ld: test: Relocations in generic ELF (EM: 190)
/usr/bin/ld: test: Relocations in generic ELF (EM: 190)
/usr/bin/ld: test: Relocations in generic ELF (EM: 190)
/usr/bin/ld: test: error adding symbols: file in wrong format
clang: error: linker command failed with exit code 1 (use -v to see invocation)
I couldn’t tell what target options/approach to use.