I am compiling MLIR like this:
cmake -G Ninja ../llvm \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_C_COMPILER=clang-11 \
-DCMAKE_CXX_COMPILER=clang++-11 \
-DLLVM_ENABLE_PROJECTS=mlir \
-DLLVM_BUILD_EXAMPLES=ON \
-DLLVM_TARGETS_TO_BUILD="X86;NVPTX;AMDGPU" \
-DLLVM_ENABLE_ASSERTIONS=ON \
-DLLVM_ENABLE_LLD=ON \
-DLLVM_INCLUDE_EXAMPLES=ON \
-DMLIR_ENABLE_CUDA_RUNNER=ON \
-DMLIR_ENABLE_CUDA_CONVERSIONS=ON \
-DMLIR_INCLUDE_TESTS=ON \
-DMLIR_INCLUDE_INTEGRATION_TESTS=ON
The build completes without errors. However, when I run the tests, every test in llvm-project/mlir/test/Integration/GPU/CUDA fails. For example:
FAIL: MLIR :: Integration/GPU/CUDA/all-reduce-and.mlir (604 of 1035)
******************** TEST 'MLIR :: Integration/GPU/CUDA/all-reduce-and.mlir' FAILED ********************
Script:
--
: 'RUN: at line 1'; $HOME/opt/llvm-project/build/bin/mlir-opt $HOME/opt/llvm-project/mlir/test/Integration/GPU/CUDA/all-reduce-and.mlir -gpu-kernel-outlining -pass-pipeline='gpu.module(strip-debuginfo,convert-gpu-to-nvvm,gpu-to-cubin)' -gpu-to-llvm | $HOME/opt/llvm-project/build/bin/mlir-cpu-runner --shared-libs=$HOME/opt/llvm-project/build/lib/libmlir_cuda_runtime.so --shared-libs=$HOME/opt/llvm-project/build/lib/libmlir_runner_utils.so --entry-point-result=void | $HOME/opt/llvm-project/build/bin/FileCheck $HOME/opt/llvm-project/mlir/test/Integration/GPU/CUDA/all-reduce-and.mlir
--
Exit Code: 1
Command Output (stderr):
--
'cuStreamSynchronize(stream)' failed with 'CUDA_ERROR_ILLEGAL_ADDRESS'
'cuStreamDestroy(stream)' failed with 'CUDA_ERROR_ILLEGAL_ADDRESS'
'cuModuleUnload(module)' failed with 'CUDA_ERROR_ILLEGAL_ADDRESS'
$HOME/opt/llvm-project/mlir/test/Integration/GPU/CUDA/all-reduce-and.mlir:64:12: error: CHECK: expected string not found in input
// CHECK: [0, 2]
^
<stdin>:1:1: note: scanning from here
Unranked Memref base@ = 0x31e84b0 rank = 1 offset = 0 sizes = [2] strides = [1] data =
^
<stdin>:2:8: note: possible intended match here
[53934704, 0]
^
Input file: <stdin>
Check file: $HOME/opt/llvm-project/mlir/test/Integration/GPU/CUDA/all-reduce-and.mlir
-dump-input=help explains the following input dump.
Input was:
<<<<<<
1: Unranked Memref base@ = 0x31e84b0 rank = 1 offset = 0 sizes = [2] strides = [1] data =
check:64'0 X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
2: [53934704, 0]
check:64'0 ~~~~~~~~~~~~~~
check:64'1 ? possible intended match
>>>>>>
--
I experimented with all-reduce-and.mlir, and the test passes if I comment out this line:
%reduced = "gpu.all_reduce"(%val) ({}) { op = "and" } : (i32) -> (i32)
It looks as if gpu.all_reduce cannot access the memory where the matrix is stored, which would be consistent with the CUDA_ERROR_ILLEGAL_ADDRESS errors in the output above.
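To make clear what I expect the reduction to compute, here is a minimal host-side sketch in Python of a per-row bitwise AND. The rows are illustrative values I picked (an assumption, not copied from the test file) so that the result matches the `// CHECK: [0, 2]` line; the helper name `all_reduce_and` is hypothetical and is not an MLIR API.

```python
from functools import reduce
from operator import and_

# Illustrative rows only (an assumption, not copied from all-reduce-and.mlir),
# chosen so the per-row result matches the expected CHECK line `[0, 2]`.
rows = [
    [0, 1, 2, 4, 8, 16],
    [2, 3, 6, 7, 10, 11],
]


def all_reduce_and(values):
    """Return the bitwise AND of all values in one row (one block's worth of data)."""
    return reduce(and_, values)


# One reduced value per row, analogous to one result per GPU block.
expected = [all_reduce_and(row) for row in rows]
print(expected)
```

The GPU run prints [53934704, 0] instead, so the first value looks like uninitialized or corrupted memory rather than a wrong reduction result.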
How do I fix this?
OS:
Linux 5.4.0-1030-gcp #32-Ubuntu SMP 2020 x86_64 GNU/Linux
Compiler:
Ubuntu clang version 11.0.0-2~ubuntu20.04.1
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
GPU:
Thu Jul 15 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.05 Driver Version: 450.51.05 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla K80 On | 00000000:00:04.0 Off | 0 |
| N/A 47C P8 31W / 149W | 0MiB / 11441MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 On | 00000000:00:05.0 Off | 0 |
| N/A 72C P8 34W / 149W | 0MiB / 11441MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
Thanks in advance.