How to compile .ll file containing both gpu binary and x86 intrinsic into object file

BHbean · April 24, 2024, 2:53am

Hello everyone! I have been working with the MLIR gpu dialect for a while, and I recently ran into a problem.

I built a customized my-opt and a customized my-translate based on MLIR. I defined a function named my_func() inside a .mlir file, and optimized it using customized passes and MLIR passes to get a final .ll file:

$ my-opt my_func.mlir -some-passes | \
  my-translate -my-mlir-to-llvmir -o out.ll

I used the GPU-related passes during the lowering process, so the GPU binary is embedded in out.ll. Meanwhile, since I wrapped the x86 SHA-NI intrinsics in a customized dialect and translated them into x86-specific instructions, out.ll also contains IR like %130 = call <4 x i32> @llvm.x86.sha1rnds4(<4 x i32> %120, <4 x i32> %128, i8 0).

I want to compile out.ll into an object file, out.o, so that it can be linked with existing C++ files. However, the following command fails with a backend error:

$ clang-17 -cc1 -triple nvptx64-nvidia-cuda -aux-triple x86_64-unknown-linux-gnu -S \
	       -target-cpu nvptx64-nvidia-cuda -aux-target-cpu x86_64-unknown-linux-gnu -aux-target-feature +sha -o out.s out.ll

SplitVectorResult #0: t44: v4i32 = llvm.x86.sha1msg1 TargetConstant:i64<11547>, t11, t39

fatal error: error in backend: Do not know how to split the result of this operator!

It seems that the NVPTX backend does not recognize the llvm.x86.sha1msg1 intrinsic, but the x86 intrinsics are only invoked on the host, not inside the GPU kernel. How can I compile LLVM IR containing both a GPU binary and x86-specific instructions into an object file? (I am using LLVM 17.0.6.) Thanks in advance for your help!

NOTE: I have tried compiling a CUDA file, main.cu, containing both x86 SHA-NI intrinsics and a simple GPU kernel, and it compiles successfully with either of the following commands:

# using clang
$ clang++ --cuda-gpu-arch=sm_75 -msha -L/usr/local/cuda/lib64 -lcudart main.cu
# using nvcc
$ nvcc -arch=sm_75 -Xcompiler "-Wall -msha" main.cu

BHbean · April 26, 2024, 3:08am

Solved by using the following command:

$ clang++ out.ll -msha -L/usr/local/cuda/lib64 -lcudart -c -o out.o
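A likely reason this works, stated as an assumption and not confirmed in this thread: after the GPU lowering, out.ll is a host module in which the GPU binary is only embedded data, so the whole file should be compiled for x86 and not for nvptx64. The sketch below shows one way to check which triple a .ll file declares and which x86 intrinsics it uses. The file sample.ll and its contents are a hypothetical stand-in for out.ll, not the real output of my-translate.

```shell
# Hypothetical stand-in for out.ll: a host module whose GPU binary is just a byte blob.
tmp=$(mktemp -d)
cat > "$tmp/sample.ll" <<'EOF'
target triple = "x86_64-unknown-linux-gnu"
@gpu_blob = internal constant [4 x i8] c"\00\01\02\03"
declare <4 x i32> @llvm.x86.sha1msg1(<4 x i32>, <4 x i32>)
EOF

# Extract the target triple declared by the module (empty if none is declared).
triple=$(sed -n 's/^target triple = "\(.*\)"$/\1/p' "$tmp/sample.ll")
echo "module triple: ${triple:-<none>}"

# List the distinct x86 intrinsics referenced anywhere in the module.
intrinsics=$(grep -o '@llvm\.x86\.[A-Za-z0-9_.]*' "$tmp/sample.ll" | sort -u)
echo "x86 intrinsics: $intrinsics"
```

If the real out.ll reports an x86_64 triple, or no triple at all (in which case clang would presumably use its default host target), then compiling it as host code with -msha, as in the command above, matches what the module contains.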
