How to compile a .ll file containing both a GPU binary and x86 intrinsics into an object file

Hello everyone! I have been working with the MLIR gpu dialect for a while, and I recently ran into a problem.

I built custom my-opt and my-translate tools on top of MLIR. I defined a function named my_func() in a .mlir file and optimized it with my own passes plus upstream MLIR passes to get a final .ll file:

$ my-opt my_func.mlir -some-passes | \
  my-translate -my-mlir-to-llvmir -o out.ll

I used the GPU-related passes during the lowering process, so the GPU binary is embedded in the out.ll file. I also wrapped the x86 SHA-NI intrinsics in a custom dialect and translated them into x86-specific intrinsics, so out.ll also contains IR such as %130 = call <4 x i32> @llvm.x86.sha1rnds4(<4 x i32> %120, <4 x i32> %128, i8 0).

I want to compile out.ll into an object file, out.o, so that it can be linked with existing C++ files. However, the following command fails with a backend error:

$ clang-17 -cc1 -triple nvptx64-nvidia-cuda -aux-triple x86_64-unknown-linux-gnu -S \
	       -target-cpu nvptx64-nvidia-cuda -aux-target-cpu x86_64-unknown-linux-gnu -aux-target-feature +sha -o out.s out.ll
SplitVectorResult #0: t44: v4i32 = llvm.x86.sha1msg1 TargetConstant:i64<11547>, t11, t39

fatal error: error in backend: Do not know how to split the result of this operator!

It seems that the NVPTX backend does not recognize the llvm.x86.sha1msg1 intrinsic. However, the x86 intrinsics are only called on the host, never inside the GPU kernel. How can I compile LLVM IR that contains both an embedded GPU binary and x86-specific intrinsics into an object file? I am using LLVM 17.0.6. Thanks in advance for your help!

NOTE: I have tried compiling a CUDA file, main.cu, containing both x86 SHA-NI intrinsics and a simple GPU kernel. That succeeds with either of the following commands:

# using clang
$ clang++ --cuda-gpu-arch=sm_75 -msha -L/usr/local/cuda/lib64 -lcudart main.cu
# using nvcc
$ nvcc -arch=sm_75 -Xcompiler "-Wall -msha" main.cu

Solved by using the following command:

$ clang++ out.ll -msha -L/usr/local/cuda/lib64 -lcudart -c -o out.o
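
A likely explanation, offered as an assumption rather than something confirmed in this thread: out.ll is a host module. The GPU kernel has already been compiled and is carried only as embedded binary data, so the whole file can go through the x86 backend with -msha. The earlier -cc1 -triple nvptx64-nvidia-cuda invocation instead sent the entire module, host-side x86 intrinsics included, to the NVPTX backend. One way to sanity-check this is to look at which target the .ll file declares and whether it references x86 intrinsics. The sketch below does that on a small stand-in file. The helper name inspect_ll and the sample IR are hypothetical and are not taken from the original out.ll.

```shell
# Report the target triple (if any) and count lines that reference
# x86-specific intrinsics in an LLVM IR text file.
inspect_ll() {
  file=$1
  # Extract the quoted value of the `target triple` line; empty if absent.
  triple=$(sed -n 's/^target triple = "\(.*\)"$/\1/p' "$file")
  # Count lines that mention an @llvm.x86.* intrinsic.
  x86_refs=$(grep -c '@llvm\.x86\.' "$file")
  echo "triple: ${triple:-<none>}"
  echo "x86 intrinsic lines: $x86_refs"
}

# Hypothetical stand-in for out.ll so the sketch runs on its own.
sample=$(mktemp)
cat > "$sample" <<'EOF'
target triple = "x86_64-unknown-linux-gnu"
declare <4 x i32> @llvm.x86.sha1rnds4(<4 x i32>, <4 x i32>, i8)
define <4 x i32> @host_round(<4 x i32> %a, <4 x i32> %b) {
  %r = call <4 x i32> @llvm.x86.sha1rnds4(<4 x i32> %a, <4 x i32> %b, i8 0)
  ret <4 x i32> %r
}
EOF

result=$(inspect_ll "$sample")
echo "$result"
rm -f "$sample"
```

Running inspect_ll out.ll on the real file should show whether it declares a host triple, or no triple at all, in which case clang should fall back to its default target. If the file instead declares an nvptx64 triple, this explanation does not apply.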