LLVM/CUDA generate LLVM IR

So for a C program we do:

clang -O3 -emit-llvm hello.c -c -o hello.bc
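Equivalently, -S -emit-llvm gives the human-readable .ll form, and llvm-dis
converts bitcode back to it (a sketch, assuming clang and llvm-dis are on
PATH; the hello.c here is just a stand-in):

```shell
# Sketch: textual IR vs. bitcode (assumes clang and llvm-dis on PATH;
# skip gracefully if either tool is missing).
command -v clang >/dev/null && command -v llvm-dis >/dev/null || exit 0

cat > hello.c <<'EOF'
int main(void) { return 0; }
EOF

clang -O3 -S -emit-llvm hello.c -o hello.ll   # textual IR (.ll)
clang -O3 -emit-llvm -c hello.c -o hello.bc   # bitcode (.bc)
llvm-dis hello.bc -o hello.from-bc.ll         # bitcode -> textual IR
grep -q "define" hello.ll && echo ok
```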

But how do we generate LLVM IR when working with CUDA?

For normal compilation:
clang++ axpy.cu -o axpy --cuda-gpu-arch= -L/ -lcudart_static -ldl -lrt -pthread

I tried adding -S -emit-llvm and changed the output file name, but I keep getting the following error:

clang++: /stor/gakadam/llvm_projects/llvm/tools/clang/lib/Driver/Driver.cpp:1618: virtual {anonymous}::OffloadingActionBuilder::DeviceActionBuilder::ActionBuilderReturnCode {anonymous}::OffloadingActionBuilder::CudaActionBuilder::getDeviceDepences(clang::driver::OffloadAction::DeviceDependences&, clang::driver::phases::ID, clang::driver::phases::ID, {anonymous}::OffloadingActionBuilder::DeviceActionBuilder::PhasesTy&): Assertion `CurPhase < phases::Backend && "Generating single CUDA " "instructions should only occur " "before the backend phase!"' failed.

I tried several combinations, but to no avail!

Any suggestions?

Thank you.

Sincerely,
Guru

Moving to cfe-dev

+Art and Justin

If you add -### to your original command, you'll see that for CUDA
compilations, we invoke clang -cc1 twice: once for the host, and once
for the device. We can't emit LLVM IR or assembly for both host and
device at once, so you need to tell clang which one you want.

The flag to do this is --cuda-device-only (or --cuda-host-only).

Alternatively, you could compile with -save-temps to get everything.
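Concretely, the flags above might be used like this (a sketch, assuming a
clang built with CUDA support and a CUDA installation under /usr/local/cuda;
the minimal axpy kernel is just for illustration):

```shell
# Sketch: assumes clang++ with CUDA support and a CUDA install at
# /usr/local/cuda (skip gracefully if either is missing).
command -v clang++ >/dev/null || exit 0
[ -d /usr/local/cuda ] || exit 0

# Minimal kernel, just for illustration.
cat > axpy.cu <<'EOF'
__global__ void axpy(float a, float *x, float *y) {
  y[threadIdx.x] = a * x[threadIdx.x];
}
EOF

# Device-side IR only (NVPTX target):
clang++ -O3 -S -emit-llvm --cuda-device-only --cuda-gpu-arch=sm_35 \
    axpy.cu -o axpy.device.ll

# Host-side IR only:
clang++ -O3 -S -emit-llvm --cuda-host-only axpy.cu -o axpy.host.ll

# Or keep every intermediate (host/device IR, PTX, cubin) from one build:
clang++ -O3 -c --cuda-gpu-arch=sm_35 -save-temps axpy.cu
```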

Feel free to send me a patch adding this information to
http://llvm.org/docs/CompileCudaWithLLVM.html so that we can help
others avoid this hiccup. The document lives in
llvm/docs/CompileCudaWithLLVM.rst.

I tried adding -S -emit-llvm and changed the output file name, but I keep getting the following error:

That is a bug -- we should give you a meaningful error. It looks like
this bug was probably introduced by the generic offloading driver
changes.

I am having difficulty reproducing the assertion failure, however.
Can you please provide concrete steps to reproduce?

Regards,
-Justin

Hi,

Thank you Justin for your prompt reply. I was able to generate LLVM IR.

For error-reproduction purposes, I have listed below all the commands that worked and those that did not.

Works (I have not yet checked whether the files generated by all of them are the same or not):

clang++ -O3 -emit-llvm -c axpy.cu -o axpy.bc --cuda-gpu-arch=sm_35 --cuda-path=/usr/local/cuda/ --cuda-device-only

clang++ -O3 -emit-llvm -c axpy.cu -o axpy.bc --cuda-device-only

Does not work:

clang++ -O3 -emit-llvm -c axpy.cu --cuda-gpu-arch=sm_35 -o axpy.bc

I think --cuda-gpu-arch=sm_35 and --cuda-path=/usr/local/cuda/ should be included, as the resulting code might be optimized for that architecture. I might be wrong though.

Thank you again.

-Guru

Thank you very much for the testcases -- I'll look into fixing the
assertion failure.

I think --cuda-gpu-arch=sm_35 and --cuda-path=/usr/local/cuda/ should be included, as the resulting code might be optimized for that architecture.

You want --cuda-gpu-arch=sm_35; otherwise we'll default to sm_20,
which doesn't make a huge difference beyond affecting which intrinsics
are available to you, but still. You also want to pass sm_35 because
that will affect how we invoke ptxas -- passing sm_35 will cause us to
use ptxas to generate GPU code specifically for sm_35. If you don't
pass this but then run on an sm_35 GPU, the GPU driver will have to
generate code at runtime, and this can be very slow.

--cuda-path is optional; it's only required if clang can't find the
CUDA installation, or if you want to specify a different one from what
it finds by default. You can see which one it finds by invoking clang
-v.
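For example (a sketch; the "Found CUDA installation" line is what clang
reports in its verbose output when it detects one, though the exact wording
may vary across versions):

```shell
# Sketch: check which CUDA installation clang detected, if any
# (assumes clang++ is on PATH; skip gracefully otherwise).
command -v clang++ >/dev/null || exit 0
clang++ -v 2>&1 | grep -i "cuda" || echo "no CUDA installation reported"
```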

To close the loop, I found the change that introduced this crash and
pinged the author of the change. Hopefully we can get this fixed
soon.

https://reviews.llvm.org/D18172#580276

This should be fixed in r285263.

Thanks!
Samuel