[llc] Producing PTX assembly for different target architectures - possible bug?

Hi,

I just found out that I can also use llc to produce PTX assembly for GPUs. I noticed that the produced PTX assembly seems to target the GPU architecture sm_20 by default.

Is there a way to explicitly request a different or additional target architecture, such as sm_30?

When I compile a CUDA kernel for GPU arch sm_30 using clang++, the .target directive in the PTX assembly is set to sm_30. However, when I save the bitcode of the same compilation and hand it to llc, the .target directive is sm_20. There is an attribute in the bitcode that says "target-cpu"="sm_30", so the information that sm_30 is required is still there. I can imagine that llc might not process this information. Could this be a bug?
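The workflow described above can be sketched with two commands. This is only a sketch; the file names are hypothetical, and the exact clang flags may differ depending on your CUDA installation:

```shell
# Compile the device side of a CUDA kernel to LLVM bitcode for sm_30.
clang++ --cuda-device-only --cuda-gpu-arch=sm_30 -emit-llvm -c kernel.cu -o kernel.bc

# Hand the bitcode to llc. Without further flags, the .target directive
# in the resulting PTX comes out as sm_20, not sm_30.
llc -march=nvptx64 kernel.bc -o kernel.ptx
grep '\.target' kernel.ptx
```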

I am currently using LLVM 7.0, and I can provide the bitcode if anyone wants to reproduce the problem.

Best regards
Lorenz

Hi,

I just found out that I can also use llc to produce PTX assembly for GPUs. I noticed that the produced PTX assembly seems to target the GPU architecture sm_20 by default.

This is currently the default CPU type for the NVPTX back-end.

Is there a way to explicitly request a different or additional target architecture, such as sm_30?

It works the same way as for the other back-ends: you specify the CPU variant with -mcpu=. For sm_30, use -mcpu=sm_30.
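For example, given a bitcode file (hypothetically named kernel.bc here):

```shell
# Select the sm_30 CPU variant explicitly when lowering to PTX.
llc -march=nvptx64 -mcpu=sm_30 kernel.bc -o kernel.ptx

# The .target directive in the output should now read sm_30.
grep '\.target' kernel.ptx
```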

When I compile a CUDA kernel for GPU arch sm_30 using clang++, the .target directive in the PTX assembly is set to sm_30. However, when I save the bitcode of the same compilation and hand it to llc, the .target directive is sm_20. There is an attribute in the bitcode that says "target-cpu"="sm_30", so the information that sm_30 is required is still there.

It’s a function attribute which, generally speaking, can’t be used as the default for the whole module. It also does not do much in the NVPTX back-end. Eventually it will be used to enforce that -mcpu=XXX is the same as or higher than all the target-cpu attributes in a module. This is one of the areas where NVPTX can’t implement what the attribute was intended to do: targeting different CPU variants within the same module. That’s doable on x86, where the same ISA can represent instructions for different CPU variants, but it can’t be done in PTX, which requires everything in the module to be built for the same GPU.

I can imagine that llc might not process this information. Could this be a bug?

Not really. It’s more of a feature in clang/LLVM that the NVPTX back-end can’t implement.

–Artem

Hi,

I just found out that I can also use llc to produce PTX assembly for GPUs. I noticed that the produced PTX assembly seems to target the GPU architecture sm_20 by default.

This is currently the default CPU type for the NVPTX back-end.

Is there a way to explicitly request a different or additional target architecture, such as sm_30?

It works the same way as for the other back-ends: you specify the CPU variant with -mcpu=. For sm_30, use -mcpu=sm_30.

-mcpu=sm_30 was just what I was looking for. Thank you very much!

When I compile a CUDA kernel for GPU arch sm_30 using clang++, the .target directive in the PTX assembly is set to sm_30. However, when I save the bitcode of the same compilation and hand it to llc, the .target directive is sm_20. There is an attribute in the bitcode that says "target-cpu"="sm_30", so the information that sm_30 is required is still there.

It’s a function attribute which, generally speaking, can’t be used as the default for the whole module. It also does not do much in the NVPTX back-end. Eventually it will be used to enforce that -mcpu=XXX is the same as or higher than all the target-cpu attributes in a module. This is one of the areas where NVPTX can’t implement what the attribute was intended to do: targeting different CPU variants within the same module. That’s doable on x86, where the same ISA can represent instructions for different CPU variants, but it can’t be done in PTX, which requires everything in the module to be built for the same GPU.

Thanks for the info. I understand: with multiple functions targeting different CPUs, this can only be done across multiple modules. But I think use cases for that are rather rare.

Best regards
Lorenz