Guidance on working with the NVIDIA GPU back-end

Hi all,

I’m primarily a hardware person but would like to do some compiler-architecture co-design research. Are there any good references for the NVPTX backend? I’d like to change that backend to have a limited number of physical registers rather than an unlimited number of virtual ones (for more realistic modeling in a uarch simulator).

Being able to do register allocation and other optimizations on the virtual ISA (PTX) would be incredibly useful to the research community.

Thanks in advance,


Because the PTX ISA has an unlimited number of virtual registers, there is no meaningful register allocation at the PTX level at all. In other words, it makes no sense to try to limit something that does not exist. ptxas lowers the PTX further onto physical registers, but that step is outside the scope of LLVM.
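
To make that concrete: PTX declares its virtual registers with parameterized `.reg` directives such as `.reg .b32 %r<6>;`, and a compiler may declare as many as it needs. Below is a minimal Python sketch that tallies those declarations. The PTX fragment is hand-written for illustration, and the helper name `count_virtual_registers` is hypothetical rather than part of any tool.

```python
import re

# Hypothetical PTX fragment, written by hand for illustration only.
PTX_SAMPLE = """
.reg .pred %p<2>;
.reg .f32  %f<4>;
.reg .b32  %r<6>;
.reg .b64  %rd<8>;
"""

# Matches parameterized declarations of the form: .reg .<type> %<name><<count>>;
REG_DECL = re.compile(r"\.reg\s+\.(\w+)\s+%(\w+)<(\d+)>\s*;")

def count_virtual_registers(ptx: str) -> dict:
    """Return {register type: number of virtual registers declared}."""
    counts = {}
    for reg_type, _name, n in REG_DECL.findall(ptx):
        counts[reg_type] = counts.get(reg_type, 0) + int(n)
    return counts

register_counts = count_virtual_registers(PTX_SAMPLE)
```

Nothing in these declarations is bounded by a physical register file; the counts simply grow with the number of values the kernel defines.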

Kind regards,

- Dmitry.

On Mon, 16 Dec 2019 at 17:29, CoffeeBeforeArch via llvm-dev <> wrote:

Hi Dmitry,

Thanks for the response. I understand how the PTX->SASS flow works; what I’m looking for is references or insights on what it would take to register-allocate PTX.

While this seems odd (and is for running on real hardware), simulators like GPGPU-Sim functionally execute PTX because it is fully documented, while the machine ISA, SASS, is not. This means that the code being used to evaluate new architectures by researchers lacks register allocation and even basic optimizations like hoisting loads.

Hopefully, that clears things up. Any thoughts on the best place to look to get started with that?

All the best,


Well, beyond the NVPTX user guide [1], the information has to be gleaned directly from the NVPTX backend code (lib/Target/NVPTX in the LLVM tree). There you will see that the register numbering pretty much follows the SSA numbering. To get actual allocation, this would have to be coupled with the real register allocation code that the other backends use.
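
As a rough illustration of what mapping SSA-style virtual registers onto a fixed register file involves, here is a minimal Python sketch of textbook linear-scan allocation over live intervals. It is not LLVM's allocator and not what ptxas does; the function name `linear_scan`, the interval format, and the example register names are all assumptions made for this sketch.

```python
from typing import Dict, List, Tuple

def linear_scan(intervals: Dict[str, Tuple[int, int]], num_phys: int):
    """Map virtual registers to physical register indices, spilling if needed.

    intervals maps a virtual register name to its (start, end) live range,
    both inclusive. Returns (assignment, spilled).
    """
    free = list(range(num_phys))          # physical registers not in use
    active: List[Tuple[int, str]] = []    # (end, vreg) pairs holding a register
    assignment: Dict[str, int] = {}
    spilled: List[str] = []

    # Visit intervals in order of increasing start point.
    for vreg, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1][0]):
        # Expire intervals that ended before this one starts.
        for old_end, old in list(active):
            if old_end < start:
                active.remove((old_end, old))
                free.append(assignment[old])
        if free:
            reg = min(free)               # lowest-numbered free register
            free.remove(reg)
            assignment[vreg] = reg
            active.append((end, vreg))
        elif active and max(active)[0] > end:
            # Spill the active interval that ends last and reuse its register.
            last_end, victim = max(active)
            active.remove((last_end, victim))
            assignment[vreg] = assignment.pop(victim)
            active.append((end, vreg))
            spilled.append(victim)
        else:
            # The new interval itself ends last, so spill it.
            spilled.append(vreg)
    return assignment, spilled

# Hypothetical live ranges for four virtual registers, two physical registers.
example = {"%r0": (0, 5), "%r1": (1, 3), "%r2": (2, 8), "%r3": (4, 6)}
assignment, spilled = linear_scan(example, num_phys=2)
```

A real backend would also have to insert spill and reload code, respect register classes (predicate, 32-bit, 64-bit, and so on), and compute the live intervals from the machine code in the first place.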

Please note that ptxas also schedules instructions when it generates SASS, so register allocation alone might not reproduce the instruction placement seen on real hardware.


On Mon, 16 Dec 2019 at 17:58, CoffeeBeforeArch via llvm-dev <> wrote:

That’s what I figured. Hopefully, I can generate some PTX with register allocation and instruction scheduling similar to the SASS. That would really help with the correlation between the simulator and real hardware. Thanks for the help, Dmitry!