Register allocation and optimization of PTX

Hi all,

I work on GPU architecture and have been looking at making more use of the compiler in my research. I have a question I was hoping to get answered before I start.

Is there a way to do register allocation and optimizations when generating PTX code? Today, register allocation and some optimizations occur in ptxas (NVIDIA’s optimizing assembler), after the PTX has been generated. However, GPGPU-Sim (a contemporary GPU microarchitecture simulator) functionally executes PTX rather than SASS, because PTX is documented and the machine ISA, SASS, is not.

What I would like to do is generate register-allocated, optimized PTX, and compare how well simulating it correlates with hardware against the baseline PTX that is generated and used today.

Thank you in advance for any pointers to resources or help,

–Nick