For the AMDGPU architecture, during RA, we prefer to have a cost associated with the registers (CostPerUse) based on a target entity (for instance, the Calling Convention of the current MachineFunction).
Presently, CostPerUse is a one-time static value (either zero or a positive value) generated through TableGen.
The current implementation doesn’t allow us to control the reg-cost on the fly.
The AMDGPU ABI has recently been revised to introduce more caller-saved VGPRs (the exact details are explained towards the end of this e-mail), and we found that having a dynamic register cost is important to achieve an optimal allocation.
Precisely, it is important to limit the number of VGPRs allocated for a kernel/device-function to the smallest possible value, since that has a direct impact on occupancy. Occupancy is the number of wavefronts that can be launched at runtime for a kernel program.
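To make the VGPR/occupancy relationship concrete, here is a minimal standalone sketch of the waves-per-SIMD limit implied by VGPR usage. The constants (256 VGPRs per SIMD, an allocation granularity of 4, a cap of 10 waves) are illustrative GFX9-style numbers and vary by subtarget; the function name is mine, not an LLVM API.

```cpp
#include <algorithm>

// Illustrative GFX9-style limits; actual values vary by subtarget.
constexpr unsigned TotalVGPRsPerSIMD = 256;
constexpr unsigned VGPRAllocGranularity = 4;
constexpr unsigned MaxWavesPerSIMD = 10;

// Waves-per-SIMD as limited by VGPR usage alone (hypothetical helper).
unsigned maxWavesPerSIMD(unsigned NumVGPRs) {
  if (NumVGPRs == 0)
    return MaxWavesPerSIMD;
  // VGPRs are allocated in granules, so round the request up first.
  unsigned Rounded =
      ((NumVGPRs + VGPRAllocGranularity - 1) / VGPRAllocGranularity) *
      VGPRAllocGranularity;
  return std::min(MaxWavesPerSIMD, TotalVGPRsPerSIMD / Rounded);
}
```

Under these assumptions, a kernel using 64 VGPRs can run 4 waves per SIMD, while one using 128 VGPRs drops to 2, which is why squeezing the allocated VGPR count matters.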
Some initial thoughts on how to fix it:
I don't know the history behind the name CostPerUse, so I may be missing the background associated with it. It seems to be a misnomer for what it is intended to do. At first sight, the name indicates that the cost is a function of the uses of the register: the more uses, the higher the cost. How do we want to define the value of CostPerUse? Should it be a function of the uses, or just of the target?
I think extending getCostPerUse into a virtual function with a MachineFunction parameter could be the first step to set up the cost. With the MachineFunction, it should be able to get the calling convention and subtarget information.
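The proposed shape could look like the following self-contained sketch. The *Sketch types stand in for the real LLVM classes (TargetRegisterInfo, MachineFunction), and the threshold and cost values are illustrative assumptions, not the actual AMDGPU policy.

```cpp
// Hypothetical stand-ins for LLVM types; names are illustrative only.
enum class CallingConv { AMDGPU_KERNEL, C };

struct MachineFunctionSketch {
  CallingConv CC;
};

// Today, CostPerUse is a static per-register value emitted by TableGen.
// Proposed shape: make the hook virtual and pass the MachineFunction so a
// target can consult the calling convention / subtarget on the fly.
struct TargetRegisterInfoSketch {
  virtual ~TargetRegisterInfoSketch() = default;
  virtual unsigned getCostPerUse(unsigned Reg,
                                 const MachineFunctionSketch &MF) const {
    (void)MF;
    (void)Reg;
    return 0; // static default, as generated today
  }
};

// Hypothetical AMDGPU override: penalize high-numbered VGPRs for kernels so
// the allocator prefers a dense low range, keeping the allocated VGPR count
// (and thus the occupancy loss) small.
struct AMDGPURegisterInfoSketch : TargetRegisterInfoSketch {
  unsigned getCostPerUse(unsigned Reg,
                         const MachineFunctionSketch &MF) const override {
    const unsigned HighVGPRThreshold = 32; // illustrative cutoff
    if (MF.CC == CallingConv::AMDGPU_KERNEL && Reg >= HighVGPRThreshold)
      return 1; // steer RA away from widening the allocated VGPR range
    return 0;
  }
};
```

The point of the MachineFunction parameter is visible in the override: the same register can be cheap or expensive depending on the function being compiled.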
Madhur Amilkanthwar via llvm-dev <firstname.lastname@example.org> wrote on Saturday, May 30, 2020 at 8:53 PM:
Yes, making getCostPerUse a virtual interface will give the targets more control over how it should be done.