I'm working on support for the latest generation of AMD GPUs (Southern
Islands) in the R600 backend, and I need some advice on how to handle
interactions between two different ALUs.
The processors on Southern Islands GPUs are grouped into compute units,
which contain 1 Scalar ALU (sALU) and 64 Vector ALUs (vALU). The sALU
is mainly responsible for flow control (implemented using predicates) and
loading data from read-only memory. The vALU does most of the data
processing and has a much larger instruction set than the sALU.
Even though a compute unit has 1 sALU and 64 vALUs, from the programmer's
perspective, there is just one sALU and one vALU. Programs written for
Southern Islands intermix sALU and vALU instructions, and all
instructions are executed in order no matter which ALU they are executed
on, so no synchronization is needed between the ALUs.
Each ALU has its own register file: SGPRs for sALU, and VGPRs for vALU.
The vALU can read from VGPRs and also SGPRs, but the sALU can only read
from SGPRs. This restriction on the sALU seems to be causing the
instruction selector to generate some illegal copies, which is the main
problem I'm trying to solve. For example:
    NODE0 = ISD::ADD SGPR0, VGPR0
can be selected to:
    SGPR2 = COPY VGPR0
    SGPR1 = S_ADD SGPR0, SGPR2
This leaves us with a copy from a VGPR to an SGPR, which is illegal.
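To make the constraint concrete, here is a minimal standalone C++ sketch of
the rule described above. It is not LLVM code and not a proposed fix: the
names RegBank, Alu, isCopyLegal, selectAlu, and resultBank are hypothetical,
and it assumes that an SGPR-to-VGPR copy is legal (since the vALU can read
SGPRs) and that vALU results always land in VGPRs.

```cpp
#include <cassert>

// Hypothetical model of the two register files described above.
enum class RegBank { SGPR, VGPR };

// Hypothetical model of the two ALUs.
enum class Alu { Scalar, Vector };

// A copy is illegal only when it moves a value from a VGPR into an SGPR.
// Assumption: every other direction (including SGPR -> VGPR) is legal.
constexpr bool isCopyLegal(RegBank dst, RegBank src) {
    return !(dst == RegBank::SGPR && src == RegBank::VGPR);
}

// One possible selection rule: use the sALU only when both operands
// already live in SGPRs; otherwise use the vALU, which can read both banks.
constexpr Alu selectAlu(RegBank lhs, RegBank rhs) {
    return (lhs == RegBank::SGPR && rhs == RegBank::SGPR) ? Alu::Scalar
                                                          : Alu::Vector;
}

// Assumption: sALU results go to SGPRs and vALU results go to VGPRs.
constexpr RegBank resultBank(Alu alu) {
    return alu == Alu::Scalar ? RegBank::SGPR : RegBank::VGPR;
}

// The copy from the example above (SGPR2 = COPY VGPR0) is rejected.
static_assert(!isCopyLegal(RegBank::SGPR, RegBank::VGPR),
              "VGPR -> SGPR copies must be rejected");

// ADD SGPR0, VGPR0 would go to the vALU under this rule, so no
// VGPR -> SGPR copy is needed for its operands.
static_assert(selectAlu(RegBank::SGPR, RegBank::VGPR) == Alu::Vector,
              "mixed operands should select the vALU");
```

Under this rule the ISD::ADD in the example would become a vALU add that
writes a VGPR, rather than an S_ADD fed by an illegal copy. The open question
is how to express an equivalent constraint to the instruction selector.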
Any suggestions on how to solve this problem or how best to model these