Pass Divergence Analysis data to the selection DAG to drive divergence-dependent instruction selection

Hi All,

I’m writing to introduce, and draw attention to, a proposed change that was posted for review several months ago.

I also tried contacting the code owners directly but have had no reply yet.

Brief background:

In SIMT architectures VGPRs are a high-demand resource. At the same time, a significant part of the computation operates on naturally scalar data.
Those computations can be performed by the SALU, which saves a lot of VGPRs. This is intended to increase occupancy.
Also, splitting the data flow into scalar and vector parts gives the instruction scheduler more flexibility, which can increase HW utilization.

On GPU targets we say that an instruction is vector if it operates on VGPR operands, each lane of which may contain a different value.
We say an instruction is scalar if it operates on SGPRs, which are shared among all the threads in the warp.

Divergence Analysis was introduced by F. Pereira et al. in 2013 and is now part of the core LLVM analyses.
Unfortunately its results are mostly useless, because there is no way to inform the instruction selection DAG about the divergence property of a concrete instruction.
Put simply, an IR operation that has no divergent operands produces a uniform result and should be selected to a scalar instruction.
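
As a rough illustration of that rule, here is a minimal sketch over a toy IR. It is not LLVM code: ToyInst, computeDivergence and exampleProgram are hypothetical names made up for this example, and the sketch assumes straight-line code, so divergence caused by control flow is not modeled at all.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy IR (not LLVM's): each instruction lists the indices of
// the earlier instructions it uses as operands.
struct ToyInst {
    std::string name;
    std::vector<std::size_t> operands; // indices into the instruction list
    bool divergentSource;              // e.g. a per-lane thread-id read
};

// Returns one flag per instruction: true = divergent, false = uniform.
// Assumes instructions appear in definition order, so a single forward
// pass is enough.
std::vector<bool> computeDivergence(const std::vector<ToyInst> &insts) {
    std::vector<bool> divergent(insts.size(), false);
    for (std::size_t i = 0; i < insts.size(); ++i) {
        bool d = insts[i].divergentSource;
        for (std::size_t op : insts[i].operands) {
            assert(op < i && "operands must be defined earlier");
            d = d || divergent[op]; // any divergent operand taints the result
        }
        divergent[i] = d;
    }
    return divergent;
}

// Illustrative program: base and stride are uniform, tid differs per lane.
std::vector<ToyInst> exampleProgram() {
    return {
        {"base", {}, false},
        {"tid", {}, true},
        {"stride", {}, false},
        {"offset", {0, 2}, false}, // base + stride: all operands uniform
        {"addr", {3, 1}, false},   // offset + tid: one divergent operand
    };
}
```

In this example "offset" stays uniform and would be a candidate for a scalar instruction, while "addr" becomes divergent because it depends on "tid".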

Until now we have passed divergence data for memory access instructions through metadata, simply because MemSDNode has a memory operand that refers back to the IR.
This approach is restricted to memory accesses only. To cover everything else, we would need another pass over the machine code that propagates the divergence property
from the loaded value through the computations and finally to the store of the result. Besides requiring one more pass,
this would repeat on machine instructions the same algorithm that divergence analysis has already run over the IR.

Since the SDNode flags field was recently widened to 16 bits and 5 of those bits are still unused, we have a chance to use them to pass divergence data to instruction selection.
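
To make the idea concrete, here is a minimal sketch of carrying one divergence bit in a 16-bit flags word and using it to choose between a scalar and a vector load. Everything in it is an assumption for illustration: ToyNodeFlags, ToyOpcode and selectLoad are hypothetical names, and the bit position does not mirror the real SDNode layout or the code in the proposed change.

```cpp
#include <cstdint>

// Hypothetical stand-in for a node's 16-bit flags field.
struct ToyNodeFlags {
    std::uint16_t bits = 0;

    // Assumed free bit; the position is illustrative only.
    static constexpr std::uint16_t DivergentBit = 1u << 11;

    void setDivergent(bool d) {
        bits = d ? static_cast<std::uint16_t>(bits | DivergentBit)
                 : static_cast<std::uint16_t>(bits & ~DivergentBit);
    }
    bool isDivergent() const { return (bits & DivergentBit) != 0; }
};

// Hypothetical opcodes standing in for the target's scalar and vector loads.
enum class ToyOpcode { ScalarLoad, VectorLoad };

// Selection consults only the flag; no IR or metadata lookup is needed.
ToyOpcode selectLoad(const ToyNodeFlags &flags) {
    return flags.isDivergent() ? ToyOpcode::VectorLoad
                               : ToyOpcode::ScalarLoad;
}
```

The point of the sketch is that once the bit travels with the node, the selection step can branch on it directly, and the other flag bits are left untouched.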

This change introduces one possible approach to implementing such an enhancement.
It passes DA data for load instructions only. If it is accepted, we’ll go ahead and add the same code to handle other instructions as well.

I’d appreciate any advice and/or opinions.

Thanks in advance.