Hi all,

I have attempted to implement an optimal variant of the register bank selector (regbankselect). The code is available for review at https://reviews.llvm.org/D90304, and I would greatly appreciate comments from anyone interested. I have run a few of the regbankselect-*.mir testcases for AArch64 and it seems to work, but more tests are surely needed to increase confidence in the implementation. I also tried using AMDGPU, but that backend does not provide the full list of InstructionMappings for a given MachineInstr, which is needed in order to compute the optimal selection of register banks.

Cheers,

Gabriel Hjort Åkerlund

Hi Gabriel,

thank you so much for doing this! I’ll try out the patch in our downstream implementation right away. Do you know how big of an impact this has on compile time compared to fast and greedy?

Cheers,

Dominik

Hi Dominik,

(Sorry for spamming; the first reply only went to you and not the list.)

Cool that you want to try it out! There will for sure be some bugs in it, so please let me know if/when you find one and I’ll fix it. And if you could make a testcase out of it, that would be superb (as there’s currently a complete lack of tests).

Although I haven’t measured it, I expect compilation time to be about 3x that of greedy, as it makes three passes over the instructions. However, most of the work is done in the first pass, which is comparable to the single pass made in greedy. So hopefully it’s less than 3x, but this should really be measured over a set of functions to get an accurate figure. There will also most likely be opportunities to decrease compilation time.

Cheers,

Gabriel

Hi Gabriel,

it's cool that you're trying to do this, but I think you need some more proof before calling this optimal.

Put briefly, dynamic programming approaches for this kind of mapping/matching can be optimal if the underlying dependency structure is a tree. If it's a more general DAG, optimality goes out of the window. In your dynamic programming approach, my understanding is that you remember the cost of every possible "realization"/"mapping" of every node individually. So, you end up computing N*M pieces of information, where N is the number of nodes and M is the number of options to choose from at each node.

Unfortunately, the final selection of choices only works in a tree, because you basically have to choose each node's "realization"/"mapping" based on a single successor.

As soon as you're in a general DAG, where a node has to satisfy *multiple* successors, that no longer works. You could theoretically extend the dynamic programming approach, but only by computing information about the cost of *correlated* choices for multiple nodes simultaneously. But then, your performance can go out of the window because you need to compute something like N*M^K pieces of information, where K is a bound on the number of nodes whose choices you need to consider simultaneously. K cannot be bounded in general (other than by some trivial function of the number of nodes in the graph), which means you end up with an exponential worst case if you want to solve this optimally. I would actually expect optimal register bank selection on a general DAG to be NP-complete, but I haven't thought about it too deeply.
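To make the tree case concrete, here is a small self-contained sketch (not LLVM code; all node names, mapping costs, and the fixed cross-bank copy cost of 2 are made up for illustration). It computes the N*M table described above: the minimal subtree cost for each (node, parent's choice) pair. The memoization key works only because each node is constrained by a single parent; with multiple successors per node, this decomposition breaks down.

```python
from functools import lru_cache

# Hypothetical dependency *tree*: node -> list of children.
children = {"root": ["a", "b"], "a": [], "b": []}

# node -> intrinsic cost of each candidate mapping (two options per node).
node_cost = {
    "root": [1, 5],
    "a": [2, 1],
    "b": [3, 1],
}

def convert_cost(child_choice, parent_choice):
    # Assumed cost of a copy when child and parent pick different banks.
    return 0 if child_choice == parent_choice else 2

@lru_cache(maxsize=None)
def best(node, parent_choice):
    # Minimal total cost of the subtree at `node`, given the parent's
    # choice. Memoizing on (node, parent_choice) yields the N*M table:
    # valid only because each node answers to a *single* successor.
    return min(
        node_cost[node][c]
        + convert_cost(c, parent_choice)
        + sum(best(ch, c) for ch in children[node])
        for c in range(len(node_cost[node]))
    )

def solve(root):
    # The root has no parent, so only its own and its children's costs count.
    return min(
        node_cost[root][c] + sum(best(ch, c) for ch in children[root])
        for c in range(len(node_cost[root]))
    )

print(solve("root"))  # -> 6: root picks mapping 0, both children convert
```

In this toy instance, choosing the cheap root mapping and paying for the children's conversions (1 + 2 + 3 = 6) beats making everyone agree on mapping 1 (5 + 1 + 1 = 7).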

Cheers,

Nicolai

Hi Nicolai,

Thanks for your response!

Fair enough, I will rename it to "global" as that is a property that we can all agree on. And if it happens to compute the optimal selection, then that's just a bonus. =)

Cheers,

Gabriel