Global variant of regbankselect

Hi all,

I have made several improvements and bug fixes to the global variant of regbankselect (available on https://reviews.llvm.org/D90304), and is now in a more mature state compared to its first draft. I ran a few tests to compare run times against the greedy version for AArch64's ARM64 target, and on those 9 testcases the global variant increases runtime of regbankselect with 10% (geometric mean). At most the runtime increased by 54%, and at least by 0%. There is also a patch for removing duplicates mappings, which will reduce these numbers once accepted (https://reviews.llvm.org/D92161). The quality of the code generated was equal to that of greedy.

In running these tests, however, I discovered that RegisterBankInfo's getInstrAlternativeMappings() of AArch64 does not compute all possible mappings. Most often an empty array is returned, and instead the target relies on getInstrMapping() to return the most appropriate mapping depending in the current context. But that assumes that register banks have already been selected for all registers used an instruction, which is the case for fast and greedy mode but not for global mode. Hence, for global mode to work it requires the target to give all possible mappnings, even when a register bank has not yet been selected for all input registers. For example, in order to run the tests above I had to modify the AArch64 backend to return possible mappings for G_STORE in order for global regbankselect to make suitable selections.

So in order to proceed we would need to augment the backends to give all alternatives, but I doubt the backend maintainers would be interested of doing this before getting proof that the global variant actually improves code quality. Hence I would like to have a testcase where it is known that the greedy variant does a sub-optimal selection. Does anyone know of such a testcase?

Cheers,
Gabriel Hjort Åkerlund

Hi Gabriel,

Thanks for pushing on that front!

To answer your question (with a few years old context), the reason we never really looked into a global/optimal algorithm is because the current ones are good enough. Back when I was working on AArch64, we didn’t have a test case where such scheme would prove worth it.

Again this may have changed since last time I looked into it. Amara and Jessica (CC’ed) would know for sure.

Cheers,
-Quentin

For AArch64, we’re currently working on catching up to SelectionDAG in terms of runtime performance, and we still have quite a bit to do on that front. Because of that we haven’t spent much time looking at where we see potential for a global regbankselect algorithm.

I think this is overall a nice idea to explore but unfortunately we don’t have any test cases or much bandwidth to spend on this at this time. I can see us perhaps having some time to look at this in a few months.

Thanks,
Amara

Hi Gabriel,

thanks for doing this! I'm excited to see this progressing.

Hi all,

I have made several improvements and bug fixes to the global variant of regbankselect (available on https://reviews.llvm.org/D90304), and is now in a more mature state compared to its first draft. I ran a few tests to compare run times against the greedy version for AArch64's ARM64 target, and on those 9 testcases the global variant increases runtime of regbankselect with 10% (geometric mean). At most the runtime increased by 54%, and at least by 0%. There is also a patch for removing duplicates mappings, which will reduce these numbers once accepted (https://reviews.llvm.org/D92161). The quality of the code generated was equal to that of greedy.

In running these tests, however, I discovered that RegisterBankInfo's getInstrAlternativeMappings() of AArch64 does not compute all possible mappings. Most often an empty array is returned, and instead the target relies on getInstrMapping() to return the most appropriate mapping depending in the current context. But that assumes that register banks have already been selected for all registers used an instruction, which is the case for fast and greedy mode but not for global mode. Hence, for global mode to work it requires the target to give all possible mappnings, even when a register bank has not yet been selected for all input registers. For example, in order to run the tests above I had to modify the AArch64 backend to return possible mappings for G_STORE in order for global regbankselect to make suitable selections.

I'm probably misunderstand something, but what if you have instructions that only have a single valid mapping? Should getInstrAlternativeMappings return that exact mapping instead of an empty array in that case? Hard-coding every single supported GMIR instruction that way sounds rather tedious.

Cheers,

Dominik

I understand that the resources available are limited and that there are other tasks with higher priorities; I very much appreciate that people are taking the time just to answer these emails.

Until people get the time to look at this, what should we do with the code? Currently it is under review (https://reviews.llvm.org/D90304). Should it stay there, or should it be accepted and then refined further as we go? My concern is that it may rot if left unattended for a few months.

Cheers,
Gabriel

Hi Dominik,

Thanks for your encouragement!

Ah, forgive my vagueness. If an instruction only has a single mapping then of course getInstrAlternativeMappings() should return an empty array. The regbankselect merges the mapping returned from getInstrMapping() with getInstrAlternativeMappings() (which in some cases may cause duplicates as getInstrAlternativeMappings() sometimes already contains the mapping returned by getInstrMapping(); hence the need for this patch: https://reviews.llvm.org/D92161). However, the targets I have looked at often only returns a single mapping through getInstrMapping() and then none through getInstrAlternativeMappings(), even though there in fact exist multiple mappings.

Cheers,
Gabriel