Hello,
I’m investigating a performance issue in an AMDGPU kernel caused by many values being live at the same time, resulting in extremely high register pressure (hundreds of spills of both SGPRs and VGPRs).
I’ve been looking at this on-and-off for months and can’t seem to find a breakthrough; all my attempts net me at most a 2-3% reduction in spills.
- I ran an ad-hoc register pressure analysis pass after every backend pass to see whether the pressure was caused by a backend optimization. It is not: pressure is very high right out of ISel and stays high throughout the pass pipeline.
- I also checked for missing DAG combines.
- I tweaked many common command-line options in the AMDGPU backend and IR optimizations, such as loop unrolling thresholds, with no significant impact (nothing more than a couple of percent).
- I spent a lot of time looking through debug logs of many common passes/ISel for clues but couldn’t find any.
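In case it helps others attempting the same kind of pass-by-pass comparison, the bookkeeping can be scripted over `llc -print-after-all` output. Below is a minimal sketch; the exact dump-header format and the spill-related mnemonics (`SI_SPILL`, etc.) vary between LLVM versions, so treat both regexes as assumptions to adjust for your build:

```python
import re

# Assumption: headers look like "*** IR Dump After <Pass Name> ***",
# possibly prefixed with "# " for machine-level dumps.
DUMP_HEADER = re.compile(r"^.*?\*{3} IR Dump After (.+?) \*{3}", re.MULTILINE)
# Heuristic: count lines mentioning spill/reload pseudos or comments.
SPILL_RE = re.compile(r"SI_SPILL|Spill|Reload")

def spills_per_pass(dump_text):
    """Return (pass_name, spill_line_count) pairs in dump order."""
    headers = list(DUMP_HEADER.finditer(dump_text))
    result = []
    for i, m in enumerate(headers):
        start = m.end()
        end = headers[i + 1].start() if i + 1 < len(headers) else len(dump_text)
        body = dump_text[start:end]
        count = sum(1 for line in body.splitlines() if SPILL_RE.search(line))
        result.append((m.group(1), count))
    return result
```

Diffing the resulting counts between two consecutive dumps makes it easy to spot the first pass after which spill counts jump, which is how I confirmed the pressure is already there out of ISel.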
My current theory is that there are some IR optimizations that, alone or in combination, unfortunately cause register pressure to rise dramatically. For instance, many values defined in the entry block are reused throughout the function (some loads of <64-bit values have 100+ users).
I was wondering if anyone has dug into this kind of issue in the past? Any help would be greatly appreciated. I’m looking for anything, really:
- Ideas of passes to look at/disable/tweak
- Theoretical optimizations that could be performed but aren’t implemented yet
Here’s some information on the kernel that may help:
- The kernel has about 20,000 lines of IR and 500+ basic blocks.
- There are many layers of inlining (up to 10, I believe).
- Extensive use of loop unrolling (critical for performance)
- There are a lot of reads from a large constant global array. Many things are loaded from it, from simple i32/i64 values to pointers that are dereferenced later.
- Many values defined in the entry block are reused throughout the function. For instance, some small loads (i32/i64) have 100+ users across many basic blocks.
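For anyone wanting to reproduce the fanout numbers above on their own kernel: a crude textual pass over the .ll file is enough to rank entry-block definitions by use count. This is only an approximation (a real analysis would walk `Value::users()` in C++; quoted identifiers and other syntax corner cases are not handled):

```python
import re
from collections import Counter

# Assumption: definitions look like "%name = ..." and block labels like
# "name:" on their own line; everything before the first label is the
# entry block.
DEF_RE = re.compile(r"^\s*(%[-\w.]+)\s*=", re.MULTILINE)
LABEL_RE = re.compile(r"^[-\w.]+:\s*$", re.MULTILINE)

def entry_def_fanout(ir_text, top_n=10):
    """Return the top_n (name, use_count) pairs for entry-block defs."""
    m = LABEL_RE.search(ir_text)
    entry = ir_text[:m.start()] if m else ir_text
    defs = set(DEF_RE.findall(entry))
    uses = Counter()
    for name in defs:
        # Count every occurrence, then subtract 1 for the def itself.
        uses[name] = len(re.findall(re.escape(name) + r"\b", ir_text)) - 1
    return uses.most_common(top_n)
```

Running this over the kernel is how I found the i32/i64 loads with 100+ users.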
Thanks,
Pierre