Performance killer: more registers are used when multiple target regions are compiled together

Hi all,

I just found that when there are multiple offload regions, all of the final assembled kernels use the same number of registers as the kernel that uses the most registers. This causes all of my kernels to spill registers, which kills performance. This is surprising, and I didn't see this behavior with the IBM XL compiler.
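For readers without access to the linked reproducer, here is a hypothetical sketch (not the original reproducer) of the situation described: one translation unit containing two `target` regions, one trivial and one with more live temporaries in the source. The function names `scale` and `poly8` are made up for illustration, and whether this small example actually triggers the register inflation depends on the compiler version and flags, which has not been verified here. Under the reported behavior, the register count for the `scale` kernel would match the one for `poly8`. Without `-fopenmp` the pragmas are ignored and the loops simply run on the host.

```c
#include <assert.h>

/* Hypothetical light kernel: one multiply per element. */
void scale(double *x, double a, int n) {
#pragma omp target teams distribute parallel for map(tofrom: x[0:n])
  for (int i = 0; i < n; ++i)
    x[i] *= a;
}

/* Hypothetical heavier kernel: computes seven powers of x[i] as separate
   temporaries; whether this really raises register usage depends on how
   the compiler schedules them. */
void poly8(const double *x, double *y, int n) {
#pragma omp target teams distribute parallel for map(to: x[0:n]) map(from: y[0:n])
  for (int i = 0; i < n; ++i) {
    double t = x[i];
    double p1 = t, p2 = p1 * t, p3 = p2 * t, p4 = p3 * t;
    double p5 = p4 * t, p6 = p5 * t, p7 = p6 * t;
    /* 1 + 2t + 3t^2 + ... + 8t^7 */
    y[i] = 1.0 + 2 * p1 + 3 * p2 + 4 * p3 + 5 * p4 + 6 * p5 + 7 * p6 + 8 * p7;
  }
}
```

To compare, one could build this file with offloading enabled, note the per-kernel register counts reported by the device toolchain, and then repeat with `poly8` moved into its own source file.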

The reproducer is provided at
I also noticed the same issue with AOMP.
So I'm wondering which part of the compile/link flow could be responsible.
Any thoughts?


Thank you! Could you elaborate a bit more? How common are the common functions: per .cpp file, or across the whole application?


Hey Ye,

This will be solved (for the most common cases) with .

~ Johannes

Cooooooool. Confirmed on my side that the register count no longer gets inflated. Ye