Thanks for the suggestions, Artem.
Can you clarify what you mean by “You may be able to shuffle things around enough to avoid the issue for the time being, but it will not change the fact that the executable is too large and the overflow will come back sooner or later, as binaries tend to grow over time.”?
As far as I can tell, there is only one relocation from .text into the CUDA code.
This is an implementation detail. There’s no guarantee that it will be the case for everyone. It’s just data. Nothing stops me from writing code that accesses some GPU binaries directly, and I believe some of the CUDA libraries do so.
Some kind of CUDA runtime, presumably. If the rest of the code is just GPU code, and .nv_fatbin is placed after .bss, then it seems like .nv_fatbin can continue to grow.
It just happens to end up there as yet another data section; it is not expected to grow at runtime. AFAICT the GPU binaries are placed in a special section so that various CUDA tools (e.g. cuobjdump) can find them. Renaming the section or moving it around will not affect the functionality of the application itself.
I guess .text and the other sections can grow too, and eventually, yes, we will hit relocation overflows because of that.
It all will depend on the specifics of what gets linked into your executable.
In general, if the sum total of your code and data is more than 2GB, the possibility for an overflow is there. For an executable that large, it’s impractical to impossible to guarantee that code X and the data Y it accesses are close enough. You can often do it in a specific case, but not in general. You do not control what ends up in .nv_fatbin, and you do not control who accesses it, where, or how. Moving the fatbin to the top would move it out of the way of relocations between .text and regular data, but you’re still open to overflows between the end of .text and .nv_fatbin, if they are large enough.
In other words, relocating the GPU blobs to the top may provide a benefit, but it’s not a complete solution. We’ve tried that already internally. One example: it is not sufficient to avoid overflows in a TensorFlow application with all the needed CUDA libraries statically linked in.
BTW, I did attempt to move .nv_fatbin upwards before; we concluded at the time that it wasn’t worth it: https://reviews.llvm.org/D47396