[RFC] OpenMP offload support for static libraries

Problem overview
OpenMP offload functionality is currently not supported in static libraries. As a result, any attempt to use offloading in a static library ends with the target regions falling back to execution on the host. This limitation significantly reduces the usability of OpenMP offload.
The object file that the compiler produces for an offload compilation is a fat object. Besides the code for the host architecture, such an object file also contains code for the offloading targets, stored as data in ELF sections with predefined names. Thus, a static library created from object files produced by offload compilation is an archive of fat objects.
The Clang driver currently never passes fat objects directly to any toolchain. Instead, it performs an unbundling operation on each fat object, which extracts the host and device parts from the object. These parts are then processed independently by the corresponding target toolchains. However, the current implementation does not account for static archives that are also composed of fat objects. No unbundling is done for static archives (they are passed to the linker as is), and thus the device parts of objects in such archives are ignored.
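To make the failure mode concrete, here is a toy Python model of the flow described above. It is only a sketch, not driver code: fat objects are modelled as dicts with a host part and a per-target device map, and the names `unbundle`, `current_link_inputs`, the file names, and the `nvptx64` target label are all assumptions made for illustration.

```python
# Toy model of the current driver flow; not actual Clang code.

def unbundle(fat_obj):
    """Split a fat object into its host part and its per-target device parts."""
    return fat_obj["host"], dict(fat_obj["device"])

def current_link_inputs(fat_objects, static_libs):
    """Return (host_inputs, device_inputs) as the current flow would see them."""
    host_inputs, device_inputs = [], {}
    for obj in fat_objects:
        host, device = unbundle(obj)
        host_inputs.append(host)
        for target, part in device.items():
            device_inputs.setdefault(target, []).append(part)
    # Archives are forwarded to the host linker untouched, so the device
    # parts of their members never reach any device toolchain.
    host_inputs.extend(lib["name"] for lib in static_libs)
    return host_inputs, device_inputs

# Hypothetical inputs: one fat object on the command line, one in an archive.
main_o = {"host": "main.host.o", "device": {"nvptx64": "main.nvptx64.o"}}
foo_o = {"host": "foo.host.o", "device": {"nvptx64": "foo.nvptx64.o"}}
libfoo = {"name": "libfoo.a", "members": [foo_o]}

host_inputs, device_inputs = current_link_inputs([main_o], [libfoo])
print(host_inputs)
print(device_inputs)
```

In this model the device part of foo_o never appears in device_inputs, which corresponds to the host fallback described in the problem overview.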
Suggested solution
It seems feasible to resolve this problem by changing the offload link process: adding an extra step to the link flow that performs a partial link (ld -r) of fat objects and static libraries, as shown in this diagram (I have also attached a .pdf file that illustrates the suggested change):
[Fat objects] \                                 / [Target1 link] \
               [Partial linking] - [Unbundling] - [TargetN link] - [Host link]
[Static libs] /                                 \--- Host part --/
The linker will pull in all necessary dependencies from static libraries while performing the partial link, so the result is a fat object containing the concatenated device parts of the input fat objects and of the required dependencies from the static libraries. These concatenated device objects will be stored in the corresponding ELF sections of the partially linked object.
The unbundling operation on the partially linked object will create one or more device objects for each offloading target, and these objects will be linked by the corresponding target toolchains the same way as is done now. The offload bundler tool would need to be enhanced to support unbundling of multiple concatenated device objects for each offloading target.
The host link action can be changed to use the host part of the partially linked object when linking the final image.
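The proposed flow can be sketched with a toy Python model. This is only an illustration under stated assumptions, not linker or driver code: the dict layout, the `defines`/`needs` symbol sets, the function names `partial_link` and `link_flow`, the file names, and the `nvptx64` target label are all hypothetical, and the device "link" is reduced to joining names.

```python
# Toy model of the proposed link flow; not actual Clang or linker code.

def partial_link(fat_objects, static_libs):
    """Mimic `ld -r`: keep every input object and pull in only those
    archive members that resolve a still-undefined symbol."""
    linked = list(fat_objects)
    candidates = [m for lib in static_libs for m in lib["members"]]
    changed = True
    while changed:
        changed = False
        defined = set().union(*(o["defines"] for o in linked))
        undefined = set().union(*(o["needs"] for o in linked)) - defined
        for member in candidates:
            already = any(member is o for o in linked)
            if not already and member["defines"] & undefined:
                linked.append(member)
                changed = True
                break
    # The result is one fat object whose device sections hold the
    # concatenated device parts of everything that was linked in.
    device = {}
    for obj in linked:
        for target, part in obj["device"].items():
            device.setdefault(target, []).append(part)
    return {"host": [o["host"] for o in linked], "device": device}

def link_flow(fat_objects, static_libs):
    combined = partial_link(fat_objects, static_libs)
    # Unbundling: one or more device objects per offloading target.
    host_part, device_parts = combined["host"], combined["device"]
    # Per-target link, modelled here as joining the object names.
    images = {t: "+".join(parts) for t, parts in device_parts.items()}
    # The host link uses the host part of the partially linked object.
    return {"host": host_part, "images": images}

def obj(name, defines, needs):
    return {"host": name + ".host.o", "device": {"nvptx64": name + ".nvptx64.o"},
            "defines": set(defines), "needs": set(needs)}

main_o = obj("main", {"main"}, {"foo"})
foo_o = obj("foo", {"foo"}, set())
bar_o = obj("bar", {"bar"}, set())  # unused archive member
libfoo = {"name": "libfoo.a", "members": [foo_o, bar_o]}

result = link_flow([main_o], [libfoo])
print(result)
```

In this model foo_o is pulled from the archive because main_o needs the symbol foo, so its device part reaches the target link, while the unused bar_o is left out of both the host and device inputs.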
Do you see any potential problems in the proposed change?

offload from static libs.pdf (509 KB)