Code sharing between CUDA and libomptarget

CUDA, OMP4, and anyone else doing "offloading" will share similar goals at
some point or layer. (This is an obvious and accurate statement, right?)

To reiterate some earlier comments:
a "good" (in my opinion) offloading library should probably have three
layers of internal API separation:

Top - programming-model specific (CUDA, OMP, etc.)
Middle - a common denominator that many of these models end up mapping onto
Bottom - target specific (the particular device or accelerator)
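To make the three layers concrete, here is a minimal sketch in Python. Every class and function name in it (TargetPlugin, HostPlugin, OffloadRuntime, CudaLikeAPI, omp_target_parallel_for) is hypothetical and invented for illustration; none of them mirror the real libomptarget plugin interface or the CUDA runtime API. The "device" is simulated with host lists, and a sequential loop stands in for parallel threads.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Set

Kernel = Callable[[List[float], int], None]  # kernel(buffer, index)


# ---- Bottom layer: target specific (hypothetical plugin interface) ----
class TargetPlugin(ABC):
    """One implementation per target: a particular GPU, accelerator, or the host."""

    @abstractmethod
    def alloc(self, n: int) -> int: ...
    @abstractmethod
    def free(self, handle: int) -> None: ...
    @abstractmethod
    def copy_in(self, handle: int, data: List[float]) -> None: ...
    @abstractmethod
    def copy_out(self, handle: int) -> List[float]: ...
    @abstractmethod
    def launch(self, n: int, kernel: Kernel, handle: int) -> None: ...


class HostPlugin(TargetPlugin):
    """Stand-in 'device' that keeps its buffers in a dict of Python lists."""

    def __init__(self) -> None:
        self._buffers: Dict[int, List[float]] = {}
        self._next = 0

    def alloc(self, n: int) -> int:
        handle = self._next
        self._next += 1
        self._buffers[handle] = [0.0] * n
        return handle

    def free(self, handle: int) -> None:
        del self._buffers[handle]

    def copy_in(self, handle: int, data: List[float]) -> None:
        self._buffers[handle][:] = data  # element-wise copy, not aliasing

    def copy_out(self, handle: int) -> List[float]:
        return list(self._buffers[handle])

    def launch(self, n: int, kernel: Kernel, handle: int) -> None:
        buf = self._buffers[handle]
        for i in range(n):  # sequential loop stands in for parallel threads
            kernel(buf, i)


# ---- Middle layer: the common denominator every front end lowers to ----
class OffloadRuntime:
    def __init__(self, plugin: TargetPlugin) -> None:
        self.plugin = plugin
        self.live: Set[int] = set()  # handles currently allocated on the target

    def alloc(self, n: int) -> int:
        handle = self.plugin.alloc(n)
        self.live.add(handle)
        return handle

    def free(self, handle: int) -> None:
        self.plugin.free(handle)
        self.live.discard(handle)

    def to_device(self, handle: int, data: List[float]) -> None:
        self.plugin.copy_in(handle, data)

    def from_device(self, handle: int) -> List[float]:
        return self.plugin.copy_out(handle)

    def run(self, n: int, kernel: Kernel, handle: int) -> None:
        self.plugin.launch(n, kernel, handle)


# ---- Top layer: programming-model specific front ends ----
class CudaLikeAPI:
    """CUDA-flavoured: the user writes allocation, copies, and launches explicitly."""

    def __init__(self, rt: OffloadRuntime) -> None:
        self.rt = rt
        self.malloc, self.free = rt.alloc, rt.free
        self.memcpy_htod, self.memcpy_dtoh = rt.to_device, rt.from_device
        self.launch = rt.run


def omp_target_parallel_for(rt: OffloadRuntime, host: List[float], body: Kernel) -> List[float]:
    """OpenMP-flavoured, roughly '#pragma omp target teams distribute parallel for
    map(tofrom: x[0:n])': data movement is implied by the map clause."""
    handle = rt.alloc(len(host))  # map(tofrom:) -> allocate and copy in on entry
    rt.to_device(handle, host)
    rt.run(len(host), body, handle)
    result = rt.from_device(handle)  # ... and copy out on exit
    rt.free(handle)
    return result


def double(buf: List[float], i: int) -> None:
    buf[i] *= 2.0


def add_one(buf: List[float], i: int) -> None:
    buf[i] += 1.0


rt = OffloadRuntime(HostPlugin())
cuda = CudaLikeAPI(rt)
d = cuda.malloc(3)
cuda.memcpy_htod(d, [1.0, 2.0, 3.0])
cuda.launch(3, double, d)
a = cuda.memcpy_dtoh(d)
cuda.free(d)
b = omp_target_parallel_for(rt, a, add_one)
print(a, b)  # [2.0, 4.0, 6.0] [3.0, 5.0, 7.0]
```

The payoff of the split shows up in where new work goes: supporting a new target means implementing only a new TargetPlugin, and supporting a new programming model means writing only a new front end against OffloadRuntime. Neither change touches the other layers.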