clang-offload-bundler

Hi all,

I have been playing with the clang driver to see how it compiles an OpenMP program with target offloading (for NVIDIA GPUs).

I noticed the use of the tool ‘clang-offload-bundler’ which seems to bundle together the object files for the host and for the device.
In particular, if I use -c, my foo.o will be a bundle of foo-x86_64.o and foo-cuda.o, and at link time the driver will unbundle foo.o to get the two object files back.
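For reference, the bundle/unbundle round trip described above can be sketched as a pair of command lines. This is only an illustration: the helper `bundler_cmd` is hypothetical, the target names (in particular `openmp-nvptx64-nvidia-cuda`) are assumptions, and the `-type`, `-targets`, `-inputs`, `-outputs` and `-unbundle` spellings are the ones I recall from Clang releases of this era; newer releases have changed some of them, so check `clang-offload-bundler --help`. The code only builds argument lists and does not run anything.

```python
def bundler_cmd(bundle, parts, targets, unbundle=False,
                tool="clang-offload-bundler"):
    """Build (but do not run) a bundler command line for object files.

    bundle  -- the fat object, e.g. "foo.o"
    parts   -- one object file per target, in the same order as `targets`
    targets -- "<offload kind>-<triple>" strings (assumed names below)
    """
    if len(parts) != len(targets):
        raise ValueError("need exactly one file per target")
    cmd = [tool, "-type=o", "-targets=" + ",".join(targets)]
    if unbundle:
        # One bundled input, one output per target.
        cmd += ["-inputs=" + bundle, "-outputs=" + ",".join(parts), "-unbundle"]
    else:
        # One input per target, one bundled output.
        cmd += ["-inputs=" + ",".join(parts), "-outputs=" + bundle]
    return cmd


# Assumed target names; the file names are the ones from the message above.
targets = ["host-x86_64-unknown-linux-gnu", "openmp-nvptx64-nvidia-cuda"]
parts = ["foo-x86_64.o", "foo-cuda.o"]

print(" ".join(bundler_cmd("foo.o", parts, targets)))
print(" ".join(bundler_cmd("foo.o", parts, targets, unbundle=True)))
```

The first printed line roughly corresponds to what happens after -c, the second to what the driver does before linking.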

Now, if I put my foo.o in a static library, for example, clang fails at link time because it does not know how to unbundle the object files inside the library.

Other compilers, such as IBM XL, create a specific section in the ELF file to store the device code, so there is always a single object file and no need to bundle/unbundle.
Also, if an object file or library was compiled with XL, it cannot be linked with clang, and vice versa.

So, am I doing something wrong, or is this the status of the clang driver?
Is the clang-offload-bundler the official choice to manage device code?
If my analysis is correct, what’s the workaround?

Thanks!
Best,
Simone

Hi Simone,

the same answers as always:
1. Offloading to GPUs is not yet working with Clang trunk.
2. Most of this has already been discussed on the mailing list or described in publications. In short, clang-offload-bundler should also use ELF sections for object files. And AFAIK, IBM XL uses the same mechanisms in most cases.

Jonas

Hi Jonas,

The clang-offload-bundler does not look like it uses ELF sections (in either the trunk or the Ykt version); it looks like it just concatenates the object files into one file. I might be wrong, though, and if not, hopefully this will change later.
At the moment, even though offloading to GPUs is not working in Clang, the driver seems to be doing something different from every other compiler.
I CC'd Doru; maybe he can add more about this.

I am just asking because we are going to add GPU support to Flang, and of course we want to keep clang and flang compatible.

Thanks.
Simone

This is implemented so that no changes to the build system are necessary. Have a look at the class ObjectFileHandler (https://github.com/llvm-mirror/clang/blob/f0382ad/tools/clang-offload-bundler/ClangOffloadBundler.cpp#L370).

$ clang -fopenmp -fopenmp-targets=x86_64-unknown-linux-gnu -c target.c
$ objdump -h target.o

target.o: file format elf64-x86-64

Sections:
Idx Name Size VMA LMA File off Algn
   0 .group 00000014 0000000000000000 0000000000000000 00000040 2**2
                   CONTENTS, READONLY, EXCLUDE, GROUP, LINK_ONCE_DISCARD
   1 .text 00000068 0000000000000000 0000000000000000 00000060 2**4
                   CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
   2 .text.startup 00000080 0000000000000000 0000000000000000 000000d0 2**4
                   CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
   3 .rodata 00000001 0000000000000000 0000000000000000 00000150 2**0
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
   4 .rodata.str1.16 00000024 0000000000000000 0000000000000000 00000160 2**4
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
   5 .omp_offloading.entries 00000020 0000000000000000 0000000000000000 00000184 2**0
                   CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
   6 .rodata..omp_offloading.device_images 00000020 0000000000000000 0000000000000000 000001a8 2**3
                   CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
   7 .rodata..omp_offloading.descriptor 00000020 0000000000000000 0000000000000000 000001c8 2**3
                   CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
   8 __CLANG_OFFLOAD_BUNDLE__openmp-x86_64-unknown-linux-gnu 00000588 0000000000000000 0000000000000000 000001f0 2**4
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
   9 __CLANG_OFFLOAD_BUNDLE__host-x86_64-unknown-linux-gnu 00000001 0000000000000000 0000000000000000 00000778 2**0
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
  10 .eh_frame 00000098 0000000000000000 0000000000000000 00000780 2**3
                   CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
  11 .init_array.0 00000008 0000000000000000 0000000000000000 00000818 2**3
                   CONTENTS, ALLOC, LOAD, RELOC, DATA
  12 .comment 00000038 0000000000000000 0000000000000000 00000820 2**0
                   CONTENTS, READONLY
  13 .note.GNU-stack 00000000 0000000000000000 0000000000000000 00000858 2**0
                   CONTENTS, READONLY
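To make the point of the listing above easier to check on other objects, here is a minimal sketch that pulls the bundled targets out of `objdump -h` text. It assumes only what the listing shows: that each bundled image lives in a section whose name starts with `__CLANG_OFFLOAD_BUNDLE__` followed by the target. The function name `bundle_targets` and the trimmed sample are illustrative, not part of any tool.

```python
BUNDLE_PREFIX = "__CLANG_OFFLOAD_BUNDLE__"


def bundle_targets(objdump_text):
    """Return the targets named by bundle sections in `objdump -h` output."""
    targets = []
    for line in objdump_text.splitlines():
        fields = line.split()
        # Section rows start with a numeric index followed by the section
        # name; the flag rows ("CONTENTS, ALLOC, ...") do not.
        if (len(fields) >= 2 and fields[0].isdigit()
                and fields[1].startswith(BUNDLE_PREFIX)):
            targets.append(fields[1][len(BUNDLE_PREFIX):])
    return targets


# A trimmed copy of the listing above.
sample = """\
  7 .rodata..omp_offloading.descriptor 00000020 0000000000000000 0000000000000000 000001c8 2**3
                   CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
  8 __CLANG_OFFLOAD_BUNDLE__openmp-x86_64-unknown-linux-gnu 00000588 0000000000000000 0000000000000000 000001f0 2**4
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
  9 __CLANG_OFFLOAD_BUNDLE__host-x86_64-unknown-linux-gnu 00000001 0000000000000000 0000000000000000 00000778 2**0
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
"""

for target in bundle_targets(sample):
    print(target)
```

On the sample this lists the `openmp-...` device image and the `host-...` placeholder, i.e. the two halves of the bundle, both stored as ordinary ELF sections of target.o.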

I understand that, but if I put “target.o” in a static library, say libtarget.a, then at link time the clang driver does not call “clang-offload-bundler” on the static library, and I get undefined references because the link step cannot find the definitions that are inside the library.
So I just wanted to bring up this problem, because either I am doing something wrong or it’s a limitation of the driver/bundler.
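To illustrate what the driver would have to do for the static library case, here is a rough sketch that walks a Unix `ar` archive and reports which members look like bundled objects. It is a heuristic under stated assumptions: it searches each member for the `__CLANG_OFFLOAD_BUNDLE__` section-name prefix as raw bytes instead of parsing ELF, it does not resolve GNU long-name references, and the helpers `ar_members`, `bundled_members` and `make_member` are made up for this example. The demo archive is built in memory and is not a real libtarget.a.

```python
AR_MAGIC = b"!<arch>\n"
BUNDLE_MARKER = b"__CLANG_OFFLOAD_BUNDLE__"
SPECIAL_MEMBERS = {"/", "//", "/SYM64/"}  # symbol and long-name tables


def ar_members(data):
    """Yield (name, payload) for each member of a Unix ar archive."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    pos = len(AR_MAGIC)
    while pos + 60 <= len(data):
        header = data[pos:pos + 60]          # fixed 60-byte member header
        if header[58:60] != b"`\n":
            raise ValueError("corrupt member header")
        name = header[:16].decode("ascii").rstrip()
        size = int(header[48:58].decode("ascii").strip())
        start = pos + 60
        yield name, data[start:start + size]
        pos = start + size + (size % 2)      # payloads are padded to even size


def bundled_members(data):
    """Names of members that contain the bundle section-name prefix."""
    found = []
    for name, payload in ar_members(data):
        if name in SPECIAL_MEMBERS:
            continue
        if BUNDLE_MARKER in payload:
            found.append(name.rstrip("/"))   # GNU ar terminates names with "/"
    return found


def make_member(name, payload):
    """Build one archive member (demo helper, GNU-style short name)."""
    header = "{:<16}{:<12}{:<6}{:<6}{:<8}{:<10}".format(
        name + "/", 0, 0, 0, "644", len(payload)).encode("ascii") + b"`\n"
    return header + payload + (b"\n" if len(payload) % 2 else b"")


# Fake payloads: only the marker matters for this heuristic.
fat = b"\x7fELF" + BUNDLE_MARKER + b"openmp-x86_64-unknown-linux-gnu"
plain = b"\x7fELF plain host object"
archive = AR_MAGIC + make_member("target.o", fat) + make_member("plain.o", plain)

print(bundled_members(archive))
```

Each member reported this way would still need to be extracted and unbundled before the host and device link steps, which is the part the driver does not do for archives in the scenario described above.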

As I said, I wanted to bring this up to make sure we know what to do when we start implementing offload support in Flang.

Thanks Jonas!
Simone