Using clang-offload-bundler to bundle application/executable

Hello,

I'm trying to use the clang-offload-bundler to merge the results of the
compilation for device and host for an application using OpenMP target
pragmas.

My goal is to run a custom pass on the device code and then bundle an
application from the modified device code and the host code. For testing
purposes, I use a very simple vector addition with target offloading
pragmas and the "offloading" to x86_64 (LLVM/Clang release 9.0).

I use clang to generate LLVM IR from the application code. Then I use
the clang-offload-bundler to unbundle the LLVM IR for device and host. I
run opt and llc separately on the LLVM IR for host and device.

To this end, on my application code (code.c), I run the following steps:

clang -fopenmp -fopenmp-targets=x86_64-unknown-linux-gnu -S -emit-llvm -O3 -o bundled.ll code.c
clang-offload-bundler --inputs=bundled.ll --outputs=host.in.ll,device.in.ll --targets=host-x86_64-unknown-linux-gnu,openmp-x86_64-unknown-linux-gnu --type=ll --unbundle
opt -o host.opt.ll -O3 host.in.ll
llc -o host.o -filetype=obj host.opt.ll
opt -o device.opt.ll -O3 device.in.ll    (custom pass should be added here later on)
llc -o device.o -filetype=obj device.opt.ll

Afterwards, I tried to bundle the object files for host & device:

clang-offload-bundler --inputs=host.o,device.o --outputs=app.o --targets=host-x86_64-unknown-linux-gnu,openmp-x86_64-unknown-linux-gnu --type=o

This command completes without any warning or error.

I tried to invoke gcc on the output to generate the executable from the
object file (app.o):

gcc app.o -lomp -lomptarget

However, I get the following error:

/usr/bin/ld: app.o:(.rodata..omp_offloading.device_images[.omp_offloading.descriptor_reg.x86_64-unknown-linux-gnu]+0x10): undefined reference to `.omp_offloading.entries_begin'
/usr/bin/ld: app.o:(.rodata..omp_offloading.device_images[.omp_offloading.descriptor_reg.x86_64-unknown-linux-gnu]+0x18): undefined reference to `.omp_offloading.entries_end'
/usr/bin/ld: app.o:(.rodata..omp_offloading.descriptor[.omp_offloading.descriptor_reg.x86_64-unknown-linux-gnu]+0x10): undefined reference to `.omp_offloading.entries_begin'
/usr/bin/ld: app.o:(.rodata..omp_offloading.descriptor[.omp_offloading.descriptor_reg.x86_64-unknown-linux-gnu]+0x18): undefined reference to `.omp_offloading.entries_end'

Do I need to run any additional steps before going through gcc or pass
additional flags? Is there another way of accomplishing the desired
behavior?

Thanks a lot in advance,

Best regards

Lukas

Hi Lukas,

I did not really follow the recent changes so I cannot really help.

@Sergey @Alexey @Jon @Hal, could any of you comment on this use case/problem please?

Thanks,
  Johannes

Hi Lukas,

I think you should use clang with the full -fopenmp -fopenmp-targets=x86_64-unknown-linux-gnu to build the final executable, i.e.
$ clang -fopenmp -fopenmp-targets=x86_64-unknown-linux-gnu app.o
If I remember correctly, the missing symbols (.omp_offloading.entries_begin and .omp_offloading.entries_end) are provided by a linker script in Clang 9.0, which gcc has no clue about.

Please note that there have been recent changes in that area and trunk now relies on a new tool clang-offload-wrapper AFAICS (see https://github.com/llvm/llvm-project/commit/5836c356fa6e17d0e10a2f9e0e111b7236dc15fb).
As before this should be called transparently by clang, as long as it’s correctly invoked with a consistent value for -fopenmp-targets.

I haven’t tested the above command, but I hope it works. Please let me know if it doesn’t and I’ll take a closer look!

Jonas

Hi Jonas,

thanks for your help!

I tried the suggested command, unfortunately it still fails:

clang -fopenmp -fopenmp-targets=x86_64-unknown-linux-gnu app.o

yields:

/usr/bin/ld: /tmp/app-c48732.o: relocation R_X86_64_32 against `.data' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: nonrepresentable section on output
/usr/bin/ld: cannot find /tmp/app-23d932.out inside /
/usr/bin/ld: cannot find /tmp/app-23d932.out inside /
clang-9: error: linker command failed with exit code 1 (use -v to see invocation)
clang-9: error: linker command failed with exit code 1 (use -v to see invocation)

I also tried to generate and bundle assembly (.asm) and run that through clang (with openmp-flags as above), but that gave the same error.

Lukas

Hmm, the target image is built as a shared library. To make this work, Clang adds -shared -fPIC in Generic_GCC::TranslateArgs.
You’re replacing this step with a manual invocation of opt, so we need to pass some extra flags. I’m currently looking into what’s needed; opt --relocation-model=pic might be a good start, but I’m not sure that’s all.

Jonas

Hi Jonas,

thanks for the hint. With “--relocation-model=pic” added to opt and llc, I can run the bundled object file through clang without any errors.

However, the linking in clang does not seem to work right: at runtime, the libomptarget plugin (rtl.cpp) throws an error because it is unable to handle the ELF:

Libomptarget --> Loading RTLs...
Libomptarget --> Loading library 'libomptarget.rtl.ppc64.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.ppc64.so': libomptarget.rtl.ppc64.so: cannot open shared object file: No such file or directory!
Libomptarget --> Loading library 'libomptarget.rtl.x86_64.so'...
Libomptarget --> Successfully loaded library 'libomptarget.rtl.x86_64.so'!
Libomptarget --> Registering RTL libomptarget.rtl.x86_64.so supporting 4 devices!
Libomptarget --> Loading library 'libomptarget.rtl.cuda.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.cuda.so': libomptarget.rtl.cuda.so: cannot open shared object file: No such file or directory!
Libomptarget --> Loading library 'libomptarget.rtl.aarch64.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.aarch64.so': libomptarget.rtl.aarch64.so: cannot open shared object file: No such file or directory!
Libomptarget --> RTLs loaded!
Target x86_64 RTL --> Unable to get ELF handle: invalid operand!
Libomptarget --> Image 0x0000000000000000 is NOT compatible with RTL libomptarget.rtl.x86_64.so!
Libomptarget --> No RTL found for image 0x0000000000000000!
Libomptarget --> Done registering entries!
Libomptarget --> Call to omp_get_num_devices returning 0
Libomptarget --> Default TARGET OFFLOAD policy is now disabled (no devices were found)

Do I need to specify additional flags to clang?

Thanks,

Lukas

So it seems to work for me:

$ LIBOMPTARGET_DEBUG=1 LD_LIBRARY_PATH=libomptarget/libomptarget-debug/:$LD_LIBRARY_PATH ./a.out
Libomptarget --> Loading RTLs...
[...]
Libomptarget --> Loading library 'libomptarget.rtl.x86_64.so'...
Libomptarget --> Successfully loaded library 'libomptarget.rtl.x86_64.so'!
Libomptarget --> Registering RTL libomptarget.rtl.x86_64.so supporting 4 devices!
[...]
Libomptarget --> RTLs loaded!
Libomptarget --> Image 0x0000000000602050 is compatible with RTL libomptarget.rtl.x86_64.so!
Libomptarget --> RTL 0x000000000061b7c0 has index 0!
Libomptarget --> Registering image 0x0000000000602050 with RTL libomptarget.rtl.x86_64.so!
Libomptarget --> Done registering entries!
Libomptarget --> Call to omp_get_num_devices returning 4
Libomptarget --> Default TARGET OFFLOAD policy is now mandatory (devices were found)
Libomptarget --> Entering target region with entry point 0x00000000004008c0 and device Id -1
Libomptarget --> Checking whether device 0 is ready.
Libomptarget --> Is the device 0 (local ID 0) initialized? 0
Libomptarget --> Device 0 is ready to use.
Target x86_64 RTL --> Dev 0: load binary from 0x0000000000602050 image
Target x86_64 RTL --> Expecting to have 1 entries defined.
Target x86_64 RTL --> Offset of entries section is (0x0000000000201020).
Target x86_64 RTL --> Pointer to first entry to be loaded is (0x00002ad4f4632020).
Target x86_64 RTL --> Entries table range is (0x00002ad4f4632020)->(0x00002ad4f4632040)
Libomptarget --> Launching target execution __omp_offloading_48_54503d96_main_l8 with pointer 0x00002ad4f44316b0 (index=0).
Target x86_64 RTL --> Running entry point at 0x00002ad4f44316b0...
omp_is_initial_device: 0
Libomptarget --> Unloading target library!
Libomptarget --> Image 0x0000000000602050 is compatible with RTL 0x000000000061b7c0!
Libomptarget --> Unregistered image 0x0000000000602050 from RTL 0x000000000061b7c0!
Libomptarget --> Done unregistering images!
Libomptarget --> Removing translation table for descriptor 0x00000000006040e0
Libomptarget --> Done unregistering library!

I mostly used the steps from your initial post, but passed --relocation-model=pic for the device code (both opt and llc) and used clang -fopenmp -fopenmp-targets=x86_64-unknown-linux-gnu app.o for the final linking.

It’s possible that my code is just too simple (target region with a printf of omp_is_initial_device) to trigger the problem you’re seeing. Could you share your exact code?
For the error message “invalid operand”, I think libomptarget uses libelf incorrectly. Maybe you can try replacing elf_errmsg(-1) with elf_errmsg(elf_errno())? At least that’s how I think it should be according to the man page.
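
The two debug logs in this thread differ in one decisive line: the image is either registered with the x86_64 RTL or rejected as not compatible. A small filter along the following lines could pick that out of a long log. It is only a sketch: classify_offload_log is a name invented for this example, and the matched strings are copied from the logs quoted in this thread, so other libomptarget versions may word them differently.

```shell
#!/bin/sh
# Sketch: classify a libomptarget debug log read from stdin.
# classify_offload_log is a hypothetical helper, not part of LLVM.
# The patterns are taken from the log lines quoted in this thread.
classify_offload_log() {
    log=$(cat)
    if printf '%s\n' "$log" | grep -q 'is NOT compatible with RTL'; then
        # The plugin refused the device image (the failing case above).
        echo rejected
    elif printf '%s\n' "$log" | grep -q 'Registering image .* with RTL'; then
        # The device image was accepted and registered (the working case).
        echo registered
    else
        echo unknown
    fi
}
```

Assuming the debug output goes to stderr and a debug build of libomptarget is on LD_LIBRARY_PATH, it might be used as: LIBOMPTARGET_DEBUG=1 ./a.out 2>&1 | classify_offload_log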

Regards,
Jonas

Hi Jonas,

thanks for your answer. The command indeed works; I was just missing the vendor part of the triple (“-unknown”) in some places.

The clang-offload-bundler and the clang linking step seem to be very sensitive to the vendor part of the target triple.

The following series of commands produces a working executable:

clang -fopenmp -fopenmp-targets=x86_64-unknown-linux-gnu -S -emit-llvm -O3 -o bundled.ll code.c
clang-offload-bundler --inputs=bundled.ll --outputs=host.in.ll,device.in.ll --targets=host-x86_64-unknown-linux-gnu,openmp-x86_64-unknown-linux-gnu --type=ll --unbundle
opt -o host.opt.bc -O3 host.in.ll
llc -o host.o -filetype=obj host.opt.bc
opt -o device.opt.bc -O3 --relocation-model=pic device.in.ll
llc -o device.o -filetype=obj --relocation-model=pic device.opt.bc
clang-offload-bundler --inputs=host.o,device.o --outputs=app.o --targets=host-x86_64-unknown-linux-gnu,openmp-x86_64-unknown-linux-gnu --type=o
clang -fopenmp -fopenmp-targets=x86_64-unknown-linux-gnu -L …/…/llvm-bin/lib app.o
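
Since every step has to use exactly the same triple string, one way to avoid mismatches is to define it once and generate the commands from it. The following is a minimal sketch that only prints the sequence above; TRIPLE, SRC and print_pipeline are names chosen for this example, the -L option is left out because the library path depends on the local installation, and host-$TRIPLE is only valid here because host and device share the same triple.

```shell
#!/bin/sh
# Sketch only: prints the working command sequence with the offload
# triple defined in exactly one place. Nothing is executed here.
# TRIPLE, SRC and print_pipeline are hypothetical names for this example.
TRIPLE=${TRIPLE:-x86_64-unknown-linux-gnu}   # fully spelled out, including the vendor
SRC=${SRC:-code.c}

print_pipeline() {
    # Host and device use the same triple in this test setup.
    targets="host-$TRIPLE,openmp-$TRIPLE"
    cat <<EOF
clang -fopenmp -fopenmp-targets=$TRIPLE -S -emit-llvm -O3 -o bundled.ll $SRC
clang-offload-bundler --inputs=bundled.ll --outputs=host.in.ll,device.in.ll --targets=$targets --type=ll --unbundle
opt -o host.opt.bc -O3 host.in.ll
llc -o host.o -filetype=obj host.opt.bc
opt -o device.opt.bc -O3 --relocation-model=pic device.in.ll
llc -o device.o -filetype=obj --relocation-model=pic device.opt.bc
clang-offload-bundler --inputs=host.o,device.o --outputs=app.o --targets=$targets --type=o
clang -fopenmp -fopenmp-targets=$TRIPLE app.o
EOF
}

print_pipeline
```

The printed commands can be reviewed first and then piped into a shell (for example "| sh -ex") once they look right.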

The following series (watch the missing “-unknown”) produces a linker error, probably because the device-part of the app.o bundle is empty:

clang -fopenmp -fopenmp-targets=x86_64-linux-gnu -S -emit-llvm -O3 -o bundled.ll code.c
clang-offload-bundler --inputs=bundled.ll --outputs=host.in.ll,device.in.ll --targets=host-x86_64-unknown-linux-gnu,openmp-x86_64-linux-gnu --type=ll --unbundle
opt -o host.opt.ll -O3 host.in.ll
llc -o host.o -filetype=obj host.opt.ll
opt -o device.opt.ll -O3 --relocation-model=pic device.in.ll
llc -o device.o -filetype=obj --relocation-model=pic device.opt.ll
clang-offload-bundler --inputs=host.o,device.o --outputs=app.o --targets=host-x86_64-unknown-linux-gnu,openmp-x86_64-linux-gnu --type=o
clang -L …/…/llvm-bin/lib -fopenmp -fopenmp-targets=x86_64-linux-gnu -o app.exe app.o

The error is:

/usr/bin/ld: /tmp/app-ba8c99.o: file not recognized: file truncated
collect2: error: ld returned 1 exit status
/usr/bin/ld: cannot find /tmp/app-d4cba4.out inside /
/usr/bin/ld: cannot find /tmp/app-d4cba4.out inside /
clang-9: error: linker (via gcc) command failed with exit code 1 (use -v to see invocation)
clang-9: error: linker command failed with exit code 1 (use -v to see invocation)

I tested two other command sequences (with more “-unknown” in different places). Both run without error, but the resulting executable does not work (same error as yesterday in the libomptarget RTL):

clang -fopenmp -fopenmp-targets=x86_64-linux-gnu -S -emit-llvm -O3 -o bundled.ll code.c
clang-offload-bundler --inputs=bundled.ll --outputs=host.in.ll,device.in.ll --targets=host-x86_64-unknown-linux-gnu,openmp-x86_64-linux-gnu --type=ll --unbundle
opt -o host.opt.ll -O3 host.in.ll
llc -o host.o -filetype=obj host.opt.ll
opt -o device.opt.ll -O3 --relocation-model=pic device.in.ll
llc -o device.o -filetype=obj --relocation-model=pic device.opt.ll
clang-offload-bundler --inputs=host.o,device.o --outputs=app.o --targets=host-x86_64-unknown-linux-gnu,openmp-x86_64-unknown-linux-gnu --type=o
clang -L …/…/llvm-bin/lib -fopenmp -fopenmp-targets=x86_64-unknown-linux-gnu -o app.exe app.o

clang -fopenmp -fopenmp-targets=x86_64-linux-gnu -S -emit-llvm -O3 -o bundled.ll code.c
clang-offload-bundler --inputs=bundled.ll --outputs=host.in.ll,device.in.ll --targets=host-x86_64-unknown-linux-gnu,openmp-x86_64-unknown-linux-gnu --type=ll --unbundle
opt -o host.opt.ll -O3 host.in.ll
llc -o host.o -filetype=obj host.opt.ll
opt -o device.opt.ll -O3 --relocation-model=pic -mtriple=x86_64-unknown-linux-gnu device.in.ll
llc -o device.o -mtriple=x86_64-unknown-linux-gnu -filetype=obj --relocation-model=pic device.opt.ll
clang-offload-bundler --inputs=host.o,device.o --outputs=app.o --targets=host-x86_64-unknown-linux-gnu,openmp-x86_64-unknown-linux-gnu --type=o
clang -L …/…/llvm-bin/lib -fopenmp -fopenmp-targets=x86_64-unknown-linux-gnu -o app.exe app.o

I’m wondering if clang-offload-bundler should be so sensitive to the vendor-string in the target triple. In particular, if you run clang with “-fopenmp-targets=x86_64-linux-gnu”, you still have to run clang-offload-bundler with the “unknown” vendor-string, otherwise the device-part (device.in.ll) will be empty.

Is this the intended behavior?

Thanks,

Lukas

Hi Lukas,

Yes, I think it is somewhat intended: Clang normalizes the target triple, so x86_64-linux-gnu becomes x86_64-unknown-linux-gnu. As clang-offload-bundler is usually not invoked directly by the user, it just takes the input triple as given and uses it as a unique key.
I’m not sure it would be worth implementing normalization in another tool. Usually, you should be fine if you just copy the commands printed by clang -v.
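
The effect of using the triple string as a key can be illustrated with a few lines of shell. This is a deliberately simplified sketch and not LLVM's actual normalization: normalize_triple and bundle_key are names made up for this example, and the function only handles the arch-linux-env shape that appears in this thread.

```shell
#!/bin/sh
# Simplified illustration, NOT LLVM's real triple normalization.
# normalize_triple and bundle_key are hypothetical helpers.
normalize_triple() {
    case "$1" in
        *-*-*-*)   echo "$1" ;;                          # four components: keep as is
        *-linux-*) echo "${1%%-*}-unknown-${1#*-}" ;;    # arch-linux-env: insert vendor
        *)         echo "$1" ;;                          # anything else: leave untouched
    esac
}

# Assumption for this sketch: the bundle entry is looked up by exact
# string comparison of "openmp-" plus the triple.
bundle_key() { echo "openmp-$1"; }

stored=$(bundle_key "$(normalize_triple x86_64-linux-gnu)")   # key after normalization
asked=$(bundle_key x86_64-linux-gnu)                           # key as typed by the user

if [ "$stored" = "$asked" ]; then
    echo "match"
else
    echo "no match: $stored vs $asked"
fi
```

Run as is, this prints "no match: openmp-x86_64-unknown-linux-gnu vs openmp-x86_64-linux-gnu", which mirrors the empty device.in.ll described above when the vendor is left out on the bundler command line.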

Regards,
Jonas