: undefined symbol: ompt_start_tool

Message: 1
Date: Mon, 26 Oct 2020 15:18:45 -0500
From: Kelvin Li via Openmp-dev <openmp-dev@lists.llvm.org>
To: openmp-dev@lists.llvm.org
Subject: [Openmp-dev] undefined symbol: ompt_start_tool
Message-ID:
<OFF5259549.0EC65D66-ON8525860D.006EC181-8525860D.006F94A6@notes.na.collabserv.com>

Content-Type: text/plain; charset=“utf-8”

Has anyone encountered the following error? I am wondering whether it has
something to do with how I build libomp.so.

$ LD_LIBRARY_PATH=/home/kli/clang-install/lib mpirun -np 1 ./a.out
a.out: symbol lookup error: /home/kli/clang-install/lib/libomp.so:
undefined symbol: ompt_start_tool

Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.


mpirun detected that one or more processes exited with non-zero status,
thus causing
the job to be terminated. The first process to do so was:

Process name: [[14546,1],0]
Exit code: 127

But it works without mpirun.

$ LD_LIBRARY_PATH=/home/kli/clang-install/lib ./a.out
0
1
2
3

Kelvin

Are you confident that /home/kli/clang-install/lib is the same on all of the nodes used by the MPI program?
And that it contains the same version of libomp.so everywhere?

Perhaps you should also set an environment variable to have the OpenMP runtime print its version, something like this:
$ KMP_VERSION=1 ./a.out

LLVM OMP version: 5.0.20140926
LLVM OMP library type: performance
LLVM OMP link type: dynamic
LLVM OMP build time: no_timestamp
LLVM OMP build compiler: Clang 12.0
LLVM OMP alternative compiler support: yes
LLVM OMP API version: 5.0 (201611)
LLVM OMP dynamic error checking: no
LLVM OMP thread affinity support: no

I don’t think that is the case. There is only one task “-np 1” on one node. Both ‘./a.out’ and ‘mpirun -np 1 ./a.out’ are issued on the same node which has the same library in /home/kli/clang-install/lib. That is puzzling me!

Kelvin

It really looks as if you’re getting two different versions of the runtime, though, so having the runtime tell you its properties is still likely to be useful. If nothing else, it may show that you’re not propagating environment variables as you had hoped (if the MPI version doesn’t print anything!).
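
One way to check whether the variables actually reach the launched process is to dump the environment in both cases and compare only the loader- and OpenMP-related entries. The sketch below is an illustration and was not posted in the thread; `filter_loader_env` is a hypothetical helper name, and it assumes a POSIX shell with grep and sort available.

```shell
# Hypothetical helper (not from the thread): keep only loader- and
# OpenMP-related variables from an `env` dump read on stdin, sorted
# so that two dumps can be diffed line by line.
filter_loader_env() {
  grep -E '^(LD_LIBRARY_PATH|LD_PRELOAD|KMP_[A-Z_]+|OMP_[A-Z_]+)=' | sort
}
```

For example, `env | filter_loader_env > direct.txt` and `mpirun -np 1 env | filter_loader_env > mpi.txt`, followed by `diff direct.txt mpi.txt`, would show any variable that the launcher adds, drops, or rewrites.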

– Jim
James Cownie <jcownie@gmail.com>

Hi Jim,

Here is what I get with KMP_VERSION=1.

$ LD_LIBRARY_PATH=$HOME/clang-install/lib KMP_VERSION=1 ./a.out
LLVM OMP version: 5.0.20140926
LLVM OMP library type: performance
LLVM OMP link type: dynamic
LLVM OMP build time: no_timestamp
LLVM OMP build compiler: Clang 11.0
LLVM OMP alternative compiler support: yes
LLVM OMP API version: 5.0 (201611)
LLVM OMP dynamic error checking: no
LLVM OMP plain barrier branch bits: gather=2, release=2
LLVM OMP forkjoin barrier branch bits: gather=2, release=2
LLVM OMP reduction barrier branch bits: gather=1, release=1
LLVM OMP plain barrier pattern: gather=hyper, release=hyper
LLVM OMP forkjoin barrier pattern: gather=hyper, release=hyper
LLVM OMP reduction barrier pattern: gather=hyper, release=hyper
LLVM OMP lock type: run time selectable
LLVM OMP thread affinity support: not used
0
1
3
2

For the mpirun case,

$ KMP_VERSION=1 mpirun -np 1 ./a.out
./a.out: symbol lookup error: /home/kli/clang-install/lib/libomp.so: undefined symbol: ompt_start_tool

I figured out how to make it work: I need to preload libarcher.so. I don’t understand why this cannot be done automatically in the “mpirun … ./a.out” case.

$ LD_PRELOAD=/home/kli/clang-install/lib/libarcher.so LD_LIBRARY_PATH=/home/kli/clang-install/lib KMP_VERSION=1 mpirun -np 1 ./a.out
LLVM OMP version: 5.0.20140926
LLVM OMP library type: performance
LLVM OMP link type: dynamic
LLVM OMP build time: no_timestamp
LLVM OMP build compiler: Clang 11.0
LLVM OMP alternative compiler support: yes
LLVM OMP API version: 5.0 (201611)
LLVM OMP dynamic error checking: no
LLVM OMP plain barrier branch bits: gather=2, release=2
LLVM OMP forkjoin barrier branch bits: gather=2, release=2
LLVM OMP reduction barrier branch bits: gather=1, release=1
LLVM OMP plain barrier pattern: gather=hyper, release=hyper
LLVM OMP forkjoin barrier pattern: gather=hyper, release=hyper
LLVM OMP reduction barrier pattern: gather=hyper, release=hyper
LLVM OMP lock type: run time selectable
LLVM OMP thread affinity support: not used
0
1
2
3

Kelvin

Hi Kelvin,

While this LD_PRELOAD is a workaround for the symptom (as libarcher.so happens
to also implement ompt_start_tool), it does not explain your issue.

For my local installation I get:

$ readelf --syms libomp.so | grep ompt_start_tool
   658: 0000000000098050 54 FUNC WEAK DEFAULT 12 ompt_start_tool@@VERSION
   776: 00000000000c8f40 8 OBJECT LOCAL DEFAULT 26 _ZL22ompt_start_tool_resu
  2821: 0000000000098050 54 FUNC WEAK DEFAULT 12 ompt_start_tool

So, the runtime has a (weak) implementation of this function.

My suspicion is that mpirun adds some path to LD_LIBRARY_PATH, so that a
different libomp is loaded. You might compare

$ ldd a.out
and
$ LD_LIBRARY_PATH=/home/kli/clang-install/lib ldd a.out
and
$ LD_LIBRARY_PATH=/home/kli/clang-install/lib mpirun -np 1 ldd a.out
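
To compare the three outputs without reading them by eye, the resolved libomp path can be pulled out of each one. This is a minimal sketch, assuming the usual `name => path (address)` layout of ldd output; `resolved_lib` is a hypothetical helper name, not part of any tool mentioned here.

```shell
# Hypothetical helper: print the path that `ldd` resolved for the
# library named in $1, reading the ldd output from stdin.
# Lines without a "=>" (vdso, the dynamic loader) are ignored.
resolved_lib() {
  awk -v lib="$1" '$1 == lib && $2 == "=>" { print $3 }'
}
```

Running `ldd a.out | resolved_lib libomp.so` for each of the three variants should print the same path if the same runtime is picked up every time.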

Instead of preloading libarcher, you can also preload a specific OpenMP
runtime to be used for execution with:

$ LD_PRELOAD=/home/kli/clang-install/lib/libomp.so

Is this the same runtime that is used when you execute without mpirun?
Do you get the same error when preloading this runtime without mpirun?

Best
Joachim

Hi Joachim,

Thanks. I still think that both “./a.out” and “mpirun -np 1 ./a.out” use the same library. Here is the ldd output.

$ ldd a.out
linux-vdso64.so.1 (0x00007fffa4810000)
libomp.so => /home/kli/clang-install/lib/libomp.so (0x00007fffa46b0000)
libpthread.so.0 => /lib64/power9/libpthread.so.0 (0x00007fffa4650000)
libc.so.6 => /lib64/power9/libc.so.6 (0x00007fffa4440000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fffa4410000)
/lib64/ld64.so.2 (0x00007fffa4830000)

$ LD_LIBRARY_PATH=/home/kli/clang-install/lib ldd a.out
linux-vdso64.so.1 (0x00007fffb54d0000)
libomp.so => /home/kli/clang-install/lib/libomp.so (0x00007fffb5370000)
libpthread.so.0 => /lib64/power9/libpthread.so.0 (0x00007fffb5310000)
libc.so.6 => /lib64/power9/libc.so.6 (0x00007fffb5100000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fffb50d0000)
/lib64/ld64.so.2 (0x00007fffb54f0000)

$ LD_LIBRARY_PATH=/home/kli/clang-install/lib mpirun -np 1 ldd a.out
linux-vdso64.so.1 (0x0000200000050000)
…/spectrum_mpi/latest/container/…/lib/libpami_cudahook.so (0x0000200000070000)
libomp.so => /home/kli/clang-install/lib/libomp.so (0x00002000000a0000)
libpthread.so.0 => /lib64/power9/libpthread.so.0 (0x0000200000210000)
libc.so.6 => /lib64/power9/libc.so.6 (0x0000200000260000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000200000470000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00002000004a0000)
libm.so.6 => /lib64/power9/libm.so.6 (0x00002000006d0000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000200000800000)
/lib64/ld64.so.2 (0x0000200000000000)

Preloading the libomp.so does not seem to help.

$ LD_PRELOAD=/home/kli/clang-install/lib/libomp.so mpirun -np 1 ./a.out
./a.out: symbol lookup error: /home/kli/clang-install/lib/libomp.so: undefined symbol: ompt_start_tool

Hi Kelvin,

I tried both commands with LD_DEBUG. It seems that somehow the
libarcher.so cannot be found to resolve ompt_start_tool.

libomp calls ompt_start_tool directly (by name) in

In a typical execution, this will find the implementation in libomp:

If I did not miss something, implementation and call should be in the
same ifdef branch, i.e., both active or not.

Reasoning: This explicit call by name is necessary to catch the case of
a static tool compiled in the application (linker will prefer
ompt_start_tool from the static tool). Such static version might not be
found by dlsym.

Later, libomp implicitly assumes "libarcher to be the last entry in
OMPT_TOOL_LIBRARIES", dlopens libarcher and dlsyms ompt_start_tool:

Therefore you see libarcher in the LD_DEBUG output.

This is how it looks for me:
$ LD_DEBUG=bindings ./a.out 2>&1| grep ompt_start_tool
      1568: binding file libomp.so [0] to libomp.so [0]: normal symbol `ompt_start_tool' [VERSION]
      1568: binding file libarcher.so [0] to libarcher.so [0]: normal symbol `ompt_start_tool'

$ LD_DEBUG=bindings LD_LIBRARY_PATH=/home/kli/clang-install/lib ./a.out 2>&1| grep ompt_start_tool
      9105: binding file /home/kli/clang-install/lib/libomp.so [0] to /home/kli/clang-install/lib/libomp.so [0]: normal symbol `ompt_start_tool' [VERSION]
      9105: /home/kli/clang-install/lib/libomp.so: error: symbol lookup error: undefined symbol: ompt_start_tool (fatal)

Here the execution complains about the missing symbol but does not
abort. It is unclear to me why the execution does not abort in this case.

      9105: binding file /home/kli/clang-install/lib/libarcher.so [0] to /home/kli/clang-install/lib/libarcher.so [0]: normal symbol `ompt_start_tool'

$ LD_DEBUG=bindings LD_LIBRARY_PATH=/home/kli/clang-install/lib mpirun -np 1 ./a.out 2>&1| grep ompt_start_tool
     27652: binding file /home/kli/clang-install/lib/libomp.so [0] to /home/kli/clang-install/lib/libomp.so [0]: normal symbol `ompt_start_tool' [VERSION]
     27652: /home/kli/clang-install/lib/libomp.so: error: symbol lookup error: undefined symbol: ompt_start_tool (fatal)

Same message as above, but now the execution aborts:

./a.out: symbol lookup error: /home/kli/clang-install/lib/libomp.so:
undefined symbol: ompt_start_tool

Since the dlopen of libarcher happens after the explicit call to
ompt_start_tool, it is expected that the binding message for libarcher is
missing here.

Does your libomp contain an implementation of ompt_start_tool?

nm /home/kli/clang-install/lib/libomp.so | grep ompt_start_tool
or
readelf --syms /home/kli/clang-install/lib/libomp.so | grep ompt_start_tool
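
The nm output can also be reduced to a one-line answer. The helper below is a sketch with a hypothetical name (`symbol_status`); it assumes the default nm layout, in which the symbol type letter is the second-to-last field and the (possibly versioned) name is the last one, and it treats type `U` as undefined.

```shell
# Hypothetical helper: read `nm` output on stdin and report whether the
# symbol named in $1 is defined (with its nm type letter), undefined,
# or not listed at all. A trailing version suffix such as "@@VERSION"
# is stripped before comparing names.
symbol_status() {
  awk -v sym="$1" '
    { name = $NF; sub(/@.*/, "", name) }
    name == sym {
      type = $(NF - 1)
      status = (type == "U") ? "undefined" : "defined (" type ")"
      print status
      found = 1
    }
    END { if (!found) print "absent" }'
}
```

For instance, `nm -D /home/kli/clang-install/lib/libomp.so | symbol_status ompt_start_tool` should report a defined weak symbol if the runtime carries its own implementation.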

- Joachim

Hi Joachim,

Does your libomp contain an implementation of ompt_start_tool?

It seems to be there.

$ readelf --syms /home/kli/clang-install/lib/libomp.so | grep ompt_start_tool
323: 000000000011bb40 188 FUNC WEAK DEFAULT 11 ompt_start_tool@@VERSION [<localentry>: 8]
166: 000000000015f1b0 8 OBJECT LOCAL DEFAULT 25 _ZL22ompt_start_tool_resu
3457: 000000000011bb40 188 FUNC WEAK DEFAULT 11 ompt_start_tool [<localentry>: 8]

Kelvin