Crash in omp_unset_lock

Hello,
I am trying to use a legacy library in a new project linked with OpenMP/LLVM.
This library performs image processing and makes heavy use of OpenMP for acceleration. Unfortunately, in the new project I get a crash at runtime in a call to omp_unset_lock.

Here is the relevant portion of the stack trace:
#0 0x00007ffff69a0428 in __kmp_release_queuing_lock.localalias.19 () from /usr/local/lib/libomp.so
#1 0x00007ffff692f44f in __kmpc_unset_lock () from /usr/local/lib/libomp.so
#2 0x00007ffff69c163d in omp_unset_lock@OMP_3.0 () from /usr/local/lib/libomp.so

I replaced the omp_set_lock and omp_unset_lock calls with #pragma omp critical as a workaround, and it seems to work fine.

However I would like to know if it is an unsupported feature of OpenMP/LLVM. I could not find the information here: OpenMP Support — Clang 16.0.0git documentation. Maybe I missed it.

We also tried calling omp_set_lock and then omp_unset_lock in a unit test without this legacy library, and it also crashed.

Thank you for your insights.
Best regards

Alyson Roger

If possible, can you please file a GitHub issue and provide a reproducer?

Sure, no problem. Can you send me the link to the right GitHub repository?
Thank you.

You can report the issue here: Issues · llvm/llvm-project · GitHub.

Hello. I tried to create a small binary to reproduce the issue, but it does not crash the way my project does, even though the operations performed between the lock/unlock calls are similar. I also tried linking it against the same dependencies, without success.
I am afraid the crash depends on some more complex state in the project, which has a much heavier codebase than the small binary.
Sorry, I am not sure that I can provide an easy reproducer. Filing it on GitHub without one might just waste your time.
Do you have a suggestion?

Thanks for your support

Hmm, that’s gonna be pretty tough to debug. It’s pretty hard to try to fix something if I have no idea where it is broken. :joy:

It really feels as if this is some problem in the user code, which is corrupting one of the OpenMP runtime’s data structures.
After all, it is not as if the OpenMP lock interface is new and not heavily used.

One possibility (given the continual emphasis on the complexity of the test code and its use of multiple libraries) is that one of the libraries is statically linked with another copy of the OpenMP runtime, and that the code is trying to share a lock between those two library instances, initialising it in one library and finalising it in another. Or, maybe the initialisation and destruction are mismatched, using init_nest_lock and destroy_lock (or the other way around).

Finally, of course, it could simply be memory corruption by the application breaking the OpenMP lock data structure…

I very much doubt that the problem is actually in the OpenMP runtime code.

Hello,
Thank you JCownie for your reply. That could explain why I cannot reproduce the crash with a simple binary. I checked the binary's dynamic dependencies with ldd and its symbols with nm, and I see OpenMP in both cases.

With nm I find these symbols:
U omp_get_max_threads
U omp_get_num_procs
U omp_get_thread_num
U omp_init_lock
U omp_set_lock
U omp_set_num_threads
U omp_unset_lock

And the ldd command lists these libraries:
libomp.so => /usr/local/lib/libomp.so (0x00007f73f1e99000)
libgomp.so.1 => /usr/lib64/libgomp.so.1 (0x00007f73ed22b000)

Does it match your theory?
On my side I will try removing the static link to omp to see if it confirms it.

I will keep you informed :slight_smile:

Yup, you have both libomp and libgomp in there, which is a very bad idea. (If nothing else you likely have 2x over-subscription of threads).

It is possible that what is happening is something like this:

  1. GCC compiled code calls omp_init() loading libgomp
  2. The GCC compiled code allocates and initialises a lock
  3. The GCC compiled code calls a function in an LLVM compiled library
  4. That LLVM compiled code calls an OpenMP function, causing libomp to be loaded
  5. The LLVM compiled code returns to the GCC compiled code
  6. The GCC compiled code calls omp_unset_lock for the first time, so the dynamic linker (using lazy symbol resolution) resolves it, but to the most recently loaded library (libomp).
  7. Now libomp is called with a lock which had been initialised by libgomp BANG

You may be able to fix it by only loading libomp, since it provides the GCC libgomp entrypoints.
If you create a symbolic link so that libomp.so also appears as libgomp.so.1, that might work, though if you're also linking an OpenMP runtime in statically it could still break.

You definitely want only one OpenMP runtime linked into the process.