Hello everybody,
When I compile the simple test program attached below, I get the following error message:
openmpbug.cc:(.text+0x2de): undefined reference to `__sync_val_compare_and_swap_16'
openmpbug.cc:(.text+0x3a7): undefined reference to `__sync_val_compare_and_swap_16'
clang-3.8: error: linker command failed with exit code 1 (use -v to see invocation)
This is caused by the OMP max-reduction statement. It also fails if I switch from "max" to "min".
I use version 3.8 of clang on an Ubuntu 14.04 LTS system. It also fails with version 3.9 (trunk, self-compiled).
If I switch the floating point data type (real_t) from "long double" to "double" or "float", the code compiles and runs without problems.
I hope this helps to make clang+openmp even better!
Here comes the simple test program:
---------------------------------------SNIP-------------------------------------------
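#include <algorithm>   // std::max
#include <iostream>
using namespace std;
// float and double compile and link fine; long double triggers the link error.
//typedef float real_t;
//typedef double real_t;
typedef long double real_t;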
int
main()
{
real_t maxval = -1.0e-10;
#pragma omp parallel for reduction(max: maxval)
for (int i = 1; i <= 1000; i++) maxval = max(maxval, real_t(i));
cout << maxval << endl;
return 0;
}
------------------------------------SNIP------------------------------------------------
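A possible workaround sketch, assuming it is acceptable to replace the reduction clause with a hand-written reduction. Each thread keeps its own maximum, and the per-thread results are merged inside a critical section. The assumption is that the critical section is implemented with a lock, which is the usual approach, so the compiler should not need a 16-byte atomic compare-and-swap for long double. The helper name max_up_to is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

typedef long double real_t;

// Hypothetical helper: returns the maximum of real_t(1) .. real_t(n)
// without using the reduction clause. Each thread reduces into its own
// local variable, and the partial results are merged inside an OpenMP
// critical section instead of through an atomic update of the 16-byte
// real_t. Without -fopenmp the pragmas are ignored and the loop runs serially.
real_t max_up_to(int n)
{
    assert(n >= 1);  // at least one element, so the result is finite
    real_t maxval = -std::numeric_limits<real_t>::infinity();
#pragma omp parallel
    {
        real_t local = -std::numeric_limits<real_t>::infinity();
#pragma omp for
        for (int i = 1; i <= n; i++)
            local = std::max(local, real_t(i));
#pragma omp critical
        maxval = std::max(maxval, local);
    }
    return maxval;
}
```

With this helper, max_up_to(1000) should yield 1000, the same value the original program prints when it links.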
This looks like an inconsistency in our knowledge of LLVM/Clang.
Historically, LLVM did not support 16-byte floating point, so the OpenMP runtime does not support it when compiled with clang (because the necessary routines couldn't be compiled!).
If your code really is using 16-byte floating-point numbers, then the checks in the runtime that currently skip those routines when building with clang need to be changed.
It'd be worth checking, though, that your "long double" really is 16 bytes; you could simply check sizeof(real_t).
p.s. You don't need to initialize maxval; whatever value you put there will be ignored anyway, since the OpenMP standard says that the per-thread reduction values are initialized with the most negative value of the type.
-- Jim
James Cownie <james.h.cownie@intel.com>
SSG/DPD/TCAR (Technical Computing, Analyzers and Runtimes)
Tel: +44 117 9071438
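A minimal sketch for checking the p.s. above. It uses double so that it does not hit the 16-byte problem. Assuming a conforming OpenMP 3.1 (or later) implementation, the private copies are indeed initialized with the identity value for max. However, the specification also combines the original value of the variable into the result when the construct ends, so an initial value larger than every element should survive the loop. The helper name max_with_initial is hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: max reduction over 1..n that starts from init.
// Each thread's private copy of maxval starts at the identity for max;
// when the loop ends, the private results are combined with the original
// value of maxval, so init takes part in the final result.
double max_with_initial(double init, int n)
{
    assert(n >= 0);
    double maxval = init;
#pragma omp parallel for reduction(max: maxval)
    for (int i = 1; i <= n; i++)
        maxval = std::max(maxval, double(i));
    return maxval;
}
```

Under that reading, max_with_initial(5000.0, 1000) returns 5000.0 rather than 1000.0. Without -fopenmp the pragma is ignored, and the serial loop gives the same result.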
Thanks for the reply.
In fact, "long double" is the x86 extended-precision format, which needs only 80 bits (10 bytes). It is not a full 128-bit floating-point type (like the quad precision provided by gcc and the Intel compiler).
Nevertheless, sizeof(long double) gives 16 (due to alignment requirements, I assume), so 6 bytes are unused.
-- Stefan
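The difference between storage size and actual precision can be checked directly. This is a minimal sketch, and the function names are hypothetical. On x86-64 Linux one would expect 16 bytes of storage but only 64 significand bits. That corresponds to the 80-bit x87 format (64-bit significand, 15-bit exponent, sign bit) padded to 16 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Storage size of long double in bytes, including any padding.
std::size_t long_double_bytes()
{
    return sizeof(long double);
}

// Number of significand bits that actually carry precision:
// 64 for the x87 80-bit extended format, 113 for IEEE quad precision.
int long_double_digits()
{
    return std::numeric_limits<long double>::digits;
}
```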
Regarding the p.s. about initializing maxval: thanks for the hint - I was not aware of that!
Hi, could you please send the LLVM IR for your example?
You can generate it like this: clang -c -S -emit-llvm -o openmpbug.ll openmpbug.cc
and attach the openmpbug.ll file.
Best regards,
Alexey Bataev
Thanks, I will look at it ASAP
Best regards,
Alexey Bataev
Could you do it one more time, but with the -fopenmp option added? Your LLVM IR does not contain any OpenMP code. It seems to me you compiled it without -fopenmp.
Best regards,
Alexey Bataev
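You can also confirm from inside the source whether OpenMP was actually enabled. Compilers that support OpenMP define the _OPENMP macro when OpenMP compilation is switched on. For clang and gcc, the switch is -fopenmp. The macro's value is a yyyymm date identifying the supported specification. This is a minimal sketch, and the function name is hypothetical.

```cpp
#include <cassert>

// Returns the value of _OPENMP (a yyyymm date identifying the supported
// OpenMP version) when compiled with OpenMP enabled, e.g. with -fopenmp,
// and 0 otherwise.
long openmp_spec_date()
{
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}
```

A result of 0 would mean the pragmas in the test program were silently ignored, which matches the IR containing no OpenMP code.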
OK,
now I see! The LLVM backend lowers the IR instruction 'cmpxchg i128*' to a call to '__sync_val_compare_and_swap_16'. So the problem is in the lowering part of the LLVM backend, not in clang.
Actually, this problem can also be reproduced with some operations on atomic types.
Best regards,
Alexey Bataev
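Why does a 16-byte cmpxchg show up at all? An atomic max is typically implemented as a compare-and-swap loop over the variable's full width. The sketch below shows the technique for double, assuming an 8-byte CAS is available, as it is on x86-64. It illustrates the general pattern and is not the code clang actually generates. For a 16-byte long double, the same loop needs a 16-byte CAS, and that is the operation lowered to __sync_val_compare_and_swap_16. The helper name atomic_max is hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: atomically raise `target` to at least `value`
// with a compare-and-swap loop. The CAS covers the whole object, so the
// same pattern on a 16-byte long double would need a 16-byte CAS.
void atomic_max(std::atomic<double>& target, double value)
{
    double current = target.load();
    // On failure, compare_exchange_weak reloads `current` with the latest
    // value, so the loop re-checks until it stores `value` or finds that
    // the stored value is already at least as large.
    while (current < value && !target.compare_exchange_weak(current, value)) {
        // retry
    }
}
```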
Yes, although it is only 80 bits wide, we use 128-bit atomic operations on this data (the alignment allows us to do that).
Calling a 16-byte compare-and-swap seems OK, provided that the appropriate runtime library contains that function!
(So maybe the problem is not the lowering itself, but the way the support library is being built.
x86-64 has a CMPXCHG16B instruction, so providing that function should be feasible.)
In any case, it's clearly not an OpenMP-specific issue...
-- Jim
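You can check at compile time whether the compiler may emit CMPXCHG16B inline. This is a minimal sketch, assuming GCC or clang, both of which predefine __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 when the target supports a 16-byte compare-and-swap. On x86-64 that is typically only the case with -mcx16, or with a -march that implies it. Compiling the original test program with that flag may therefore avoid the call to __sync_val_compare_and_swap_16, but this is an untested assumption. The function name is hypothetical.

```cpp
#include <cassert>

// Returns 1 if the compiler reports that a 16-byte compare-and-swap can be
// emitted inline for this target, 0 otherwise. On x86-64 the macro is
// typically defined only with -mcx16 or a -march that implies CMPXCHG16B.
int has_inline_cas16()
{
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16
    return 1;
#else
    return 0;
#endif
}
```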