LLJIT: __{math}_finite symbols not resolved?

Hello,
when building code with -Ofast -ffinite-math-only -ffast-math, clang generates calls to “finite” variants of math functions.

This has been the source of a fair number of issues in a “normal”, non-JIT pipeline, which seem to have been fixed over time - a simple fix being to recompile the target app against the new glibc.

But when going through LLJIT (tested with LLVM-10 & LLVM-11, on ArchLinux, glibc-2.32) I still get

Symbols not found: [ __log_finite, __exp2_finite ]

when trying to materialize my code.

What can be done about this? “Recompiling” doesn’t seem to fix anything in this case, so it looks like LLJIT lacks a mechanism to understand the ELF symbol indirection.

Thanks,
Jean-Michaël

Hi Jean-Michaël,

How are you trying to provide those symbols to the JIT? Are you using a DynamicLibrarySearchGenerator to reflect process symbols (or this specific library’s symbols) into the JIT?

I haven’t looked at ELF symbol indirection before – I’ll need to read up on that before I can provide a sensible answer. It’s quite likely that RuntimeDyld doesn’t support it yet though. Depending on what is required we can either try to implement it there, or aim to fix it in the newer JITLink linker – a few people are working on an initial implementation of that at the moment.

– Lang.

Hello,

Right now I am just using a Generator to look for symbols in my process (which links dynamically against libc / libm).

It seems to have no trouble finding every other libc / libm / libc++ / … symbol so I assumed that it was not necessary to specifically link against libm where these __finite symbols reside:

$ nm -D /usr/lib/libm.so.6 | grep finite
0000000000050540 T __acosf128_finite@GLIBC_2.26
0000000000042f70 T __acosf_finite@GLIBC_2.15
0000000000026940 i __acos_finite@GLIBC_2.15
0000000000051000 T __acoshf128_finite@GLIBC_2.26
0000000000043240 T __acoshf_finite@GLIBC_2.15

Hi Jean-Michaël,

Ok – if you’re linking against other symbols without issue then your setup sounds good.

My first take is that if you’re set up correctly then this should “just work”, and this failure should be considered a bug, but I need to understand more about ELF indirect / versioned symbols before I can say that definitively. I usually develop on MacOS, but I’ll set up a VM and see if I can reproduce this locally to get some more insight here.

In the meantime one workaround would be to define absoluteSymbol entries for these functions:

auto Err = J->getMainJITDylib().define(
    absoluteSymbols({
        { J->mangleAndIntern("__log_finite"),
          JITEvaluatedSymbol(pointerToJITTargetAddress(&__log_finite), JITSymbolFlags::Exported) },
        { J->mangleAndIntern("__exp2_finite"),
          JITEvaluatedSymbol(pointerToJITTargetAddress(&__exp2_finite), JITSymbolFlags::Exported) }
    }));

– Lang.

Hello,
here is a repro which runs in a docker image.
https://we.tl/t-O1EhIAOeOF

To see the issue, run repro.sh
It will first download a (big, sorry) centos:7 docker image with my build of LLVM-11 and build a simple lljit-based example.

This example is called with a trivial .cpp file that calls cos.
When run from within the container, it works.
When the same example, with the same bitcode input, runs outside the container, it does not find this symbol, likely because the host's glibc symbols became versioned (in my case the host is Arch; I think you need at least glibc-2.31 for that behaviour to be visible).

Removing either the -fmath-errno or -ffinite-math-only flag from the clang cpp → bitcode invocation in build.sh fixes the issue (at the expense of potentially slower code).

Thanks for the hint. Sadly, it’s not possible to take the address of __log_finite: what happens is that you call a function such as log() in your code, and either clang or some magic glibc header transforms that into __log_finite further down the pipeline (see e.g. the discussion in https://reviews.llvm.org/D74712). In my case I can’t “upgrade” the headers used by my JIT SDK to glibc-2.31+, as it would mean that only people with very recent distros would be able to run the code that’s being JIT-compiled.

Thanks!

Jean-Michaël

Hi Jean-Michaël,

Thanks very much for the reproduction case. I’ll try this out tomorrow.

If you can’t take the address of __{math}_finite, then what about defining your own {math}_wrapper functions and using their addresses when defining the absolute symbols? I’m not sure what the performance implications would be for your use-case though.
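A minimal sketch of that wrapper idea, assuming the JIT'd code only needs the two symbols from the error message. The names log_wrapper, exp2_wrapper, SymbolTable, makeFiniteMathTable and callUnary are hypothetical and exist only for illustration. The sketch builds without LLVM: it stops at a name-to-address table, and each pair in that table is what would then be handed to absoluteSymbols as in the earlier workaround.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical wrappers: ordinary functions defined in the host program, so
// their addresses can be taken even when the __*_finite names are not declared.
extern "C" double log_wrapper(double x) { return std::log(x); }
extern "C" double exp2_wrapper(double x) { return std::exp2(x); }

// Maps the symbol name requested by the JIT'd code to a wrapper's address.
using SymbolTable = std::unordered_map<std::string, std::uintptr_t>;

SymbolTable makeFiniteMathTable() {
  return {
      {"__log_finite", reinterpret_cast<std::uintptr_t>(&log_wrapper)},
      {"__exp2_finite", reinterpret_cast<std::uintptr_t>(&exp2_wrapper)},
  };
}

// Resolves a name through the table and calls it as double(double), which is
// roughly what the JIT'd code does once the symbol has been defined.
double callUnary(const SymbolTable &table, const std::string &name, double x) {
  auto it = table.find(name);
  assert(it != table.end() && "symbol not in table");
  auto fn = reinterpret_cast<double (*)(double)>(it->second);
  return fn(x);
}
```

Each call from JIT'd code then goes through one extra function call into the wrapper, which is the possible performance cost mentioned above.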

– Lang.

Hi Jean-Michaël,

Sorry for the delayed reply – The dev meeting kept me pretty busy the last couple of days.

When I run your repro.sh script (thank you for taking the time to add all the container config code by the way, that’s very helpful!) I see:

% ./repro.sh
-- Starting docker container --

Using default tag: latest
latest: Pulling from ossia/score-package-linux
Digest: sha256:aca4c255d4d5a6926e5cd4f50a1a57f6e262a3c931efaaf94a62066784e5424c
Status: Image is up to date for ossia/score-package-linux:latest
docker.io/ossia/score-package-linux:latest
-- Compiling example to bc --

-- Building --

-- The CXX compiler identification is Clang 11.0.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/score-sdk/llvm/bin/clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found LLVM 11.0.0
-- Using LLVMConfig.cmake in: /opt/score-sdk/llvm/lib/cmake/llvm
-- Configuring done
-- Generating done
-- Build files have been written to: /repro/build
Scanning dependencies of target repro_ffastmath_llvm
[ 50%] Building CXX object CMakeFiles/repro_ffastmath_llvm.dir/main.cpp.o
[100%] Linking CXX executable repro_ffastmath_llvm
[100%] Built target repro_ffastmath_llvm
-- Running from within the container : ok

/repro/build.sh: line 48: 45 Illegal instruction ./repro_ffastmath_llvm
-- Leaving container --

-- Running from within the host system fails:

./repro.sh: line 10: ./repro_ffastmath_llvm: cannot execute binary file

Hi Jean-Michaël,

Sorry – I misread your email earlier:

> When run from within the container it works.

Ahh – I should be looking for success here. I see why the failure is happening: the testcase doesn’t check the Error and Expected values returned by the LLJIT APIs. You can’t ignore those: their destructors call abort if the value was never checked. For the purpose of writing minimal tests you can always wrap your calls in ‘cantFail’. E.g.:

auto JIT_e = cantFail(llvm::orc::LLJITBuilder().create());

That will strip Expected / Error return types (to T/void), asserting that the value is success in each case.
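To illustrate why an ignored value ends in an abort, here is a toy model of a must-check error type. It is a simplified stand-in written for this thread, not LLVM's implementation: CheckedError, consume and cantFailModel are hypothetical names, and the real llvm::Error carries a payload and only enforces the check in builds where that checking is enabled.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Toy stand-in for a must-check error value (not LLVM's implementation).
class CheckedError {
public:
  static CheckedError success() { return CheckedError(false); }
  static CheckedError failure() { return CheckedError(true); }

  CheckedError(CheckedError &&other) noexcept
      : failed_(other.failed_), checked_(other.checked_) {
    other.checked_ = true; // the moved-from object no longer owes a check
  }
  CheckedError(const CheckedError &) = delete;
  CheckedError &operator=(const CheckedError &) = delete;

  // Dropping a value that was never inspected terminates the program.
  ~CheckedError() {
    if (!checked_) {
      std::fputs("error value dropped without being checked\n", stderr);
      std::abort();
    }
  }

  // Testing the value counts as a check only when it holds success;
  // a failure must additionally be handled via consume().
  explicit operator bool() {
    checked_ = !failed_;
    return failed_;
  }
  void consume() { checked_ = true; }

private:
  explicit CheckedError(bool failed) : failed_(failed), checked_(false) {}
  bool failed_;
  bool checked_;
};

// Rough analogue of cantFail: inspects the value and asserts it is success.
void cantFailModel(CheckedError err) {
  bool failed = static_cast<bool>(err);
  assert(!failed && "cantFailModel called on a failure value");
  (void)failed;
}
```

In this model, a statement such as `CheckedError::failure();` on its own line would abort when the temporary is destroyed, which is the same shape of crash as a test that ignores its return values.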

Once I make those changes I’m seeing the test pass in my Arch Linux container. Could you share the bitcode that is failing for you? That will help me pin down where things are going off the rails (or failing to go off the rails) with my setup.

– Lang.