Implement foo in terms of __builtin_foo fails

Hello clang devs,

I’m implementing libm for a toolchain based on LLVM. Some libm functions have instruction-level support that I’d like to target via clang. The IR-to-machine-code part is working, so I think the next step is writing C that generates calls to the LLVM intrinsics.

For example:

double sin(double x) { return __builtin_sin(x); }

I expected this to compile to:
define double @sin(double %x) {
  %res = call double @llvm.sin.f64(double %x)
  ret double %res
}

Our back end recognises llvm.sin.f64 so this would compile into sensible instructions.

However, at -O0 clang generates a call to sin.

define double @sin(double %x) {
  %x.addr = alloca double, align 8
  store double %x, double* %x.addr, align 8
  %0 = load double, double* %x.addr, align 8
  %call = call double @sin(double %0)
  ret double %call
}

At higher optimisation levels, this recursive call is detected and optimised into:
define double @sin(double %x) {
  ret double undef
}

How can I write C that generates a call to the llvm.sin.f64 intrinsic?



Why do you want to force that? The library handling already does it when
the compilation flags match them up. I.e. most of the intrinsics are a
lot more restricted than the ISO C constraints.


I think this means I’m missing some compilation flags.

I’m not passing -ffast-math because treating floating point as associative etc. is unattractive, but I’m willing to violate ISO C handling of errno. -ffast-math and -fdenormal-fp-math are the only two flags I can find in the documentation. -ffreestanding and -fno-builtin don’t appear to change the example. Where are the controlling flags listed?

I can now present this problem more clearly. I was missing the flags -fno-math-errno and -fno-trapping-math, but clang is also missing some functionality.

I think that various clang builtins, e.g. sin and exp, have the same semantics as the corresponding LLVM intrinsics when both of these math flags are set. There’s special handling for sqrt (including returning undef on negative inputs with no-nans-fp-math) in CGBuiltin, but most of libm is emitted as library calls.

SelectionDAGBuilder then matches some of the library calls and emits ISD nodes for them, e.g. it supports exp2 but misses exp. This is too late for implementing libm but otherwise OK.

I would like to add handling to CGBuiltin to lower more of the libm-derived clang builtins to LLVM intrinsics when the appropriate FP math flags are set.

AMDGPU duplicates the libm derived builtins, lowering them via SelectionDAG. This would also work for me but it seems a shame to ignore llvm.sin.f32 et al when they already exist. I can’t find a way to target them from C without changing clang.

What does the list think of extending CGBuiltin vs adding more target specific nodes?



Depending on the amount of control you have over your toolchain, you can
disable errno-reporting completely. I consider errno-reporting a historic
mistake in general, and the standard allows omitting it.
-fno-trapping-math should not be necessary in general; what do you need
it for?

The other thing to keep in mind is that LLVM is still quite limited in
what it can do for the intrinsics beyond lowering to instructions. Given
that few CPUs have decent transcendental FP support, that isn't that
high on the priority list for most. For example, the range reduction on x86 is
completely broken for the trigonometric functions etc.
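
For readers unfamiliar with the term, range reduction maps a large argument into a small interval before the function is approximated. The sketch below is a deliberately naive version, not what any CPU or libm actually does, and the names are made up. It shows where accuracy goes missing: the constant is 2*pi rounded to a double, and that rounding error is multiplied by the number of whole turns removed, so the reduced argument degrades as x grows. Presumably this is the kind of problem the remark above refers to.

```c
/* 2*pi rounded to the nearest double; the rounding error is what hurts. */
static const double TWO_PI = 6.283185307179586;

/* Naive range reduction for x >= 0: subtract whole multiples of TWO_PI.
 * Assumes x / TWO_PI fits in a long long. */
double reduce_naive(double x) {
    long long k = (long long)(x / TWO_PI);  /* whole turns; truncation equals floor for x >= 0 */
    return x - (double)k * TWO_PI;          /* error in TWO_PI is scaled by k */
}
```

For small inputs the result is fine (reduce_naive(1.0) returns 1.0 unchanged), which is why the problem only shows up for large arguments or for inputs very close to a multiple of pi.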