__builtin_* vs llvm-libc provided ones?

There is, for example, __builtin_memcpy and even __builtin_memcpy_inline in clang, while __builtin_memset_inline is apparently just a TODO right now.

__builtin_memcpy, for example, already has an implementation in LLVM, but llvm-libc hasn't reused it and implements it separately instead, perhaps in order not to make llvm-libc buildable only with clang, since some builtins may not be present in other compilers.

My question is: what is the plan for the two? Accept the duplicated implementations, remove the non-libc implementations in favor of the llvm-libc ones, or the other way around?

I also wonder whether there is a future where all or most of the libc entry points are also provided as compiler __builtin_* (and maybe __builtin_*_inline) functions, which could resolve the __builtin_memset_inline TODO by somehow reusing the llvm-libc implementations in IRBuilder.

Thanks!

To describe it maybe a bit better: for example, this is all I need to get ceilf, floorf, etc. in a .wasm module built with -nostdlib -nostdinc:

#define ceilf __builtin_ceilf
#define floorf __builtin_floorf
#define abs __builtin_abs
#define fabs __builtin_fabs

So I wondered if I could get more of libc this way (only the parts where it makes sense, of course), or at least understand what the relation between those builtin implementations and the upcoming libc will be.

+Guillaume Chatelet, who will be able to answer this better for the memory functions. I will also give my answer for the math function examples shortly.

IIUC, there are two parts to your question:

  1. Can we implement a libc function as a macro resolving to a builtin? Not if the standard requires the function to be a real addressable function. One can choose to also provide a macro, but an addressable function declaration must be available; see section 7.1.4 of the C11 standard for more information (a small sketch of this pattern follows this list).
  2. What is the difference between builtins and the libc flavors of the functions? Typically, builtins resolve to the hardware instruction implementing the operation. If a hardware implementation is not available, the compiler builtin calls into the libc itself. With respect to math functions, you will notice this with the long double flavors. That said, we have implemented the math functions from first principles in LLVM libc (as in, the implementations do not assume any special hardware support). However, we are just about starting to add machine-specific implementations (https://reviews.llvm.org/D95850). This should make the libc functions equivalent to the compiler builtins.
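To illustrate item 1, here is a rough sketch of the pattern (hypothetical code, not llvm-libc's actual layout): the libc ships a real, addressable ceilf, and may additionally provide a macro that forwards to the builtin.

/* In the header: */
float ceilf(float x);                 /* required: addressable declaration  */
#define ceilf(x) __builtin_ceilf(x)   /* optional: a macro may shadow it    */

/* In the library sources: the real definition. Per C11 7.1.4, callers can
 * reach the function directly with (ceilf)(x) or after #undef ceilf. */
float (ceilf)(float x) { return __builtin_ceilf(x); }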

Thank you so much.

With your explanation I think I understand the purpose of builtins better now. I was also wrong about __builtin_memcpy_inline: it isn't as flexible as a real libc memcpy and needs its third argument to be a constant ("error: argument to '__builtin_memcpy_inline' must be a constant int…"), which I wish weren't the case, but it is understandable why it is. I also see now that __builtin_memcpy is essentially a proxy to the libc memcpy, which I guess is there just to make compiler code analysis easier.
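For reference, this is roughly the situation I ran into (a small sketch; the second call is the one clang rejects, and the exact diagnostic wording may vary between versions):

void copy(char *dst, const char *src, unsigned long n) {
  __builtin_memcpy_inline(dst, src, 16); /* OK: constant size               */
  __builtin_memcpy_inline(dst, src, n);  /* error: argument to
                                            '__builtin_memcpy_inline' must
                                            be a constant integer           */
  __builtin_memcpy(dst, src, n);         /* fine, but may end up calling
                                            the libc memcpy                 */
}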

Given that, not as macros, but would it be possible to use the builtins inside the llvm-libc implementations so that llvm-libc doesn't have to implement them again? In other words, do you see a way for llvm-libc and the compiler to share their implementations behind the scenes? Perhaps through a directory somewhere that both the compiler and llvm-libc can use for their shared implementations?

The reason I’m asking is because of a hope I have to see compiler builtins some day to be more capable, which I understand I shouldn’t be that hopeful about it, but I think the questions can be thought about regardless.

Thanks!

Given that, not as macros, but would it be possible to use the builtins inside the llvm-libc implementations so that llvm-libc doesn't have to implement them again? In other words, do you see a way for llvm-libc and the compiler to share their implementations behind the scenes? Perhaps through a directory somewhere that both the compiler and llvm-libc can use for their shared implementations?

One could make that work, I guess (with a lot of caveats). But we want to be able to build and test LLVM libc without the rest of LLVM. This is currently not possible because of the tablegen dependency, but we are still searching for good alternatives. In the meantime, we don't want to add any more dependencies from outside of the libc sources.

re: memcpy & builtins

Given that, not as macros, but would it be possible to use the builtins inside the llvm-libc implementations so that llvm-libc doesn't have to implement them again? In other words, do you see a way for llvm-libc and the compiler to share their implementations behind the scenes?

Sharing implementations is certainly a noble goal to pursue. Unfortunately, there are a number of things to consider that make it hard for memory functions in general.
I’ll try to give an overview of the challenges here. I’ll start with the basics - my apologies if you already know most of it.

Most compilers use an internal representation that is well suited to an abstract, relatively high-level depiction of operations.

  1. When compiling C, C++, Rust - you name it - the source language is first transformed by the front end into a common representation (the so-called IR).

  2. This IR can be transformed by general - CPU agnostic - passes and progressively refined (lowered) to be closer and closer to the real underlying hardware.

  3. Finally, code generation occurs (SelectionDAG legalization and optimizations, register allocation, machine code emission).

During 1, we can convey the memcpy semantics to the IR in different ways.
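For example (a rough sketch; the exact IR clang emits depends on the target, language mode, and optimization level), each of the following constructs typically ends up as the llvm.memcpy intrinsic:

#include <string.h>

struct big { char bytes[256]; };

void to_ir_examples(struct big *dst, const struct big *src) {
  memcpy(dst, src, sizeof *dst);            /* explicit libc call    */
  __builtin_memcpy(dst, src, sizeof *dst);  /* compiler builtin      */
  *dst = *src;                              /* aggregate assignment  */
}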

During 2, a number of optimizations may recognize IR patterns and turn them into the IR memcpy intrinsic.
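A classic case is a plain byte-copy loop; the following sketch shows the kind of pattern that LLVM's loop idiom recognition may rewrite into the llvm.memcpy intrinsic at -O2:

void copy_bytes(char *dst, const char *src, unsigned long n) {
  /* At -O2 this byte loop is typically recognized as a memcpy idiom
   * and replaced with the llvm.memcpy intrinsic (and hence, possibly,
   * a call into the libc memcpy). */
  for (unsigned long i = 0; i < n; ++i)
    dst[i] = src[i];
}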

This behavior can be disabled with the (somewhat misleadingly named) "-fno-builtin-memcpy" flag: https://godbolt.org/z/7GoxPT
In addition, this flag also prevents the frontend from recognizing the libc memcpy function: https://godbolt.org/z/dsrTrc
I know this is confusing :-/

Now, the good thing about having the compiler understand the memcpy semantics is that it can produce excellent code based on the context.
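As a sketch (the exact lowering is target and optimization-level dependent), a copy whose size is a small compile-time constant is usually expanded inline, while an unknown size typically stays a call into the libc:

#include <string.h>

void small_copy(char *dst, const char *src) {
  memcpy(dst, src, 8);   /* known small size: usually expanded inline
                            as a single load/store pair, no call      */
}

void large_copy(char *dst, const char *src, unsigned long n) {
  memcpy(dst, src, n);   /* unknown size: usually left as a call to
                            the libc memcpy                           */
}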

To sum it up, many constructs can end up being interpreted by LLVM as having memcpy semantics, and depending on the context the resulting code may differ widely.

Now, it is desirable to have a C/C++ implementation of memcpy in order to leverage optimization techniques like Profile Guided Optimization: when the compiler sees the code, it can reason about it, make inlining decisions, reorder branches, etc.
The complex interactions I described earlier turn this into a chicken-and-egg problem where the code may end up calling itself indefinitely: https://godbolt.org/z/eg0p_E
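Concretely, the problem looks roughly like this (the same idea as the godbolt link above; a sketch, not llvm-libc's actual code):

#include <stddef.h>

void *memcpy(void *dst, const void *src, size_t count) {
  char *d = dst;
  const char *s = src;
  /* At -O2 loop idiom recognition may rewrite this loop back into a
   * call to memcpy, i.e. the function ends up calling itself, unless
   * the file is built with -fno-builtin-memcpy (or equivalent). */
  for (size_t i = 0; i < count; ++i)
    d[i] = s[i];
  return dst;
}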

This is why __builtin_memcpy_inline was designed in the first place (see the original thread about it: https://lists.llvm.org/pipermail/llvm-dev/2019-April/131973.html).
Its contract is simpler, which makes it useful as a building block for creating memcpy functions in pure C/C++.
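As a sketch of that building-block role (hypothetical code, not llvm-libc's actual implementation), a memcpy written in C can use the builtin for fixed-size blocks without any risk of those blocks being turned back into memcpy calls:

#include <stddef.h>

/* The builtin requires a constant size and is always expanded inline,
 * so it can never be lowered to a call to memcpy. */
static void copy_block_64(char *dst, const char *src) {
  __builtin_memcpy_inline(dst, src, 64);
}

void *my_memcpy(void *dst, const void *src, size_t count) {
  char *d = dst;
  const char *s = src;
  while (count >= 64) {
    copy_block_64(d, s);
    d += 64;
    s += 64;
    count -= 64;
  }
  /* A real implementation would also handle the tail with fixed-size
   * copies (or be built with -fno-builtin) so this byte loop cannot be
   * recognized as yet another memcpy. */
  while (count--)
    *d++ = *s++;
  return dst;
}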

Perhaps through a directory somewhere that both the compiler and llvm-libc can use for their shared implementations?

It may not be self-evident from what I described earlier, but the way memcpy is implemented in LLVM really spans a lot of different parts, and I'm not sure it is possible to gather it in a single place as regular code without adding ways to communicate intent to the compiler (i.e., more builtins).
For instance, loop creation has to take place at the IR level (phi nodes and the loop condition), but it may be in tension with the availability of accelerators that are specific to backend implementations (think Enhanced REP MOVSB/STOSB on x86 processors).
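For the curious, this is roughly what such an accelerator looks like on x86-64 (a hand-written sketch for illustration only; whether the backend actually selects rep movsb is a separate, CPU-dependent decision):

#include <stddef.h>

/* x86-64 only: "rep movsb" copies RCX bytes from [RSI] to [RDI].
 * Whether it beats a vectorized loop depends on the CPU (ERMSB). */
static void copy_rep_movsb(void *dst, const void *src, size_t count) {
  __asm__ volatile("rep movsb"
                   : "+D"(dst), "+S"(src), "+c"(count)
                   :
                   : "memory");
}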

I'm aware that this answer is probably still pretty confusing, but I hope it helps.

Thank you so much for the explanations, the way things are organized makes much more sense to me now!

Thanks!