[RFC] llvm-libc tuning

Context

llvm-libc is intended to support a wide range of scenarios from embedded devices where resources are scarce to demanding applications running on high-end servers where performance is paramount. The philosophy behind llvm-libc is that you can tailor it to your needs.

This document explores how the tuning can be implemented, what the pitfalls are, and how to mitigate them.
This post is really to get the conversation started around the topic of “tuning llvm libc”. Please don’t hesitate to share your thoughts / advice as this will help shape our guidelines.

1. Build Systems

llvm-libc currently supports building via CMake and Bazel. Customizing the build involves passing specific flags to the build system command line.

CMake

For CMake this is done by using the -D command line option. Project maintainers can add custom options with the option keyword. Some options prefixed by CMAKE_ have a special meaning and are targeted to CMake itself (e.g. -DCMAKE_BUILD_TYPE=Release).

To communicate these values down to the compiler, the maintainer can either explicitly add preprocessor definitions (target_compile_definitions) or auto-generate a header file containing the options they want to export (configure_file).

Bazel

The user can specify custom command line options (e.g., --@llvm-project//libc:mpfr=system). The presence of these flags can activate config_settings that are in turn used to modify the compilation process. The select keyword can be used to customize the target and change compilation flags, preprocessor definitions, target dependencies, etc…

2. The build options themselves

2.1 Declaration and naming

Build options have to be declared in the build configuration file.

The options should be scoped and named consistently. The relevant scope will usually be the C function name (e.g., printf), but if a feature is shared across a range of functions, the scope can be a name that precisely identifies this group (e.g. SINE_COSINE or MATH_TRIG).

  • For CMake, feature configuration would be done by using the following template LIBC_<FUNCTION>__<FEATURE>__<VERB>
    e.g. LIBC_PRINTF__INDEX_MODE__DISABLE

  • For Bazel a similar scheme can be used.
    e.g., --@llvm-project//libc:printf__index_mode=disable

2.2 Orthogonality

Enabling a feature should not disable another one or lead to surprising behaviors. If two features are incompatible with each other, the compilation should stop early with a clear error.
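
A minimal sketch of such an early failure, here checked at the source level with #error. Both option names are hypothetical, and the #ifndef defaults only stand in for values the build system would normally pass with -D. The same check could equally live in the CMake or Bazel configuration.

```cpp
// Stand-ins for values normally provided on the compiler command line.
// Both names are hypothetical and only serve this illustration.
#ifndef LLVMLIBC_PRINTF__FLOAT__DISABLE
#define LLVMLIBC_PRINTF__FLOAT__DISABLE 0
#endif
#ifndef LLVMLIBC_PRINTF__LONG_DOUBLE__ENABLE
#define LLVMLIBC_PRINTF__LONG_DOUBLE__ENABLE 1
#endif

// Reject the contradictory combination before any other code is compiled,
// instead of silently letting one option override the other.
#if LLVMLIBC_PRINTF__FLOAT__DISABLE && LLVMLIBC_PRINTF__LONG_DOUBLE__ENABLE
#error "printf: long double conversions require float support; enable float or disable long double"
#endif

// Past this point the two options are known to be consistent.
constexpr bool kPrintfFloat = !LLVMLIBC_PRINTF__FLOAT__DISABLE;
constexpr bool kPrintfLongDouble = LLVMLIBC_PRINTF__LONG_DOUBLE__ENABLE;
```

With the stand-in values above the file compiles; flipping LLVMLIBC_PRINTF__FLOAT__DISABLE to 1 stops the build at the #error line.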

3. The Tuning

Tuning itself can be done at the build system level or within the compiler at the source code level.

3.1 At build system level

Based on user-provided options, the build system can modify the compiler command line options to:

  • change implementations by picking another source file,
  • enable / disable target specific features (e.g. -mno-fma disables support for FMA),
  • change optimization profile (for speed, for size),
  • add preprocessor definitions and delegate customization to the compiler (see 3.2 below).

This is powerful but difficult to keep in sync between CMake and Bazel.

3.2 At source code level via the preprocessor

Section 2 above focused on build system options; this section is about compiler preprocessor definitions. Although the CMake command line uses the -D syntax to set build options, these are not to be confused with clang's -D syntax, which sets preprocessor definitions.

To prevent conflating the two we suggest that preprocessor definitions start with LLVMLIBC_ instead of LIBC_.


A note on interactions between compiler flags and preprocessor

While some preprocessor definitions are provided by the build system, others are set by the compiler as a consequence of compiler options.

For instance, if the build system adds -mfma to the clang command line, clang will automatically define the __FMA__ preprocessor definition. Similarly, the -DCMAKE_BUILD_TYPE=MinSizeRel CMake option appends the -Os flag, which in turn defines the __OPTIMIZE_SIZE__ preprocessor definition.

We can arrange these compiler generated preprocessor definitions in two categories:

  1. The semantics of the flag precisely match the semantics for llvm-libc (e.g. __FMA__ means “target cpu supports fma instructions”).
  2. The semantics of the flag are imprecise or do not fully represent the intent of the llvm-libc option (e.g. __OPTIMIZE_SIZE__ does not discriminate between scenarios like “optimize for size at all cost” and “optimize for size with reasonable speed”).

If the semantics are perfectly represented by the compiler-generated preprocessor definition, we can use it to perform conditional compilation. If not, the build system is responsible for setting additional preprocessor definitions with a precise meaning, and the conditional compilation should use these instead of the imprecise one.
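
The two categories could look as follows in code. This is only a sketch: multiply_add, LLVMLIBC_OPTIMIZE_SIZE_LEVEL and kCopyUnrollFactor are hypothetical names, the three size levels are an assumption made for illustration, and the #ifndef default stands in for a value the build system would set.

```cpp
#include <cmath>

// Category 1: on x86, __FMA__ is defined by the compiler exactly when FMA
// code generation is enabled, so it can be tested directly.
inline double multiply_add(double a, double b, double c) {
#if defined(__FMA__)
  return std::fma(a, b, c); // fused multiply-add, a single rounding
#else
  return a * b + c; // separate multiply and add
#endif
}

// Category 2: __OPTIMIZE_SIZE__ is too coarse, so the build system sets a
// precise (hypothetical) definition instead:
//   0 = favor speed
//   1 = favor size with reasonable speed
//   2 = favor size at all cost
#ifndef LLVMLIBC_OPTIMIZE_SIZE_LEVEL
#define LLVMLIBC_OPTIMIZE_SIZE_LEVEL 1 // stand-in for a -D on the command line
#endif

// The source code only ever looks at the precise definition.
constexpr int kCopyUnrollFactor =
    LLVMLIBC_OPTIMIZE_SIZE_LEVEL == 0 ? 8 :
    LLVMLIBC_OPTIMIZE_SIZE_LEVEL == 1 ? 2 : 1;
```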


A note on pitfalls of preprocessor

Conditional compilation based on preprocessor definitions is quite standard:

#ifdef LLVMLIBC_ABC
// This branch is compiled when LLVMLIBC_ABC is defined.
#else
// This branch is compiled when LLVMLIBC_ABC is undefined.
#endif

Unfortunately this can be brittle and hard to maintain. For instance, forgetting to rename one instance when refactoring, or making a typo, can lead to the wrong branch being compiled:

// LLVMLIBC_ABC renamed to LLVMLIBC_XXX in the codebase but this instance was forgotten.
#ifdef LLVMLIBC_ABC
// This branch is not compiled anymore as LLVMLIBC_ABC is undefined.
#else
// This code is compiled instead and may compile fine.
#endif

Also, inside an #if expression, the preprocessor replaces any identifier that is not a defined macro with 0 (code).
For example, the following code compiles without any diagnostic, but takes a branch the author likely did not intend:

#if LLVMLIBC_FOO==0
// Taken even if LLVMLIBC_FOO is not set on the compiler's command line.
#elif LLVMLIBC_FOO==1
// Not taken as expected.
#endif

The fact that the preprocessor runs before compilation makes it difficult to mitigate these problems.

  • Solution 1: Mitigating with preprocessor only

Preprocessor definition checking within the preprocessor is quite verbose and really adds visual clutter.

#if defined(LLVMLIBC_PRINTF_DISABLE_INDEX_MODE) && (LLVMLIBC_PRINTF_DISABLE_INDEX_MODE == 0)
// LLVMLIBC_PRINTF_DISABLE_INDEX_MODE is defined to 0
#elif defined(LLVMLIBC_PRINTF_DISABLE_INDEX_MODE) && (LLVMLIBC_PRINTF_DISABLE_INDEX_MODE == 1)
// LLVMLIBC_PRINTF_DISABLE_INDEX_MODE is defined to 1
#else
// LLVMLIBC_PRINTF_DISABLE_INDEX_MODE is undefined
#endif

  • Solution 2: Mitigating with preprocessor only, two-step solution

#ifndef LLVMLIBC_PRINTF_DISABLE_INDEX_MODE
#error "LLVMLIBC_PRINTF_DISABLE_INDEX_MODE is undefined"
#endif
#if LLVMLIBC_PRINTF_DISABLE_INDEX_MODE == 0
// LLVMLIBC_PRINTF_DISABLE_INDEX_MODE is defined to 0
#elif LLVMLIBC_PRINTF_DISABLE_INDEX_MODE == 1
// LLVMLIBC_PRINTF_DISABLE_INDEX_MODE is defined to 1
#else
// LLVMLIBC_PRINTF_DISABLE_INDEX_MODE is defined to a value other than 0 or 1
#endif

  • Solution 3: Mitigating with a mix of preprocessor and C++

We can define a consteval function that checks that a string is exactly "0" or "1", and then apply it to the stringized version of the preprocessor definition (code).

LIBC_VALIDATE_BOOL_ENV(LLVMLIBC_PRINTF_DISABLE_INDEX_MODE);
#if LLVMLIBC_PRINTF_DISABLE_INDEX_MODE
// LLVMLIBC_PRINTF_DISABLE_INDEX_MODE is defined to 1
#else
// LLVMLIBC_PRINTF_DISABLE_INDEX_MODE is defined to 0
#endif

Same for an integer preprocessor definition

LIBC_VALIDATE_UINT_ENV(LLVMLIBC_FILE_BUFFER_SIZE);

Note: there is still the possibility of a typo on the #if line, but we expect that modern editors’ syntax highlighting will help catch such errors. e.g.,

LIBC_VALIDATE_BOOL_ENV(ABC);
#if ABCD
...

This is the preferred solution as the check stays next to the usage. It also reduces visual clutter to a minimum. The only caveat here is that it requires C++20.
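
One possible shape for these macros, as a sketch only. It swaps consteval for constexpr functions checked by static_assert, so it also compiles before C++20. The helper names (LIBC_STRINGIZE, libc_is_bool_literal, libc_all_digits, libc_is_uint_literal) are made up for this illustration, and the #ifndef defaults stand in for values passed on the command line.

```cpp
// Two-level stringification so that the macro argument is expanded first.
#define LIBC_STRINGIZE_IMPL(x) #x
#define LIBC_STRINGIZE(x) LIBC_STRINGIZE_IMPL(x)

// True only for the exact strings "0" and "1".
constexpr bool libc_is_bool_literal(const char *s) {
  return (s[0] == '0' || s[0] == '1') && s[1] == '\0';
}

// True when every character up to the terminator is a decimal digit.
constexpr bool libc_all_digits(const char *s) {
  return *s == '\0' || (*s >= '0' && *s <= '9' && libc_all_digits(s + 1));
}

// True for a non-empty string made of decimal digits only.
constexpr bool libc_is_uint_literal(const char *s) {
  return *s != '\0' && libc_all_digits(s);
}

#define LIBC_VALIDATE_BOOL_ENV(name)                                         \
  static_assert(libc_is_bool_literal(LIBC_STRINGIZE(name)),                  \
                #name " must be defined to 0 or 1")

#define LIBC_VALIDATE_UINT_ENV(name)                                         \
  static_assert(libc_is_uint_literal(LIBC_STRINGIZE(name)),                  \
                #name " must be defined to an unsigned decimal integer")

// Stand-ins for -D options on the compiler command line.
#ifndef LLVMLIBC_PRINTF_DISABLE_INDEX_MODE
#define LLVMLIBC_PRINTF_DISABLE_INDEX_MODE 1
#endif
#ifndef LLVMLIBC_FILE_BUFFER_SIZE
#define LLVMLIBC_FILE_BUFFER_SIZE 4096
#endif

LIBC_VALIDATE_BOOL_ENV(LLVMLIBC_PRINTF_DISABLE_INDEX_MODE);
LIBC_VALIDATE_UINT_ENV(LLVMLIBC_FILE_BUFFER_SIZE);
```

An undefined name stringizes to its own spelling (for example "LLVMLIBC_FILE_BUFFER_SIZE"), which fails both checks. Note that this simple digit check would also reject values such as 4096u or (4 * 1024).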


3.2.1 Setting constants

A preprocessor definition can be used to define constant libc quantities like thread stack size or file buffer size. e.g.,

clang -DLLVMLIBC_THREAD__STACK_SIZE=4096

The developer should use the LIBC_VALIDATE_UINT_ENV macro to make sure that the preprocessor definition is set and valid before using it.
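
Once validated, the definition can be turned into a typed constant in a single place so that the rest of the code never touches the macro. A sketch under stated assumptions: the sketch_config namespace, the THREAD_STACK_SIZE constant and the 1024-byte minimum are all hypothetical, and the #ifndef default stands in for the command line shown above.

```cpp
#include <cstddef>

// Stand-in for: clang -DLLVMLIBC_THREAD__STACK_SIZE=4096
#ifndef LLVMLIBC_THREAD__STACK_SIZE
#define LLVMLIBC_THREAD__STACK_SIZE 4096
#endif

// The LIBC_VALIDATE_UINT_ENV(LLVMLIBC_THREAD__STACK_SIZE) check described in
// this post would go here; the macro is not defined in this sketch.

namespace sketch_config {

// Single typed entry point for the rest of the code base.
constexpr std::size_t THREAD_STACK_SIZE = LLVMLIBC_THREAD__STACK_SIZE;

// Hypothetical sanity bound; the real minimum would be platform specific.
static_assert(THREAD_STACK_SIZE >= 1024,
              "LLVMLIBC_THREAD__STACK_SIZE is too small");

} // namespace sketch_config
```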

3.2.2 Conditional code

The intent here is to allow some features to be disabled or some implementations to be replaced by alternatives by using preprocessor directives (#if, #else, #endif, etc.). This is best done by using boolean preprocessor definitions.

The developer should use the LIBC_VALIDATE_BOOL_ENV macro to make sure that the preprocessor definition is set and valid.

3.2.3 Conditional file inclusion

There are several ways of performing conditional file inclusion using a combination of build system and preprocessor directives.

It is unclear which one stands out, so we list them here in arbitrary order with their pros and cons.

  • Selection with the build system

    The build system language is leveraged to select the file to compile for a certain set of constraints. It seems like the appropriate approach for completely different implementations where there are no obvious customization points and a straightforward selection logic.
    e.g. Pick between generic/sqrt.cpp or x86_64/sqrt.cpp or aarch64/sqrt.cpp depending on target architecture.

    Pros

    • No #if, #else, #endif in the source file
    • The dispatch logic can use high level build options that are not visible at source code level.

    Cons

    • The same selection logic should be replicated and kept in sync between the build systems.
    • The selection logic is not visible in the source code. This raises the bar for contributors and maintainers, who also need to understand both build systems.

  • Selection with conditional include

    Here the preprocessor will pull the relevant code for the compiler based on preprocessor definitions.

    LIBC_VALIDATE_BOOL_ENV(LLVMLIBC_MATH_TRIG__SMALL_AND_IMPRECISE);
    #if LLVMLIBC_MATH_TRIG__SMALL_AND_IMPRECISE
    #include "src/math/trig_imprecise.inl"
    #else
    #include "src/math/trig_precise.inl"
    #endif
    

    Pros

    • The selection logic is visible in the source code.

    Cons

    • For build hermeticity, the build system may need to know about the file dependencies. If so, the logic will have to be replicated in the build system.

  • Selection with conditional code

    Another possibility is to include all alternatives in the main source file and do the selection using the preprocessor at the implementation site.

    In the main source file, all alternatives are listed.

    #include "src/math/trig_imprecise.inl"
    #include "src/math/trig_precise.inl"
    

    In each implementation file, the content is selectively enabled or disabled:

    Content of src/math/trig_imprecise.inl

    #include "src/__support/common.h" // LIBC_VALIDATE_BOOL_ENV
    LIBC_VALIDATE_BOOL_ENV(LLVMLIBC_MATH_TRIG__SMALL_AND_IMPRECISE);
    #if LLVMLIBC_MATH_TRIG__SMALL_AND_IMPRECISE==1
    // Implementation of small but imprecise trigonometric functions goes here
    #endif
    

    Content of src/math/trig_precise.inl

    #include "src/__support/common.h" // LIBC_VALIDATE_BOOL_ENV
    LIBC_VALIDATE_BOOL_ENV(LLVMLIBC_MATH_TRIG__SMALL_AND_IMPRECISE);
    #if LLVMLIBC_MATH_TRIG__SMALL_AND_IMPRECISE==0
    // Implementation of correct trigonometric functions goes here
    #endif
    

    Pros

    • All logic is in the code and all implementations are listed in the build system regardless of the selection mechanism.

    Cons

    • The selection logic is spread amongst implementations, which is hard to read and maintain.

Thank you for reading so far. Please let me know what you think.

The proposal in general LGTM. Few questions and points:

  1. For preprocessor macro naming, instead of LLVMLIBC_ prefix, what do you think of LIBC_COPT_ prefix? I want to stop using the names “llvmlibc”, “LLVM libc” etc from within the libc project.
  2. I vote for the first of your proposed solutions in 3.2.3.
  3. Are there any action items that you propose for yourselves and others wrt cleanups, documentation etc?

Yes that’s better :+1:

It makes a lot of sense for totally different implementations, which is one of the use cases (e.g. math functions).

Another one comes to mind though. Because we want llvm libc to be completely modular, we want each function to be self contained. e.g., strcpy, strdup, strndup all embed their own version of memcpy.

For best inlinability the memcpy implementation is defined as a header only function with some selection logic.

It may be possible to move this selection logic to the build configuration. We should experiment with this and see the tradeoffs.

  • One possibility would be to use the preprocessor to inject the implementation (godbolt)

clang++ -DLIBC_COPT_MEMCPY_IMPLEMENTATION_HEADER_FILE="\"/path/to/file.h\""
#include LIBC_COPT_MEMCPY_IMPLEMENTATION_HEADER_FILE

It feels really unsafe and I have no idea how this will play with C++ modules.

  • Another option would be to leave the include resolution to the compiler by specifying include paths

clang++ -Ilibc/src/string/memory_utils/memcpy_implementations/x86_64/
#include <memcpy_implementation.h>

I’m not sure about maintenance and debuggability though.

I’d like to get to a rough agreement before starting to document the process. That’s the purpose of this RFC.

I think the topic of conditional file inclusion under 3.2.3 has a few nuances:

  • If small parts (in code volume) of an implementation are to be conditionally excluded/included, I do not see anything wrong with having multiple paths listed in a single file. This would be like the memcpy implementation you linked to.
  • If the volume of included/excluded code is large and there is almost no overlap between them, then I think the FenvImpl.h model is more manageable.

Overall, maybe we should just document the various possibilities, with links to examples that suggest the layout one should use based on the complexity, mutual exclusivity, and overlap of their implementations.

I’ve started documenting the macro style in D143413 ([libc][doc] Add macros guidelines).
Then I’ll update the codebase to respect the style guide.
Then I’ll start documenting the tuning.