[RFC] Customizable namespace to allow testing the libc when the system-libc is also LLVM's libc

sivachandra · August 28, 2023, 7:34am

Introduction

The unit tests for the functions in LLVM’s libc test the functions defined in the namespace __llvm_libc. They do not call or use the public names of those libc functions. This is done to ensure that the tests use and test functions only from LLVM’s libc and do not accidentally test equivalent functions from the system-libc. To further increase the confidence that they do not accidentally use/call the public name, test binaries are linked to object files which contain only the internal, namespace-qualified name. For example, the test for memcpy is linked to an object file which only contains the symbol for __llvm_libc::memcpy.

This works well when the system-libc used by the test executable is not LLVM’s libc. Specifically, it ensures that the test actually is testing an entity from LLVM’s libc and not from the system-libc. However, when the system-libc is also some version of LLVM’s libc, building libc tests leads to a duplicate symbol error. This is because, even though the test does not explicitly call the public memcpy function, the compiler code-gen ends up inserting calls to the memcpy function. Consequently, the linker pulls not only the memcpy.o object which contains only the internal __llvm_libc::memcpy symbol, but also the memcpy.o from the system-libc which has both the internal symbol __llvm_libc::memcpy and the public symbol memcpy. Since both of these object files are required, and since both of them contain the symbol __llvm_libc::memcpy, the linker fails with a duplicate symbol error.

In this RFC, we make a case for the need to solve this problem by presenting a few contexts in which the problem is serious enough to be considered as a blocker. Following that, we propose a general solution for the CMake as well as Bazel based build systems.
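To illustrate how a test can end up depending on the public memcpy without ever naming it, here is a minimal sketch. The names Big, copy_big and roundtrip are hypothetical and are not part of the libc. Whether a compiler lowers the assignment to a memcpy call depends on the target and optimization settings, so treat this as an illustration rather than a guarantee.

```cpp
// Hypothetical type, large enough that a compiler may choose to copy it
// with a call to memcpy instead of inline loads and stores.
struct Big {
  char bytes[4096];
};

// No memcpy appears in the source. At run time the compiler may still
// lower this assignment to a call to the public memcpy symbol, which the
// linker then has to resolve from the system-libc.
constexpr void copy_big(Big &dst, const Big &src) { dst = src; }

// Small helper so the copy can be checked: writes a byte, copies the
// struct, and reads the byte back from the copy.
constexpr char roundtrip(char v) {
  Big src{};
  src.bytes[4095] = v;
  Big dst{};
  copy_big(dst, src);
  return dst.bytes[4095];
}
```

A unit test containing code like copy_big never mentions memcpy, yet its object file may carry an undefined reference to it.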

NOTE: For the sake of brevity, we will refer to LLVM’s libc as the libc in the rest of the document.

Problem Contexts

The duplicate symbol problem occurs in a few contexts, all of which share the same theme: object files from LLVM’s libc are linked against a system-libc which is itself another version of LLVM’s libc. We present a few such scenarios to make a case for the need for a solution to this problem.

Unit tests on a system where LLVM’s libc is the system-libc

As the libc becomes popular and gets adopted as the system-libc on various platforms, libc’s unit tests will start failing to build with duplicate symbol errors on those platforms. This is because, as explained with the memcpy example above, even though the libc tests pull object files containing only the namespace-qualified internal symbols, they are also linked to the system-libc. The system-libc, being another version of LLVM’s libc, will have the public symbols of libc functions as well as their namespace-qualified internal symbols. The internal symbols are therefore linked twice, and the linker reports duplicate symbol errors.

Microbenchmarks on a system where LLVM’s libc is the system-libc

Similar to the libc unit tests, the libc’s microbenchmarking system also only pulls object files with libc’s internal symbols but is linked to the system-libc as well. This leads to the same situation we have with linking libc’s unit tests on systems where the system-libc is another version of LLVM’s libc.

Differential fuzzing/testing

There are a number of differential fuzzers and tests in the libc source tree which compare the in-tree implementations against the system-libc implementations. If the system-libc is a version of LLVM’s libc, then the differential fuzzers and testers will also fail with the same duplicate symbol errors seen in the other two contexts.

Solution: A customizable namespace name

The general solution is based on making the internal namespace used in the libc customizable. Today, the namespace has the fixed name __llvm_libc. We propose to replace all references to this namespace with a macro named LIBC_NAMESPACE. The macro lets us define a custom namespace name, which can be changed depending on the context in which the libc components are built, so that the duplicate symbol errors are avoided. All build targets in CMake and Bazel must explicitly define this namespace. As a convenience, and as a way to ensure consistency across all targets, the libc build rules will be updated to explicitly specify the value of this macro. A check to ensure that the macro is actually defined will be implemented in src/__support/common.h. The check could be something simple like:

#ifndef LIBC_NAMESPACE
#error "LIBC_NAMESPACE macro is not defined."
#endif
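
As a rough sketch of what libc source code could look like once the macro is in place: the function sketch_strlen and the fallback name sketch_libc_ns below are hypothetical, and the fallback #define exists only so that the snippet compiles on its own. In the actual build the value would come from the build rules, and the guard above would reject a missing definition.

```cpp
#include <cstddef>

// Assumption: the build normally passes -DLIBC_NAMESPACE=<...>.
// This fallback is only here to keep the sketch self-contained.
#ifndef LIBC_NAMESPACE
#define LIBC_NAMESPACE sketch_libc_ns
#endif

// The macro is used wherever the fixed name __llvm_libc is used today.
namespace LIBC_NAMESPACE {

// Hypothetical stand-in for an internal libc function.
constexpr std::size_t sketch_strlen(const char *s) {
  std::size_t n = 0;
  while (s[n] != '\0')
    ++n;
  return n;
}

} // namespace LIBC_NAMESPACE
```

Tests would then refer to the function as LIBC_NAMESPACE::sketch_strlen, so they follow whatever name the build selected.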

Namespace in CMake

In general, we will make the namespace customizable. However, in order not to burden the user with yet another moving part when building the libc, we will use a default of the form:

__llvm_libc_<LLVM_VERSION_MAJOR>_<LLVM_VERSION_MINOR>_<LLVM_VERSION_PATCH>_<LLVM_VERSION_SUFFIX>

Assuming that the development version used by the libc developers and buildbots is different from the version of their system-libc, this scheme would not in general lead to duplicate symbol errors. There is one special case: when libc developers want to compare two development versions of the libc, both builds would get the same default name. In such a case, the developers should override the namespace so that the two versions they want to compare use different names.
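
The effect of giving each build its own namespace can be sketched in a single file. The names sketch_libc_17_0_0, sketch_libc_18_0_0, sketch_abs and copies_agree are hypothetical. In reality the two copies would live in separate object files built with different -DLIBC_NAMESPACE values. Here the macro is redefined between the two copies to mimic that.

```cpp
// "System" copy, as if built with -DLIBC_NAMESPACE=sketch_libc_17_0_0.
#define LIBC_NAMESPACE sketch_libc_17_0_0
namespace LIBC_NAMESPACE {
constexpr int sketch_abs(int x) { return x < 0 ? -x : x; }
} // namespace LIBC_NAMESPACE
#undef LIBC_NAMESPACE

// "Under test" copy, as if built with -DLIBC_NAMESPACE=sketch_libc_18_0_0.
#define LIBC_NAMESPACE sketch_libc_18_0_0
namespace LIBC_NAMESPACE {
constexpr int sketch_abs(int x) { return x < 0 ? -x : x; }
} // namespace LIBC_NAMESPACE
#undef LIBC_NAMESPACE

// Because the qualified names differ, both copies can coexist and even be
// compared against each other, as a differential test would do.
constexpr bool copies_agree(int x) {
  return sketch_libc_17_0_0::sketch_abs(x) == sketch_libc_18_0_0::sketch_abs(x);
}
```

If both copies used the same namespace name, this file would fail to compile with a redefinition error, which is the single-file analogue of the duplicate symbol error at link time.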

Namespace in Bazel

In order to keep it simple, the Bazel build will follow the exact same conventions used by the CMake build. This will also be least confusing for developers working with both the Bazel and the CMake builds.

Namespace in other contexts

Since the namespace is essentially customizable, other contexts like Fuchsia and Pigweed can use a namespace of their choice as suitable to them. It would still be a challenge for contexts where libc is used as a normal static library - those contexts should employ a strategy which clearly separates the production libc from the under-test libc.

Roll out plan

The rollout of the customizable namespace, and the default name which includes the LLVM version in the namespace, will be done in the following steps:

1. Make required Bazel and CMake preparatory changes which define the var for the customizable namespace and use it (pass it as a compile option -DLIBC_NAMESPACE=<...>).
2. Add a guard in src/__support/common.h.
3. Update the libc’s clang-tidy rules which ensure that the libc code is all in the namespace __llvm_libc. This rule should additionally check that all libc source files include src/__support/common.h.
4. Mass replace __llvm_libc with LIBC_NAMESPACE in the source code.

@gchatelet @michaelrj-google @lntue

gchatelet · August 29, 2023, 3:34pm

I have a POC patch for 1 and 2
https://reviews.llvm.org/D159112

gchatelet · September 15, 2023, 12:41pm

[libc] customizable namespace 1/4 by gchatelet · Pull Request #65321 · llvm/llvm-project · GitHub
[libc] customizable namespace 2/4 by gchatelet · Pull Request #65471 · llvm/llvm-project · GitHub
https://github.com/llvm/llvm-project/pull/66504
[libc] Mass replace enclosing namespace by gchatelet · Pull Request #67032 · llvm/llvm-project · GitHub
