Introduction
At the LLVM Embedded Toolchains Working Group sync up we have been discussing llvm-libc and embedded systems, with the aim of sharing some of the experience that toolchain suppliers have with embedded libraries and how that might apply to llvm-libc. I volunteered, several meetings ago, to write up Arm’s experience with its proprietary C-library; it turns out it is harder to put into words than it is to talk about!
We are very interested to hear from other potential users of llvm-libc in an embedded context; we’re relying on our own Arm-coloured experience, which may not cover all use cases.
General properties of embedded C-libraries
The definition of embedded used here is in the spirit of a freestanding rather than hosted implementation. There is no assumption of an operating system abstracting away the hardware, with services to implement the C-library on top of. Common use cases for such a library involve developing firmware for larger devices and software for microcontrollers and real-time systems.
Common properties of an embedded library:
- Statically linked (no OS to dynamically load), often for a specific sub-target.
- No assumption of an OS to provide primitive operations, load the program or initialize the target hardware.
- Code-size often more important than performance. Desire to only include what is used.
- IO often redirected via a serial port or emulated by the Host/Debugger.
LLVM-libc alternative implementation of functions
Many embedded systems have a limited amount of read-only and read-write memory. Embedded C-libraries therefore tend to favour small code-size over performance. Some of the functions in llvm-libc are tuned for maximum performance at the expense of code-size; for example, the memmove implementation when compiled for Arm is about 100 times larger than the equivalent implementation in Arm’s C-library, exceeding the total flash size of the smallest microcontrollers. We’ll need a way of building llvm-libc so that alternative implementations with a different code-size/performance trade-off are permitted, particularly for the string functions. Many functions in the C-library are independent, so it shouldn’t be too difficult to offer a choice of implementations at build time. Some areas are more complicated due to dependencies between components; for example, many functions depend on the definition of opaque types such as FILE, and the locale implementation affects several parts of the library such as time, ctype and string.
Some embedded developers are willing to trade off standards compliance as well as performance for minimum code-size, particularly on the smallest microcontrollers like the Cortex-M0, which can have implementations with as little as 8KiB of flash. Such a library might choose not to conform to IEEE 754, or not to support configurable locales or wide characters. Our experience with the smallest-possible-code-size use case is that it is better to design a C-library with this use case in mind, rather than trying to strip down an existing C-library to meet these constraints. For llvm-libc it may be worth setting out how far an alternative implementation can go. For example, do the build and tests need to support a library without configurable locales?
When implementing components with dependencies it will be good to isolate these so that alternative implementations are possible.
Printf optimization
The printf function in its entirety is large, as it has to handle many different cases in the format string. In an embedded system an optimized printf can omit the code to handle format specifiers that aren’t used. I’ve seen a number of mechanisms for this, with conditional compilation the most common, such as https://github.com/mpaland/printf; however this would be difficult to justify in a toolchain that supplies pre-compiled libraries, as we can’t know what might get used. With compiler support it can be possible to arrange a printf implementation so that only the code to handle the format specifiers actually used is required; this would likely be similar to the existing __builtin_printf transformation. For example, the code to handle the floating-point specifier could be located in a separate object file. The main printf implementation only refers to it via a weak reference, so it is not automatically loaded by the linker; when the compiler determines that it is needed, it emits a non-weak reference to a symbol defined in that object file for just this purpose.
I must confess I’ve not dug too far into the details of the llvm-libc printf implementation so it is possible that it supports what would be needed already, my guess from https://github.com/llvm/llvm-project/blob/main/libc/src/stdio/printf_core/parser.cpp#L124 is that this would be a compile time choice?
It is likely that llvm-libc would need to provide a printf implementation that colludes with LLVM optimizations as an alternative implementation and not the default.
Low level hardware abstraction layer
An argument could be made that the low-level hardware abstraction layer is a separate library from llvm-libc, for which the llvm-project could provide a reference implementation for common platforms. The low-level hardware abstraction library is essentially providing the functionality that an OS would provide for a hosted implementation. As an example, the newlib hardware abstraction layer is called libgloss, documented in “Embed with GNU”.
Startup Code
The definition of startup code I’m using here is code that runs before main, usually in an object called crt0.o. From the perspective of the C-library this usually requires:
- A hook for user-supplied code to initialize any hardware: for example, enable the floating-point unit, cache, MPU or MMU, and potentially change to a lower privilege level. This supports the case where the startup code is the first thing in the system to run after reset.
- Setup a stack pointer, often using a region of memory defined by a linker script.
- Copy/Zero-initialize memory, driven by the linker-script (can be done before a stack pointer is available with assembly).
- Run initializers such as .init_array.
- Call main, with some code to handle argc and argv if the low-level abstraction layer supports it; for example, semihosting can obtain command line options from the host.
- Call exit after main has finished; exit is not usually expected to return in an embedded system.
The libgloss crt0 is written in assembler and is quite difficult to follow. The picolibc library has written more of this in C. For example, https://github.com/picolibc/picolibc/blob/main/picocrt/crt0.h has the generic part, and there are machine-specific parts such as https://github.com/picolibc/picolibc/blob/main/picocrt/machine/aarch64/crt0.c and https://github.com/picolibc/picolibc/blob/main/picocrt/machine/arm/crt0.c
The startup code often has assumptions about linker-defined symbols providing the location of the stack and, in some cases, the heap. It is not common for the library itself to provide the linker script, as hardware varies considerably; the toolchain might include some examples to run on a simulator. For example, the Arm LLVM embedded toolchain has https://github.com/ARM-software/LLVM-embedded-toolchain-for-Arm/blob/main/ldscript/base_aarch64.ld
Retargeting of IO
Many embedded systems do not have a filesystem, but it is still useful to be able to use IO facilities from embedded programs, particularly in testing. With a retargeting layer, IO can be redirected through a peripheral like a serial port, or implemented by a host such as a debugger or model via semihosting (https://github.com/ARM-software/abi-aa/blob/main/semihosting/semihosting.rst).
Libgloss (newlib’s hardware abstraction library) has some documentation on its retargeting layer in “Embed with GNU”.
Typically the high-level routines are implemented in terms of a narrow porting layer. This can limit the ability of the implementation to optimize so it is possible that this would need to use an alternative implementation.
Retargeting threads
This is similar to libc++, where the default implementation of std::thread is built on top of pthreads, which a bare-metal environment can’t usually assume. For libc++ there is an option to use an external threading header file, which permits an alternative low-level threading implementation: https://github.com/llvm/llvm-project/blob/main/libcxx/docs/DesignDocs/ThreadingSupportAPI.rst . I expect that we’ll need something similar for llvm-libc to work with an OS that has its own threading primitives.
Retargeting memory management
There may not be an OS to provide more heap memory. In principle a hard-coded alternative malloc implementation could be provided, although another way to solve this problem is to require an implementation of sbrk from the hardware abstraction layer. An example of a very simple implementation that uses linker-defined symbols to identify an area of memory for the heap can be found in picolibc: https://github.com/picolibc/picolibc/blob/main/newlib/libc/picolib/picosbrk.c
Layering on top of an existing library
The current layering scheme relies on dynamic linking, so it won’t be suitable for static linking. I can’t think of a clean way to implement layering on top of an existing static library. If llvm-libc is encountered first by the linker and defines all the symbols of the corresponding object(s) in the base libc, then the objects from the other library won’t be selected by the static linker. However, if there are symbols defined by the corresponding object(s) in the base libc but not in llvm-libc, then both objects could get selected, leading to multiple-symbol-definition errors. I don’t think that this would be manageable without picking a specific base libc.
POSIX Compatibility
The IEEE 1003.13-2003 specification, “Standardized Application Environment Profile (AEP): POSIX Realtime and Embedded Application Support”, defines subsets of POSIX that are suitable for bare-metal embedded systems.
- PSE 51 (Minimal, No MMU and no physical filesystem)
- PSE 52 (Controller, MMU and physical filesystem)
These specifications are primarily requirements on an operating system; however, the specification does include parts of the C-library, and there are some areas of overlap. For example, the C-library will need to provide POSIX extensions and the OS will have to provide the low-level pthread implementation; file descriptors and signals are a bit more ambiguous.
I expect that the OS developer will have to do the majority of the work to integrate llvm-libc. The main task within llvm-libc is to make sure that the POSIX requirements for the C-library can be met, and that there is documentation for OS vendors to follow.
References
Two open-source RTOSes that have implemented a substantial subset of POSIX:
- POSIX Support — Zephyr Project Documentation
- https://docs.rtems.org/branches/master/posix-compliance/posix-compliance.html
FreeRTOS appears to have some experimental support: FreeRTOS-Plus-POSIX.
Build configurations
Even within a single LLVM target like Arm, AArch64 or RISC-V there will be a large number of possible sub-architectures and ABI options. For example, the GNU embedded toolchain for Arm has 32 multilib combinations across Arm, Thumb, v5 to v8.1, soft-float, hard-float, and vector unit present. Other architectures will have their own combinations. We’ll want to have a way to:
- Build all the variants needed for a toolchain via a single runtimes build.
- Describe the variants to the bare-metal driver via multilib; this is hardcoded as of today. Ideally the library build could populate configuration files that a multilib-aware toolchain could use to configure itself.
- Run some form of tests for each variant assuming some kind of test runner is provided.
Testing
Buildbots
Running the tests on an embedded system is likely to require something like libc++'s test-executor concept, which can be used to run on an emulator or even download a test to a dev board. This works, although it can be slow, especially when running many different variants. It is difficult to get around this without a lot of work in llvm-lit or the tests to do all the builds before running the tests in a batch.
The target-specific implementations of the hardware abstraction layer are not going to be tested by existing host-based buildbots. For each new target we should have public buildbots that span the majority of variants supported by conditional compilation; for example on Arm, v8-m.mainline and v8-m.baseline, while v8-r and v8-a are likely to be close enough to be covered by the same builder.
There are, to my knowledge, no public buildbots for the compiler-rt builtins, which has made introducing changes without undetected breakages harder than it would otherwise have been.
Subset tests that require a feature unavailable on the target
I think these can work in a similar way to how libc++ handles subsets of the library. A feature like C11 threads that requires an external threading implementation may not be desirable to include in a library build, or may be difficult to test without an OS. A way to define a build-time subset that also disables the tests that require it will be useful. For example, https://github.com/llvm/llvm-project/blob/main/libcxx/cmake/caches/Generic-no-localization.cmake turns off localization support.
Reproducing failures
Setting up all the dependencies to run tests can be quite a bit of work. Is it worth asking bot maintainers to provide docker containers with the necessary models and test execution scripts?
Where can Arm Contribute?
Arm is interested in having llvm-libc support embedded systems. This would make it possible to construct an embedded toolchain entirely from the llvm-project.
Our thoughts are that getting to a stage where we can build the library with enough of a hardware abstraction layer to run the tests on a model would be the best starting point. That would form the basis of a buildbot that could be used to support further development.