The Discourse event time was shown incorrectly on the main page, which confused some people; sorry about that. Volodymyr will look into how to fix this.
Libc++ for embedded.
Louis was not able to join this time, Volodymyr to check if he will be available to attend next time.
Peter: Follow up on past LLVM libc discussion.
After investigation and internal discussion, the Arm team confirmed that LLVM libc looks very promising and wants to contribute.
Some of the topics that we want to discuss upstream are listed below.
Different size variants of some functions, e.g. a smaller memcpy, are needed for MCUs.
Customization - how to manage interrelated pieces of functionality?
HW abstraction layer - if there is no file system, but there is a serial port or a semihosting debug interface, how to redirect IO?
Process/app startup when there is no OS:
– Peter: Is this something that libc would include or just document how to add the startup code?
– Siva: if there is a standard/convention for startup code, then happy to add to the library. Question: How to test it? Need to add buildbots, then how people could debug failures in such a buildbot?
Some aspects of startup - setup stack, heap, etc - are architecture specific, but required. Other libs can be used as example, e.g. newlib.
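To make the size-variant topic above concrete, here is a minimal C sketch of two interchangeable copy routines with a build-time switch between them. The names memcpy_small, memcpy_unrolled, demo_memcpy and the DEMO_OPTIMIZE_FOR_SIZE macro are hypothetical and are not LLVM libc options.

```c
#include <assert.h>
#include <stddef.h>

/* Size-oriented variant: a plain byte loop with the smallest code footprint. */
void *memcpy_small(void *dst, const void *src, size_t n) {
    assert(dst != NULL && src != NULL);
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Speed-oriented variant: unrolled by four, more code but fewer loop iterations. */
void *memcpy_unrolled(void *dst, const void *src, size_t n) {
    assert(dst != NULL && src != NULL);
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n >= 4) {
        d[0] = s[0]; d[1] = s[1]; d[2] = s[2]; d[3] = s[3];
        d += 4; s += 4; n -= 4;
    }
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Hypothetical build-time switch (an assumption, not a real libc option). */
#ifdef DEMO_OPTIMIZE_FOR_SIZE
#define demo_memcpy memcpy_small
#else
#define demo_memcpy memcpy_unrolled
#endif
```

Both variants have the same signature, so callers do not change when the trade-off does; only the build configuration picks which one is linked in.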
Alan: Upstream support for MC/DC code coverage, which is useful for embedded functional safety applications.
An implementation will be upstreamed in next few months.
Who is interested? Would be great to have help with code review.
Peter: Arm is interested to help with review. Team in Qualcomm probably can be interested too.
Will it work for bare-metal? Compiler-rt implementation is not suitable for bare-metal now - should be the next step to add support.
Petr: happy to help with code coverage too, Fuchsia team have some experience. Fuchsia uses coverage in the kernel.
– There is now a lot of similar code in the sanitizers, code coverage, and the profiler. A refactoring was suggested last year on Discourse (Using C++ and sanitizer_common in profile runtime): the profiler runtime is different from the rest of compiler-rt, so the suggestion was to rebuild it in C++ (from C) on top of the sanitizers' common code. sanitizer_common is already good at abstracting from the underlying OS/target, so migrating and improving it would benefit all use cases together.
– Fuchsia may work on this in the coming months. Pure refactoring would be useful too, e.g. remove multiple ifdefs, etc - structure the code better.
– Meta (company) team is doing interesting work in coverage now too. It is intended for mobile phones, e.g. adding boolean coverage mode (instead of counters). Fuchsia team is talking to them to coordinate.
Peter: Important point is how to get the requirements defined to fit all use cases from MCU to more high end systems. E.g. if there is small memory, have an interface to write the coverage/profile info to the debugger interface.
What are the key changes in the MC/DC coverage patch? An additional level of analysis for boolean expressions, plus an additional counter object to keep track of.
(Peter Smith) LLVM libc HAL (hardware abstraction layer) investigation
Investigation done.
Peter will write up the overview of different approaches in Discourse.
newlib/picolibc and Arm libs approaches were analyzed.
HAL in both libraries is split into:
Bootup code (stack, heap init) - may not be included in the lib and may be provided by the user instead. Very HW dependent, may need assembly code.
IO - these libs have different approaches: newlib has sys calls similar to the POSIX ones for retargeting (newlib has ~20 functions to reimplement to retarget), while the Arm libs have just a set of lower-level routines to implement, matching the higher-level ones.
malloc - linker script needs to allocate some memory region for malloc to use.
Embedded systems can implement semihosting via debug interface or a serial port or such.
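The IO part of the split above can be sketched in C as a single low-level hook that the platform supplies, with the higher-level routine built on top of it. This is a minimal illustration of the retargeting idea, not the API of newlib, the Arm libs, or LLVM libc; hal_set_write, demo_puts and capture_write are hypothetical names, and the RAM capture buffer stands in for a UART or semihosting channel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical low-level hook that a platform port would implement. */
typedef int (*hal_write_fn)(const char *buf, size_t len);

static hal_write_fn hal_write_hook; /* NULL until the platform registers one */

void hal_set_write(hal_write_fn fn) { hal_write_hook = fn; }

/* Higher-level routine built only on the hook, analogous to puts(). */
int demo_puts(const char *s) {
    assert(s != NULL);
    if (hal_write_hook == NULL)
        return -1; /* no output device available */
    size_t len = 0;
    while (s[len] != '\0')
        ++len;
    if (hal_write_hook(s, len) < 0)
        return -1;
    return hal_write_hook("\n", 1) < 0 ? -1 : 0;
}

/* Example backend: captures output in RAM instead of driving hardware. */
char capture[64];
size_t capture_len;

int capture_write(const char *buf, size_t len) {
    if (capture_len + len > sizeof capture)
        return -1;
    for (size_t i = 0; i < len; ++i)
        capture[capture_len++] = buf[i];
    return (int)len;
}
```

Swapping capture_write for a serial or semihosting backend changes where the bytes go without touching demo_puts, which is the property the retargeting layers in both libraries aim for.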
Next investigation step is to map the above to LLVM libc design.
Siva: threading abstraction layer should be considered too as a part of HAL. Agreed.
LLVM libc already has a level of abstraction for users to implement to retarget, including platform specific hooks for IO.
LLVM libc malloc: approach is not to do anything special for it, but the platform can reimplement malloc.
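As an illustration of a platform-provided malloc over a fixed memory region, here is a minimal bump allocator sketch in C. It is an assumption about what a port might do, not LLVM libc code. On a real target the region would typically be defined by the linker script; here a static array stands in for it so the sketch runs on a host. demo_heap and demo_malloc are hypothetical names, and there is intentionally no free.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a linker-script-defined heap region. */
static _Alignas(max_align_t) unsigned char demo_heap[256];
static size_t demo_heap_used;

static_assert(sizeof demo_heap % _Alignof(max_align_t) == 0,
              "heap size must be a multiple of the allocation alignment");

/* Bump allocation: hand out the next aligned chunk, never reuse memory. */
void *demo_malloc(size_t size) {
    const size_t align = _Alignof(max_align_t);
    if (size == 0 || size > sizeof demo_heap)
        return NULL;
    size_t rounded = (size + align - 1) & ~(align - 1);
    if (rounded > sizeof demo_heap - demo_heap_used)
        return NULL; /* region exhausted */
    void *p = &demo_heap[demo_heap_used];
    demo_heap_used += rounded;
    return p;
}
```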
Questions: bootcode and device access code - is there a standard or such that we can adopt in LLVM libc? If there is no standard, how useful is it to come up with your own HAL? Answer: newlib can be considered a de-facto standard (libgloss is the implementation of HAL). It exists for more convenience to separate the retargeting code.
init for arrays and constructors is missing in libc, but is expected to be committed very soon.
May be easier to try to build LLVM libc for bare-metal to see how it comes together. Starting with a semihosting implementation would be easiest for debugging/testing.
May be useful to have some demo code in addition to Discourse discussions.
tests/integration_tests in libc project use libc own startup code, init is still missing though.
Petr: inits and finis exist in compiler_rt so potentially may be reused.
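The init-array handling discussed above amounts to calling a table of function pointers in order before main(). A minimal C sketch follows, with the table passed in explicitly so it can run on a host; run_init_array, ctor_a and ctor_b are hypothetical names, and this is not the libc or compiler-rt implementation.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*init_fn)(void);

/* Calls every entry in [start, end) in order, as startup code would
   do before transferring control to main(). */
void run_init_array(init_fn const *start, init_fn const *end) {
    assert(start != NULL && end != NULL);
    for (init_fn const *fn = start; fn != end; ++fn)
        (*fn)();
}

/* Two stand-in "constructors" that record the order they ran in. */
int init_log[2];
int init_count;
void ctor_a(void) { init_log[init_count++] = 1; }
void ctor_b(void) { init_log[init_count++] = 2; }
```

On an ELF bare-metal target the bounds would typically be the linker-provided __init_array_start and __init_array_end symbols rather than a local table.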
(Johannes) LLVM libc and libc++ for GPUs
GPUs will need libraries in the future.
These will probably not be standard compliant, but will face the same kinds of issues as embedded libs do.
How does the above HAL map to GPUs?
No need for startup code, it is already handled or not needed.
IO support (e.g. printf can be available), malloc may be available or not, etc.
Will need to compile the library code to LLVM IR then LTO it with the user code.
There exists a math library, but it is not in upstream since it is not clear where to put it. Option would be to also build libc for GPUs. As a risk this may bind the math library to LLVM project and libs instead of being compatible with LLVM and multiple other options users can be using now.
Function definitions in the header files must match the implementation on the device, thus may run into issues if the host header files are different.
If headers contained only declarations without definitions, that would be easier. E.g. the host may have a one-assembly-instruction implementation of some functions that would not work on the device.
Example of issue is mismatch of object size on the host and device. GPUs try to match data layout of the host to allow for easy data transfer between host and device.
Is this an ABI question?
OSes still allow different definitions of say long even on the same actual host hardware.
Special headers may be used as overlays over the host headers to solve conflicting definitions.
In the original Arm ABI there was a question of whether it is possible to link objects compiled with different compilers, each using its own headers; the solution was a set of portability macros. Portability is at odds with performance, and in reality this approach was never properly implemented. In practice, most things work except for cases like long jump and other more complex constructs. Peter will try to find and share a link to the relevant ABI document.
Users always expect that everything that works on the host will also work on the device, which is a difficult expectation to meet. This is why there is a desire to reuse host headers as much as possible.
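The object size mismatch mentioned above can be shown with a small C sketch. A struct that uses long changes size between ABIs (long is 4 bytes under LLP64 such as 64-bit Windows and 8 bytes under LP64 such as 64-bit Linux), while a struct built from fixed-width types can be pinned with a compile-time check. The struct names are hypothetical and this is only one possible mitigation, not something agreed in the meeting.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

/* Layout-sensitive: the size depends on the ABI's definition of long. */
struct sample_native {
    long id;
    int value;
};

/* Layout-stable: fixed-width members and explicit padding. */
struct sample_fixed {
    int64_t id;
    int32_t value;
    int32_t reserved;
};

/* Fails the build on any side (host or device) that disagrees. */
static_assert(sizeof(struct sample_fixed) == 16,
              "host and device must agree on this layout");

size_t sample_fixed_size(void) { return sizeof(struct sample_fixed); }
size_t sample_native_size(void) { return sizeof(struct sample_native); }
```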
libc++
Topics that we want to discuss in the future calls:
How to configure libc++ builds to exclude unneeded functionality (i.e. unused code, to minimize code size)?
Would be good to have a size optimized version (vs perf optimized now), e.g. a separate config with a different trade-off for things like string/int conversions.
Both libc and libc++: how to distribute them, libc is built from source, libc++ is used as binary - would libc++ source builds be beneficial as well? For embedded especially? Allows fine tuning build options. There are examples to learn from, e.g. Risc-V toolchains (e.g. from Embecosm) are said to be able to compile libraries on demand. Should this question be part of multilib discussion?
Build systems: CMake has a set of predefined configs, multibuild generators can be used to build libc++ in multiple configurations.
Fuchsia toolchain ships a number of variants of libs, the multilib logic is hardcoded inside the driver. It may be moved out into a base class for reuse, or even into an external configuration file.
Toolchain class can be subclassed to provide required features.
There is a prototype of a plugin - each vendor can implement their own as needed.
The idea is to provide a way to load plugins that can match target triples and then implement required logic for multilib support for the target.
Next step - implement multilib selection.
Support for plugins requires only a small amount of code. Some files may need to be moved out of the implementation folder to make them part of public API.
An alternative would be to create a DSL like GCC spec files to transform command line options into library selection options. May need rather tricky logic, thus difficult to design and implement reasonable DSL.
Q (Petr): multilib class exists in LLVM - can it be extended? A: This is enough, but each vendor may need to implement their own specific logic. There is no multilib implementation for baremetal Arm now. It may be beneficial to keep this vendor specific code downstream.
Q (Petr): what about performance and security of plugin implementations? A: could be an issue, indeed.
Peter: plugin DLLs on Windows may be tricky to maintain too.
Q (Peter): Would a data driven DSL/config file cover the need, rather than a complex executable DSL? Different targets may need different sets of command line options to take into account.
Q (Petr): Should we investigate support for actual (or subset of) GCC spec files? Minimal implementation may be small and may be easier for people to migrate from GCC. Was it not supported on purpose, based on LLVM design philosophy? Yes, there was such a discussion years ago, does it make sense to reopen the discussion again? Config files are becoming more configurable now too, so this may support the argument to look again into spec files support.
Mikhail will have a look into spec files option as well as a simple DSL to describe multilibs. Would be good to start a thread on Discourse - we may start with a high level options overview to decide on the direction (without spending much time for multiple prototypes upfront).
Q Mikhail: Is it possible to build runtimes with cmake by getting the list of multilibs from clang (like GCC does)? Petr: Not now. Cmake and clang info may not match now - no way to enforce. Fuchsia team is starting to look into this.
Libc
Peter: startup code write up on Discourse - to do an informative analysis of approaches.
Siva: not much news for now.
Stefan: Q: want to use libc for embedded JIT. Can it be compiled for PIC (position independent code)? Siva: yes, PIC should be possible as there should not be anything preventing PIC, otherwise please report an issue.
Stefan: The idea is to compile only required functions with JIT and load them on the embedded target. Compiled and linked on the host, then transferred to a device and executed there.
Q: Is libc designed with static builds in mind, or are incremental (per-function) builds possible? A: If a given function does not depend on global data, that should be possible.
Note that there are different PIC models, e.g. PIE and PIC proper. PIE requires data to be at a specific place. There are also models where a register holds the base address for all the data.
Libc does not use virtual functions thus does not have an issue with vtables that needs to be addressed in a PIC model.
Three main points from last time: building without particular subset of features, code size optimized versions of some features, source vs binary distributions.
Would be interesting to have an overview of Embedded C++ Libraries - what special features do they provide and which of them may be relevant to libc++.
16-bit pointers?
Q Petr: Armv6 M0+ has 256k addresses only, but requires full pointers that use a lot of space - is it possible to use 16 bit pointers?
Peter: No, such an approach can only work for M0+ since even M3 can use megabytes of RAM. One piece of advice is to put all global data into a static struct so that the compiler uses relative addresses.
LTO may be an option, however LTO does not play well with section placement in embedded.
Literal pool merging can save a bit of size as well.
LTO can be a good future topic, it would be nice to invite someone who works on LTO and ThinLTO. Qualcomm did a relevant presentation a few years ago.
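The advice above about putting all global data into one static struct can be sketched in C as follows. With separate globals, each access may need its own full-width address (for example from a literal pool); with a single struct, the compiler can load one base address and reach every member by a small offset. Actual savings depend on the target and compiler. The member names and the 12-bit ADC range are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Instead of three separate globals:
     static uint32_t tick_count;
     static uint16_t adc_last;
     static uint8_t  led_state;
   group them so one base address serves all of them. */
static struct {
    uint32_t tick_count;
    uint16_t adc_last;
    uint8_t  led_state;
} g;

void on_tick(uint16_t adc_sample) {
    assert(adc_sample <= 4095u); /* assumes a 12-bit ADC */
    g.tick_count += 1;
    g.adc_last = adc_sample;
    g.led_state = (uint8_t)(g.led_state ^ 1u);
}

uint32_t ticks(void) { return g.tick_count; }
uint16_t last_sample(void) { return g.adc_last; }
uint8_t led(void) { return g.led_state; }
```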
Any preparation for the LLVM Dev Meeting (Nov 8-9, 2022) round table. Reminder: there will be no sync up call in November because of LLVM Dev Meeting happening at the same time.
LLD features for embedded development.
Next steps for the libc++ discussion related to embedded use cases.
Any follow up on the previous topics on libc.
Discussion
LLVM Dev Meeting round table
David and Kristof from Arm will attend the LLVM Dev Meeting.
No specific preparation for the round table discussed.
LLD embedded features (Peter)
Arm team started working on some features in LLD for embedded use cases:
Big endian was supported for AArch64, we are adding support for Arm for completeness.
CMSE (Cortex-M Security Extensions, TrustZone M) support for linking code for secure and non-secure worlds.
Patches will be upstream soon, reviews are welcome.
libc++ for embedded (Volodymyr)
Looking at existing libraries and guidance targeting embedded use cases, the following are some of the most usual configuration options/needs:
Cross compilation.
Cross testing.
No exceptions.
No RTTI.
No dynamic memory allocation, new/delete must not be used, placement new is allowed.
No IO, including indirect dependencies, e.g. force the demangler not to use any IO.
No locale support (mostly used by IO).
No floating point support.
No other big features like file system.
Options to simplify porting threading to RTOS.
Options to simplify porting clocks to RTOS.
Options to simplify porting to different C libraries.
Other topics:
Code size vs performance hand-optimized implementations.
Algorithms may be a good example here.
Multiple small switches vs one embedded configuration that configures everything at once.
Individual options seem to be more practical since they allow creating custom configurations.
Discussion
Fuchsia already uses multiple versions of libc++ with tweaks like above.
Actions? All: Try to implement and upstream such configuration options as people work on toolchains with specific libc++ configurations.
Pre-submit builders to check that the options still work are needed. Is this a prerequisite by maintainers anyway?
Follow up on previous topics
Multilib support status
The plugin prototype presented by Mikhail was finished, so we know the info needed to make the whole approach work. Next question is what is the best way to define this info.
Still need to evaluate DSL options to capture the same info.
One of ideas is to have build attributes with partial order defined on them, so that the multilib logic can search and pick the best match. Fuchsia uses a similar idea of “priority” to pick up the best variant.
DSL options:
GCC spec files: can start small, but there is a risk of it eventually growing and bringing the whole language in (which is perceived as undesirable).
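The build-attribute idea mentioned above (a partial order plus a priority to pick the best match) can be sketched in C. Here the partial order is the subset relation on flag sets: a library variant is usable if every flag it requires is present in the build, and the highest-priority usable variant wins. This is an illustration only, not the Clang multilib implementation or Fuchsia's logic; the flag names, multilib_variant and select_multilib are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical build attributes, one bit each. */
enum {
    FLAG_THUMB         = 1 << 0,
    FLAG_HARD_FLOAT    = 1 << 1,
    FLAG_NO_EXCEPTIONS = 1 << 2,
    FLAG_NO_RTTI       = 1 << 3
};

typedef struct {
    const char *dir;   /* directory holding this library variant */
    uint32_t required; /* flags the build must have to use it */
    int priority;      /* higher wins among usable variants */
} multilib_variant;

/* Returns the usable variant with the highest priority, or NULL. */
const multilib_variant *select_multilib(const multilib_variant *v, size_t n,
                                        uint32_t build_flags) {
    assert(v != NULL || n == 0);
    const multilib_variant *best = NULL;
    for (size_t i = 0; i < n; ++i) {
        if ((v[i].required & ~build_flags) != 0)
            continue; /* requires a flag the build does not have */
        if (best == NULL || v[i].priority > best->priority)
            best = &v[i];
    }
    return best;
}
```

A variant with no required flags acts as the generic fallback, which matches the layering idea where a more specific variant is preferred and a generic one is used otherwise.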
I’ll ask this question here since I’ve mentioned MC/DC in the past and there is interest for the embedded community.
I am ready to put up a review for MC/DC (at long last!), but I’d like to create three individual reviews to make the process more manageable. There are 107 modified files, though the changes themselves aren’t so bad, and 50% of the touched files relate to testing.
However, the patches would need to be pushed together since they really are not independent of each other, and I want to know if there is precedent for that. I know phabricator has a concept of parent/child review, but the implication seems to be that each review patch can be pushed independently once it’s approved.
Also, I’m looking for reviewers who can assist. If you’re interested, let me know!
I recommend posting a separate thread to the list just in case it gets buried at the end of the thread
The most recent threads I can find on code coverage are:
These may be a good source for some authoritative reviewers.
I’m willing to help for general comments, but I’ve personally no prior experience in the code-coverage parts of LLVM so I fall into the category of someone that might help but won’t be able to approve.
RFC would be useful even before the prototype implementation is available. Taking the holiday season into account, it probably can be posted early next year.
LLVM Dev Meeting follow up
Summary by Prabhu:
Google, Qualcomm, Nintendo, TI, … participated in the round tables related to embedded toolchains.
Downstream linkers → interested in sharing experience. Maybe a topic to follow up in the WG.
Security patches from TI may be shared upstream soon.
Related:
Nice to discuss embedded specific linker features and convince upstream maintainers they are useful, e.g.
Built in compression, e.g. for RW (copied from ROM to RAM and expanded).
Place a variable at a specific address, e.g. over a system register or IO ports.
When multiple banks of RAM are available, a linker needs a way to distribute segments across.
Linker script support in LLD (vs GNU) + support for embedded LTO.
Debuggability of linker scripts is not good - more errors/warnings/traces would be useful to understand the choices the linker made.
LLD related reviews:
Profiling working group for the upstreaming review of the MC/DC patches from TI. There are three phab reviews related to MC/DC:
Fuchsia: existing LLVM multilib implementation is used, no need to have multiple incompatible variants of libraries, mostly used for optimization like with/without exceptions or different ABIs (e.g. with sanitizers - instrumented libraries can be layered on top on non-instrumented as a fallback). Now multilib logic is hardcoded.
Pigweed: this is a traditional embedded, the use case is similar to LLVM Embedded Toolchain for Arm.
Can we come up with a way to unify these two use cases, even if some migration is needed to converge?
One vs multiple include directories: Do we need to rely on sysroots or not?
Fuchsia only needs one include directory: libraries use the same API, but different ABIs only.
No other issues suggested.
Can we have layered header file includes similar to the libraries described above? More specific first, then generic - this is already used for multiarch support in libcxx.
LLD key embedded features
For example, the picolibc build system needed to be patched recently because LLD has limitations in placing segments of memory, so we are running into practical issues.
There is a list of embedded linker features in the previous meeting minutes.
Volodymyr to reach out to LLD maintainer to arrange a discussion in one of the following sync ups.
Fuchsia team is comparing GNU LD vs LLD, there are some known issues - can start a list in Discourse.
There was a discussion in the last LLVM Dev Meeting about LLD as well: diagnostic was mentioned as a major issue.
Google summer of code will be coming soon - LLD usability improvements can be a good fit.
Github - we can label relevant issues there to make them easy to find.
As mentioned on a couple of the embedded LLVM calls, my changes supporting MC/DC are presently in phabricator (quoted above):
Since the Developers’ Meeting last November, I’ve been hearing from more folks who are interested in seeing this functionality upstream but don’t have the LLVM expertise to contribute meaningfully to the reviews, unfortunately, so I could really use some help in getting things reviewed.
So far, @ellishg has been able to look at some of the back-end work and provide some good feedback. @smithp35 provided some good suggestions for the preliminary review I added, which I incorporated into the clang-specific review linked here.
Of course, I don’t want to trivialize the fact that everybody is busy, and many of you have upstreaming work of your own. I appreciate the feedback you have and whatever time y’all are able to contribute to this effort! I’m also on Discord if you want to chat about MC/DC.
Multilib implementation code reviews by Michael Platings
Other code reviews
Discussion
LLD key embedded features by Peter Smith
Two major areas:
Observability/discoverability - more understandable output, better usability.
Additional features:
Disjoint memory regions: multiple memory banks with different properties => possible linker script extension to distribute code over multiple free spaces in different regions.
RW data compression - copy RW data from ROM to RAM and decompress, can save ROM => could add to LLD or have a separate utility. It is important that the compression and decompression algorithms match! Maintaining multiple algorithms may add to overheads.
Memory-mapped variables - placing a section at a particular address, e.g. to access IO ports directly.
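For context on the memory-mapped variables item, this is how such access is commonly written in C today when the linker does not place the object: a register block struct reached through a pointer to a fixed address. The sketch below takes the pointer as a parameter so it can be exercised on a host with a fake register block. The register layout, DEMO_UART0_BASE address and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical UART register block. */
typedef struct {
    volatile uint32_t data;
    volatile uint32_t status;
} demo_uart_regs;

#define DEMO_UART_TX_READY 0x1u

/* Hypothetical device address; only meaningful on real hardware. */
#define DEMO_UART0_BASE 0x40001000u
#define DEMO_UART0 ((demo_uart_regs *)(uintptr_t)DEMO_UART0_BASE)

/* Writes one byte if the transmitter is ready; returns 1 on success. */
int demo_uart_try_put(demo_uart_regs *uart, uint8_t byte) {
    assert(uart != NULL);
    if ((uart->status & DEMO_UART_TX_READY) == 0)
        return 0;
    uart->data = byte;
    return 1;
}
```

On target the call would be demo_uart_try_put(DEMO_UART0, 'A'). Linker-level placement would let the same block be declared as an ordinary variable instead of a cast address.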
In practice, many issues come down to linker script issues (differences in behaviour between BFD and LLD), thus being able to debug linker scripts easily helps a lot.
Disjoint memory - distributing by hand is very tedious, indeed.
Compression is helpful.
LTO support with embedded constraints of placement is another interesting area - there was a presentation by TI recently.
Another GSoC project idea is for machine readable format, e.g. JSON, for debug output (also link map, that is different between linkers now, thus tedious to parse) so that people can create their own visualizers/analyzers. Would be nice to convince the GNU community to implement the same format as well.
Demo by Peter Smith how the features mentioned above work in armlink (Arm proprietary linker).
armlink has 3 compression algorithms, one very basic run-length for 0’s which is already very helpful.
armlink supports placement attributes from C code, i.e. saves on manually editing linker script files (called scatter files for armlink).
armlink can show useful debug info like call graph/stack depth required, also breakdown of code/data sizes including the libraries to analyse code size issues.
armlink can trace symbols to show why a particular one was included.
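To illustrate why even a basic run-length scheme for zeros helps with RW data, and why the compressor and decompressor must agree, here is a small C sketch. The encoding is made up for this example and is not armlink's format: a non-zero byte is stored as is, and a zero byte is followed by a count (1 to 255) of consecutive zeros. zrle_compress and zrle_decompress are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the compressed size, or 0 if out is too small (or n is 0). */
size_t zrle_compress(const uint8_t *in, size_t n, uint8_t *out, size_t cap) {
    assert(in != NULL && out != NULL);
    size_t o = 0;
    for (size_t i = 0; i < n; ) {
        if (in[i] != 0) {
            if (o + 1 > cap)
                return 0;
            out[o++] = in[i++];
        } else {
            size_t run = 0;
            while (i + run < n && in[i + run] == 0 && run < 255)
                ++run;
            if (o + 2 > cap)
                return 0;
            out[o++] = 0;            /* marker */
            out[o++] = (uint8_t)run; /* number of zeros */
            i += run;
        }
    }
    return o;
}

/* Returns the decompressed size, or 0 on malformed input or overflow. */
size_t zrle_decompress(const uint8_t *in, size_t n, uint8_t *out, size_t cap) {
    assert(in != NULL && out != NULL);
    size_t o = 0;
    for (size_t i = 0; i < n; ) {
        if (in[i] != 0) {
            if (o + 1 > cap)
                return 0;
            out[o++] = in[i++];
        } else {
            if (i + 1 >= n)
                return 0; /* marker without a count */
            size_t run = in[i + 1];
            if (run == 0 || run > cap - o)
                return 0;
            for (size_t k = 0; k < run; ++k)
                out[o++] = 0;
            i += 2;
        }
    }
    return o;
}
```

In a real flow the compressor would run on the host at link time and only the decompressor would ship in the startup code, so the decompressor is the part that has to stay small.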
Multilib implementation code reviews by Michael Platings
D142933 (Add -print-multi-selection-flags argument) is about the proposed syntax for multilib options that does not use actual command line names directly. It allows a more limited and somewhat more stable API. Feedback is welcome!
Petr is reviewing and will get back with more feedback. Could we reuse tablegen here? May result in too much/complex dependencies.
We may consider making the feature experimental for the first LLVM release to allow later adaptation as per feedback from users.
Multilib implementation code reviews by Michael Platings.
MC/DC implementation code review by Alan Phipps.
FatLTO by Petr Hosek.
Other.
Discussion
Multilibs code review
Michael:
Patches in review, few rounds of discussions happened and comments addressed.
One patch landed, 6 more to finish.
How to speed up or accept the current version with the intent to improve/address any issues?
Feedback from Petr:
The team reviewed the RFC in detail, the response will be posted on Discourse in coming days.
Suggestion: There are changes to internal API and adding new file formats (which are UVB - user visible behavior), so for internal changes it should be OK to land, UVB may need a bit more discussion.
Michael: Could/should we be more aggressive: accept a format now as an experimental feature, so warn that it may and likely will change in the future? May commit now, but review/refine before LLVM17 release to have it as stable as possible by the next release.
Peter: It would be nice to be able to give it a try with real projects and see if it works, rather than keep overthinking.
Agreed: Petr posts the response on Discourse, then if after the Discourse discussion there are no blockers, we commit the current format and try to refine it for LLVM17.
MC/DC code review
Petr: Someone on the team is reviewing the patches, it goes a bit slower than wanted, but in progress, not forgotten.
FatLTO
Petr: FatLTO is progressing, there is an RFC and patches will be available soon. Approach aligned with LTO maintainers.
The idea of FatLTO is for object files to contain information for both normal and LTO linking (i.e. binary and IR code).
TI presented a revised version of LTO for embedded/linker scripts recently, their solution is similar to/compatible with FatLTO.
Todd explained the details of the TI solution from the presentation - the two teams will talk to each other to further align the approach and implementation.
Other
Peter: FOSDEM embedded developers were asking about a way to embed a section, e.g. a checksum, into the output image at link time.
Petr: why is build-id not enough? Looks like something very custom/special.
Suggested that it would make sense to start a topic on Discourse to explain the use case, then consider possible solutions.
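One plausible use case for an embedded checksum (an assumption, since the minutes do not spell it out) is firmware that verifies its own image at boot. A minimal C sketch using the standard reflected CRC-32 (polynomial 0xEDB88320) follows; demo_crc32 and image_checksum_ok are hypothetical names, and how the stored value gets into the image is exactly the open linker question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected, polynomial 0xEDB88320); small but slow. */
uint32_t demo_crc32(const uint8_t *data, size_t len) {
    assert(data != NULL || len == 0);
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int k = 0; k < 8; ++k)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Boot-time check: does the image match the checksum stored with it? */
int image_checksum_ok(const uint8_t *image, size_t len, uint32_t stored) {
    return demo_crc32(image, len) == stored;
}
```

Unlike a build-id, such a value has to be computed over the final image contents, which is why it would need to be produced at or after link time.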
Peter: Use of TLS (thread local storage) in embedded projects. Picolibc uses TLS and initializes it in the linker script. The linker script and the library need to agree on the calculations of relevant addresses. LLD and GNU LD disagree on this - Peter is looking to create a reduced reproducer.
Is anyone using TLS in embedded apps? Vince: No, but had similar issues.
Is this going to change with C11 used more in embedded? Something to look out for in the future.
Peter will post an issue with the reproducer upstream.