LLVM Embedded Toolchains - EuroLLVM 2023 round table summary

We had a round table discussion at EuroLLVM last week. This post is my recollection of the discussion. If anyone else who was there would like to add their thoughts, please add them to the thread.

Before I start I would like to advertise the LLVM Embedded Toolchains Working Group sync up. This virtual round table discussion occurs once every 4 weeks and all are welcome. Details can be found in LLVM Embedded Toolchains Working Group sync up and
Getting Involved — LLVM 17.0.0git documentation

Reviews and Requests for features
A lot of ideas came up at the round table. Many of the developers aren’t working directly with projects that need specific features, and this detachment makes it harder to know how to prioritize what to work on. If you have an opinion, even if it is just what you would like to see, then please do let the developers know on Discourse or in the LLVM Embedded Toolchains Working Group sync up.

We are also interested in feedback on work-in-progress patches. As an example, there is an RFC out for a data-driven multilib: [RFC] Multilib. We are looking for people to try this out and leave feedback, even if it is just "we tried it and it worked!"
There is a similar set of patches for MC/DC code coverage in D138849 MC/DC in LLVM Source-Based Code Coverage: clang

Advertising LLVM Toolchains for embedded systems
A number of presentations (FOSDEM and Embo++) have been made this year about LLVM in embedded systems; however, this is still only a small audience. A community blog post with an example of porting an open-source project that currently builds with a GCC toolchain to an LLVM-based toolchain would be a useful starting point.

Collaboration with GNU
Many embedded projects that use open-source tools need to support LLVM and GNU toolchains. Wherever possible we should work with the binutils community to get new features adopted by both communities.

Upstream Testing
While there are a number of teams doing downstream testing of LLVM for embedded targets, there are no upstream build-bots for many embedded targets. There is an opportunity to test the compiler-rt builtins, which should be straightforward. There are also opportunities for libc++, libc++abi and libunwind, with the proviso that features the targets can’t support will need to be disabled.

Downstream testing could be helped by some sample embedded configurations that downstream toolchains can adapt for their use case.

Documentation
Cross-compiling the runtimes such as compiler-rt and libc++ can be quite difficult to work out. Our existing documentation is often out of date, particularly since the introduction of the runtimes build. It would be useful to retire or update the out-of-date documentation.

The LLD documentation https://lld.llvm.org/ is largely developer focused. In particular it would be helpful to update and expand the linker script differences listed in the lld 17.0.0git documentation page Linker Script implementation notes and policy. Documenting other known differences between GNU ld and LLD would also be helpful for users adopting LLD. It would be realistic to do this incrementally.

Use of LLVM libc in embedded systems
Some projects are already using LLVM libc. Not all C functions are supported, but many projects only need a small number of functions and these can be implemented on demand. The LLVM libc developers are interested in which functions are needed first.

Some llvm-libc functions such as printf have been optimized for size, and can compile down to a very small size.
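
One rough way to answer the "which functions are needed first" question for a given project is to tally the undefined symbols in its object files. The sketch below is only an illustration: it assumes text in the usual nm output style (for example from llvm-nm --undefined-only run over the project's objects), the function name undefined_symbols is made up for this post, and the result will include every undefined symbol, not only C library functions.

```python
from collections import Counter


def undefined_symbols(nm_output: str) -> Counter:
    """Count undefined symbols in nm-style text output (hypothetical helper)."""
    counts = Counter()
    for line in nm_output.splitlines():
        fields = line.split()
        # Undefined entries have no address column: only the type letter "U"
        # followed by the symbol name. Defined symbols have three fields and
        # per-file header lines such as "main.o:" have one.
        if len(fields) == 2 and fields[0] == "U":
            counts[fields[1]] += 1
    return counts
```

Calling undefined_symbols(text).most_common() then gives the symbols referenced by the most object files first. Filtering that list down to C library names is left out here.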

Linker Overlays
LLD supports overlays in the same way that GNU ld does. This requires manual assignment of sections to overlays and an overlay manager to switch between the overlays. There are ways to make these easier to use: Arm’s proprietary toolchain has an automatic overlay feature which inserts code to switch overlays automatically (Documentation – Arm Developer). Something like this could be a useful feature in LLD for the projects that need it.

It was mentioned that there is an effort called ComRV to standardize overlays. The links that I was able to find:

LLD trace options
LLD has a small number of tracing options such as --verbose, --why-extract=, --warn-backrefs and --trace-symbol=. These are very helpful in tracking down problems in their specific areas. In other areas there is nothing beyond looking at the map file, assuming the link got that far. Some more tracing, particularly in the area of linker scripts, could help users and developers alike. Something like LLVM's --print-after would be very useful. The challenge would be designing the output in a structured way, as the inputs to the linker can be very large. Making sure additional tracing is integrated without being spread over the code-base also requires thought.

LLD map file output
LLD and GNU ld have different map file output formats. While one is not necessarily better than the other, having the option to have a similar output format will help projects migrating from GNU ld to check differences. Going a step further, a machine readable map file in something like JSON would make it easier for other tools to consume and analyze.
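
Until something like that exists, the current text map can be converted by a small script. The following is a sketch only, not a proposed format: it assumes the column layout LLD prints today (VMA, LMA and Size in hex, Align in decimal, then a name indented by 0, 8 or 16 spaces for output section, input section or symbol), and the function names and JSON keys are invented for illustration. Lines that do not fit that layout are skipped.

```python
import json
import re

# One row: VMA, LMA, Size (hex), Align (decimal), a single space, then the
# name with its nesting indentation preserved in the last group.
ROW = re.compile(r"^\s*([0-9a-f]+)\s+([0-9a-f]+)\s+([0-9a-f]+)\s+(\d+) (.*)$")

# Assumed indentation of the name column for each kind of entry.
KINDS = {0: "output_section", 8: "input_section", 16: "symbol"}


def parse_lld_map(text):
    """Turn LLD text map output into a list of dictionaries."""
    entries = []
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header line or anything this sketch does not recognise
        vma, lma, size, align, rest = match.groups()
        name = rest.lstrip(" ")
        indent = len(rest) - len(name)
        entries.append({
            "vma": int(vma, 16),
            "lma": int(lma, 16),
            "size": int(size, 16),
            "align": int(align),
            "kind": KINDS.get(indent, "other"),
            "name": name,
        })
    return entries


def map_to_json(text):
    """Serialise the parsed entries as a JSON array."""
    return json.dumps(parse_lld_map(text), indent=2)
```

A flat list keeps the sketch short. A real format would probably nest input sections and symbols under their output section.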

llvm-objcopy
Support for Motorola srec format (GNU objcopy -O srec) would be useful for some projects wanting to transition from GNU.
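
For anyone unfamiliar with the format, an S-record file is plain text with one checksummed record per line. The sketch below encodes a single S1 data record (16-bit address) to show the layout. It is not how llvm-objcopy or GNU objcopy implement it, the function name is made up, and header (S0) and termination (S9) records are left out.

```python
def srec_s1_record(address: int, data: bytes) -> str:
    """Encode one S1 data record: type, byte count, address, data, checksum."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("S1 records carry a 16-bit address")
    if len(data) > 252:
        raise ValueError("at most 252 data bytes fit in one S1 record")
    # The byte count covers the address (2 bytes), the data and the checksum.
    count = 2 + len(data) + 1
    body = bytes([count]) + address.to_bytes(2, "big") + data
    # Checksum: ones' complement of the low byte of the sum of the count,
    # address and data bytes.
    checksum = ~sum(body) & 0xFF
    return "S1" + (body + bytes([checksum])).hex().upper()
```

For example, srec_s1_record(0x0000, b"\x01\x02") encodes a two-byte payload at address 0 as one line of text.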


Regarding the blog post on porting from a GCC toolchain to LLVM:

I’ve been working on porting the OSS GCC Game Boy Advance toolchain to LLVM. Currently you can build C and C++ code and use picolibc, but I did need to modify compiler-rt, LLD, linker scripts, and crt0 for this to work. I’m also working on the LLVM backend to allow for common Arm GCC syntax idioms.

I had planned to write a blog post somewhere in the coming months about what I ran into while porting, so that seems to fit well.


I’d love to move CircuitPython over to an LLVM toolchain because of tooling such as include-what-you-use and clang-tidy. My last attempt to do it was with clang 13 here. The main blocker has been code size increases when switching to clang on Cortex-M0+. CircuitPython could be an interesting project to target because it runs on a variety of MCUs including Xtensa, RISC-V, Cortex-M0, M4, M7 and even a couple of Cortex-A parts. It isn’t important to us to keep using GCC if Clang is working better.