LLVM Dev Meeting 2023 Embedded Toolchains Workshop Notes

The embedded toolchains workshop took place on the Tuesday before the conference. I’ve not got a complete set of notes as I was talking too much, so the rest is from memory. Hopefully others who were there can fill in the gaps.

There is rather a lot to write; my apologies if some of the following doesn’t read well.

The topics we had arranged to discuss included:

  1. LTO and linker scripts
  2. LLVM Libc and Libc++ and embedded systems
  3. Code coverage and profiling
  4. LLD improvements for embedded systems

The structure of each topic was a few slides to introduce followed by a discussion. I’ve only got the slides for my own presentation. Hopefully others that attended will be able to upload theirs in a separate post/reply.

LTO and linker scripts

This has been the subject of two LLVM Dev Meeting proposals, one from Qualcomm and one from TI.

There are two fundamental problems:

  • The originating object file is lost in LTO code generation, so input section selectors that depend on it don’t work.
  • Some systems have restrictions on inlining, sometimes asymmetric. For example code built to a high safety criticality can be inlined into code built to a low safety criticality but not vice versa.
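The asymmetric restriction amounts to a predicate that an LTO inliner would have to consult at every call site. A minimal sketch, assuming each function carries a hypothetical integer criticality level where a higher value means more safety critical; `Criticality` and `mayInline` are illustrative names, not an existing LLVM API:

```cpp
#include <cassert>

// Hypothetical safety-criticality levels; a higher value is more critical.
enum class Criticality : int { Low = 0, Medium = 1, High = 2 };

// Returns true if `callee` may be inlined into `caller`.
// Code built to a higher (or equal) criticality may be inlined into less
// critical code, but less critical code must never be inlined into more
// critical code.
bool mayInline(Criticality caller, Criticality callee) {
  return static_cast<int>(callee) >= static_cast<int>(caller);
}
```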

Two existing proposals:

Rethinking the RFC: Relocatable linking…

  • Do we need to make alterations to the code generator?
  • Can we treat each OutputSection as its own individual LTO partition (no whole-program optimisation)?
    • The linker gets back one ELF file per output section, and rewrites the linker script output section to include the LTO-generated ELF file.
  • This would satisfy the original requirements at the cost of leaving out some optimisations that might be permitted in most systems.
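To make the per-OutputSection idea concrete, here is a minimal sketch of the grouping step, assuming the linker has already resolved which output section each bitcode function's input section would be placed in. `BitcodeFunction` and `partitionByOutputSection` are hypothetical names; this is not how LLD's LTO integration is structured today.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical record: a bitcode function together with the output section
// that the linker script would place its input section in.
struct BitcodeFunction {
  std::string name;
  std::string outputSection;
};

// Group functions into one LTO partition per output section. Partitions are
// code-generated independently, so no optimisation (e.g. inlining) can move
// code across an output section boundary.
std::map<std::string, std::vector<std::string>>
partitionByOutputSection(const std::vector<BitcodeFunction> &funcs) {
  std::map<std::string, std::vector<std::string>> partitions;
  for (const BitcodeFunction &f : funcs) {
    assert(!f.outputSection.empty() && "every function needs a placement");
    partitions[f.outputSection].push_back(f.name);
  }
  return partitions;
}
```

Each partition would then be code-generated on its own, and the resulting object file would replace the bitcode inputs in that output section's description.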

Additional questions:

  • What are the semantics?
  • How do we control the behaviour?
  • How should the devs express constraints?
  • Are input/output sections the best mapping formula?
  • Are there IR semantics/attributes that are incompatible with linker scripts?

My summary of the discussion is that the linker script semantics are not fully understood. We need some statement of what the semantics are, and how they might be affected by LTO. For example, what restrictions does an input section selector impose on the LTO code generator?

We also need to work out how to deal with cross OutputSection optimisation. Options include:

  • No cross OutputSection optimisation.
  • Command-line option to toggle globally whether cross OutputSection optimisation is permitted.
  • Linker script extensions to identify where cross OutputSection optimisation is possible.
  • External list of OutputSections given as file/command line option.
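The options above can be compared by modelling them as a single policy query. This is a hypothetical sketch: the enum, the config struct and the rule that both output sections must be listed for the script-marked and external-list options are assumptions for illustration, not agreed semantics.

```cpp
#include <cassert>
#include <set>
#include <string>

// The four options from the discussion, modelled as a hypothetical policy.
enum class CrossSectionPolicy {
  None,         // never optimise across output sections
  Global,       // one command-line switch permits it everywhere
  ScriptMarked, // only output sections marked by a linker script extension
  ExternalList  // only output sections named in a file/command-line list
};

struct CrossSectionConfig {
  CrossSectionPolicy policy = CrossSectionPolicy::None;
  bool globalEnable = false;       // used by Global
  std::set<std::string> permitted; // used by ScriptMarked and ExternalList
};

// May LTO optimise (e.g. inline) between code destined for output
// sections a and b? Within one output section this model always allows it.
bool mayOptimiseAcross(const CrossSectionConfig &cfg, const std::string &a,
                       const std::string &b) {
  if (a == b)
    return true;
  switch (cfg.policy) {
  case CrossSectionPolicy::None:
    return false;
  case CrossSectionPolicy::Global:
    return cfg.globalEnable;
  case CrossSectionPolicy::ScriptMarked:
  case CrossSectionPolicy::ExternalList:
    // Assumption: both sides must be explicitly permitted.
    return cfg.permitted.count(a) && cfg.permitted.count(b);
  }
  return false;
}
```

The script-marked and external-list options share one query in this model; they differ only in where the permitted set comes from.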

An observation is that linker scripts are not particularly well specified at the best of times. The best we have is the GNU ld documentation, which has some gaps and some “implementation defined” parts such as orphan placement.

A more formal definition of how LLD handles linker scripts would be useful. We could then add to it how LTO interacts with linker scripts.

Libc and libc++

libc++ has implemented the following carveouts:

  • no threads
  • no localization
  • no wide characters
  • no filesystem
  • no source of randomness

Carveouts feed through into the tests. They can be used to build a stripped-down libc++ more suited to embedded systems.

Each carveout adds some complexity (no wide characters was particularly painful), so each one needs to be justified.

There are some plans to officially support freestanding implementations.

Challenges:

  • Platforms consuming carveouts are often not upstream, so new carveouts are raised as feature requests.

  • The boundary of a carveout is often fuzzy. For example, many embedded systems can support a subset of a filesystem, enough for file streams, via semihosting.

  • Testing embedded platforms.

    • There should be a picolibc buildbot using qemu up soon.
    • Is qemu always the right choice? It could differ from real hardware, but it does not suffer from dev-board and other hardware flakiness.
    • There is scope for a demonstrator platform that builds a typical embedded libc and libc++ setup.
  • Embedded systems have no requirement for ABI stability; is there more opportunity for aggressive changes?

  • Both LLVM libc and libc++ tend to prefer performance over code size. Are there opportunities to make this an implementation choice?

  • Embedding libc and libc++ into your project

    • Instead of building libc and libc++ separately, they become ordinary project dependencies. This could permit tighter integration, but would add complexity to CMake.
  • For people building shared libraries similar to Microsoft DLLs, more control over symbol visibility would be useful.

  • There is a libc++ vendors group. Please take part to get early warning of, or to influence, potential ABI breaks.

  • std::shared_ptr reference counting should be atomic, but this is not the case when threading is disabled.

  • Some changes will need to come through the C++ standardisation process. For example, many RTOS thread implementations have additional parameters that can’t be accessed via std::thread et al. More constructors would be needed to provide this information.

    • If you care about it, please write a paper!
  • The existing porting layer for threads is nice, but it is all-or-nothing. Is there scope for a partial implementation?

    • Could libc and libc++ share the same porting layer?

Code coverage and profiling

  • The compiler-rt profiling runtime has too many POSIX dependencies, so almost everyone using code coverage has written their own minimal profiling runtime. There is scope for an upstream skeleton implementation, although this will likely need to be customised further.
  • MC/DC (Modified Condition/Decision Coverage) support is close to landing; this is very important for functional safety.
  • TI have local modifications to separate the read-only information out of the object files to keep the memory footprint down.
  • 32-bit counters are sufficient for an embedded system.
  • Collectively, these changes can reduce the memory footprint by tens to hundreds of kilobytes, which is significant for some microcontrollers.
  • Some of these changes could help with non-embedded cases.
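As an illustration of how small a custom runtime can be, here is a sketch of a coverage runtime with 32-bit counters and a platform-supplied dump hook. All names are hypothetical, the output is a raw counter array rather than the compiler-rt profraw format, and real instrumentation would also need the compiler to emit the counter increments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical minimal runtime: a fixed array of 32-bit counters.
constexpr std::size_t kNumCounters = 4;
std::uint32_t counters[kNumCounters];

// Saturating increment, so a hot loop cannot wrap a counter back to zero.
inline void counterIncrement(std::size_t index) {
  assert(index < kNumCounters);
  if (counters[index] != std::numeric_limits<std::uint32_t>::max())
    ++counters[index];
}

// Platform-provided output hook (hypothetical), e.g. semihosting or a UART.
using WriteFn = void (*)(const void *data, std::size_t size);

// Emit the raw counter array; a host-side tool would interpret it.
void dumpCounters(WriteFn write) { write(counters, sizeof(counters)); }

// Bytes saved by using 32-bit instead of 64-bit counters.
constexpr std::size_t counterBytesSaved(std::size_t numCounters) {
  return numCounters * (sizeof(std::uint64_t) - sizeof(std::uint32_t));
}
```

Saturating rather than wrapping is a design choice: a counter stuck at its maximum still shows the code ran, whereas one that wrapped to zero would look uncovered. The footprint helper shows the arithmetic behind the savings: 10,000 counters at 4 bytes saved each is 40,000 bytes.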

LLD improvements for embedded systems

I’ve attached my slides on LLD improvements.
LLVM Embedded Workshop 2023 LLD.pdf (239.2 KB)

Filling gaps in linker scripts

For example, a tightly coupled memory (TCM) must contain some manually placed sections, but its spare space could be used by any other section. Currently, sections have to be placed in that spare space manually.

There are several options, but most involve linker script extensions. We think the GNU ld command-line option --enable-non-contiguous-regions is the best starting point, as it covers most use cases without needing a linker script extension. Extensions could in principle be added later if that proved insufficient.
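The effect of --enable-non-contiguous-regions can be approximated as first-fit placement: when an input section matches several output section descriptions, it goes into the first one (in script order) whose memory region still has room. A simplified sketch, assuming every input section matches all regions in order and ignoring alignment; `Region`, `Section` and `assignFirstFit` are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// A memory region reached through one output section description.
struct Region {
  std::string name;
  std::uint64_t capacity;
  std::uint64_t used = 0; // bytes already taken, e.g. by manually placed sections
};

// An input section that (by assumption) matches the description in every region.
struct Section {
  std::string name;
  std::uint64_t size;
};

// First-fit: each section goes into the first region, in linker script order,
// with enough free space. Returns the chosen region index per section, or -1
// where nothing fits (a real linker would report an error).
std::vector<int> assignFirstFit(std::vector<Region> &regions,
                                const std::vector<Section> &sections) {
  for (const Region &r : regions)
    assert(r.used <= r.capacity);
  std::vector<int> placement;
  for (const Section &s : sections) {
    int chosen = -1;
    for (std::size_t i = 0; i < regions.size(); ++i) {
      if (regions[i].capacity - regions[i].used >= s.size) {
        regions[i].used += s.size;
        chosen = static_cast<int>(i);
        break;
      }
    }
    placement.push_back(chosen);
  }
  return placement;
}
```

Pre-setting `used` on the TCM region models the manually placed sections that must stay there; the remaining space is then filled first-fit before anything spills to slower memory.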

Map files

GNU ld-compatible map file output would ease porting from GNU ld to LLD, as map files could be diffed without further processing.

There was general agreement that a machine-readable format with some stability guarantees would help many projects. Multiple people in the audience had written a map file parser, as well as a parser for readelf output, and had to update them every time the format changed.

JSON seemed to be the most likely choice for the output format.

The next action is for someone to be brave enough to write a prototype and send out an RFC. The interesting design choices are likely to be around how to represent symbol assignments. Currently these are inlined in the map file, but this may not be the best choice for a machine-readable form.

A good test of the machine readable format would be to write a translator to the current map file output or the GNU ld map file format.
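One way to sidestep the question of inlined symbol assignments is to give them their own record type. A hypothetical sketch that emits one JSON object per line; the field names and record kinds are illustrative only, not a proposed schema, and names are assumed not to need JSON escaping:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical map entry. Symbol assignments (e.g. `__stack_top = .;`) are
// separate records instead of being interleaved with section lines.
struct MapEntry {
  enum Kind { OutputSection, InputSection, Assignment } kind;
  std::string name;
  std::uint64_t address;
  std::uint64_t size; // 0 for assignments
};

static std::string toHex(std::uint64_t v) {
  char buf[19]; // "0x" + 16 hex digits + NUL
  std::snprintf(buf, sizeof(buf), "0x%llx", static_cast<unsigned long long>(v));
  return buf;
}

static const char *kindName(MapEntry::Kind k) {
  switch (k) {
  case MapEntry::OutputSection: return "output_section";
  case MapEntry::InputSection: return "input_section";
  case MapEntry::Assignment: return "assignment";
  }
  return "unknown";
}

// One JSON object per line (names assumed not to need escaping).
std::string toJsonLines(const std::vector<MapEntry> &entries) {
  std::string out;
  for (const MapEntry &e : entries)
    out += std::string("{\"kind\":\"") + kindName(e.kind) + "\",\"name\":\"" +
           e.name + "\",\"address\":\"" + toHex(e.address) +
           "\",\"size\":" + std::to_string(e.size) + "}\n";
  return out;
}
```

Addresses are emitted as hex strings because some JSON parsers store numbers as doubles, which cannot represent every 64-bit address exactly.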

Overlays

LLD has almost everything that is required for manual overlays. It is missing the NOCROSSREFS diagnostic; although this is only a diagnostic, without it using overlays is too error-prone.

More sophisticated overlay schemes exist, but these are not necessary for the vast majority of cases.
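The missing diagnostic is conceptually simple once the linker knows which output section each reference comes from and goes to. A sketch under the assumption that references have already been collected from relocations; `Reference` and `checkNoCrossRefs` are hypothetical names and the message format is made up:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// A relocation-derived reference from one output section to a symbol
// defined in another.
struct Reference {
  std::string fromSection;
  std::string toSection;
  std::string symbol;
};

// Model of a NOCROSSREFS group: any reference between two different output
// sections of the same group is reported. Overlay sections share an address
// range, so such a reference could reach code that is not currently loaded.
std::vector<std::string>
checkNoCrossRefs(const std::set<std::string> &group,
                 const std::vector<Reference> &refs) {
  std::vector<std::string> errors;
  for (const Reference &r : refs)
    if (r.fromSection != r.toSection && group.count(r.fromSection) &&
        group.count(r.toSection))
      errors.push_back(r.fromSection + " references " + r.symbol + " in " +
                       r.toSection);
  return errors;
}
```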

LMA and TLS alignment

GNU ld and LLD have different behaviour for LMA alignment, and GNU ld also supports ALIGN_WITH_INPUT. Each behaviour is justifiable, but has a different set of trade-offs. The difference can cause subtle bugs when porting programs. Does LLD need an option to select GNU ld or LLD semantics?

The LLD algorithm for calculating local-exec TLS offsets also differs from the one in newlib/picolibc, which can lead to bugs in some corner cases (when .tbss is overaligned but .tdata is not). Is there scope for a new linker-defined symbol that contains the result of the linker’s calculation? A library could pick that up and be sure of matching the alignment padding.
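A worked example of the corner case, using a simplified model of TLS variant 1 as used on Arm: the thread pointer points at a two-word TCB (8 bytes on 32-bit Arm), and the TLS block starts at that size rounded up to the alignment of the whole PT_TLS segment, which is the larger of the .tdata and .tbss alignments. Assuming, for illustration, that a runtime rounds up using only the .tdata alignment, the two calculations diverge. The function names are hypothetical, and the model ignores the case where the segment's start address is not itself aligned:

```cpp
#include <cassert>
#include <cstdint>

// Round value up to a power-of-two alignment.
constexpr std::uint64_t alignTo(std::uint64_t value, std::uint64_t align) {
  return (value + align - 1) & ~(align - 1);
}

// Simplified linker-style calculation: TCB size rounded up to the alignment
// of the whole TLS segment (max of .tdata and .tbss alignment).
std::uint64_t tlsBlockOffset(std::uint64_t tcbSize, std::uint64_t tdataAlign,
                             std::uint64_t tbssAlign) {
  assert(tdataAlign && (tdataAlign & (tdataAlign - 1)) == 0);
  assert(tbssAlign && (tbssAlign & (tbssAlign - 1)) == 0);
  std::uint64_t segmentAlign = tdataAlign > tbssAlign ? tdataAlign : tbssAlign;
  return alignTo(tcbSize, segmentAlign);
}

// Hypothetical runtime calculation that only looks at .tdata's alignment.
std::uint64_t tdataOnlyOffset(std::uint64_t tcbSize, std::uint64_t tdataAlign) {
  return alignTo(tcbSize, tdataAlign);
}
```

With an 8-byte TCB, 4-byte aligned .tdata and 16-byte aligned .tbss, the linker-style calculation gives 16 while the .tdata-only calculation gives 8. A linker-defined symbol holding the linker's result would remove the need for the library to repeat the calculation.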

Diagnostics

A --why-live feature that can tell you why a particular section is live (not garbage collected) would be useful for projects trying to trace references. It needs a graph of section cross-references to be constructed, as relocations are uni-directional. This will likely need to be an optional slow-path pass after garbage collection to avoid the overhead.
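The core of a --why-live query can be sketched as a breadth-first search from the garbage collection roots that remembers which section first reached each section; following those predecessor links backwards gives a chain from a root to the queried section. A minimal sketch over a hypothetical string-keyed section graph built from relocations; LLD's real data structures are different:

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Hypothetical section graph: edges follow relocations, from the referencing
// section to the section defining the target symbol.
using Graph = std::map<std::string, std::vector<std::string>>;

// BFS from the GC roots, recording the predecessor of every section reached.
// Returns the root-to-target chain, or an empty vector if target is not live.
std::vector<std::string> whyLive(const Graph &g,
                                 const std::vector<std::string> &roots,
                                 const std::string &target) {
  std::map<std::string, std::string> parent; // "" marks a root
  std::queue<std::string> work;
  for (const std::string &r : roots)
    if (parent.emplace(r, "").second)
      work.push(r);
  while (!work.empty()) {
    std::string s = work.front();
    work.pop();
    if (s == target)
      break;
    auto it = g.find(s);
    if (it == g.end())
      continue;
    for (const std::string &next : it->second)
      if (parent.emplace(next, s).second)
        work.push(next);
  }
  if (!parent.count(target))
    return {};
  std::vector<std::string> chain;
  for (std::string s = target; !s.empty(); s = parent[s])
    chain.insert(chain.begin(), s);
  return chain;
}
```

Recording predecessors is the extra cost that makes this a candidate for an optional pass rather than part of the normal garbage collection.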

Miscellaneous

  • Implement .gnu.warning.*
  • : and :
  • Diagnostics when no archive member is selected by the input section pattern.