LLVM Embedded Toolchains Working Group sync up

2023-07-20

Participants

  1. Peter Smith

  2. Michael Platings

  3. Anton Repetov

  4. Michael Jones

  5. Nathan Sidwell

  6. Petr Hosek

  7. Prabhu Rajasekaran

  8. Scott

  9. Stan

  10. Yvan Roux

  11. Yung-Chia Lin

  12. Vince Del Vecchio

  13. Zhi Zhuang

  14. Garrett Van Mourik

  15. Volodymyr Turanskyy

Agenda

  1. Planning for the Pre-LLVM-DEV’23 – Embedded Toolchains Workshop.

  2. Follow up on code reviews in progress.

  3. Ideas/questions from Scott in Discord:

    • Memory region function attributes and how they’d impact inlining and output section.
    • Assembly shown inline with the source, similar to opt-viewer, but with the ability to show gcc assembly alongside clang-generated assembly.
    • Using Arm trace data as an input to PGO. That would give high-quality performance data without needing any instrumentation.

  4. AoB

Discussion

Pre-LLVM DevMeeting workshop (Peter)

NOTE: LLVM sync on 12th Oct will overlap with the LLVMDev meeting, so we will skip it.

  • Proposal submitted; no response yet. Space was requested for about 25 people. A list of possible topics was suggested: we need to review and confirm the topics and agree who will drive each of them.

  • News and next steps to be posted on Discourse when the workshop is confirmed.

Code reviews

  • Update from Alan Phipps on MC/DC: code reviews have been accepted, thanks for the help!

  • Michael P: libc++ with picolibc testing: code review accepted and expected to land soon. Buildkite CI will then test the picolibc (embedded) configuration of libc++ running in QEMU on Armv7-M.

  • Unified LTO, discussed previously, landed (RFC, patch a1ca3af) - impact/opportunities for embedded?

    • Unified LTO landed: the choice between thin and full LTO can be made at link time.

    • FatLTO: changes are mostly accepted and have started landing; it may take a few more days to finish.

Code-size comparison (Scott)

  • opt-viewer-style tool: compare gcc and clang generated code using objdump and llvm-objdump, with debug info used to match up the output.

  • May be similar to LLVM performance testing: there is a system to use perf data to compare performance per building block between builds from different days.
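As a rough illustration of the comparison idea, the sketch below compares per-function code size between two builds. It is a simplification and not the tool discussed: instead of matching disassembly through debug info, it assumes symbol listings in the `<address> <size> <type> <name>` layout printed by `nm --print-size`. The listings, function names and helper names are made up for the example.

```python
def parse_symbol_sizes(listing):
    """Parse lines of the form '<addr> <size> <type> <name>' (hex addr and size)."""
    sizes = {}
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # skip undefined symbols and lines without a size
        _addr, size_hex, sym_type, name = parts
        if sym_type.lower() != "t":
            continue  # keep only text (code) symbols
        sizes[name] = int(size_hex, 16)
    return sizes


def compare_sizes(baseline, candidate):
    """Return (name, baseline_size, candidate_size, delta), largest growth first."""
    rows = []
    for name in sorted(set(baseline) | set(candidate)):
        old = baseline.get(name, 0)  # 0 if the symbol is absent (e.g. inlined)
        new = candidate.get(name, 0)
        rows.append((name, old, new, new - old))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


# Hypothetical listings standing in for a gcc build and a clang build.
GCC_LISTING = """\
00000000 00000040 T main
00000040 00000024 T uart_write
00000064 00000010 t helper
"""
CLANG_LISTING = """\
00000000 00000038 T main
00000038 00000030 T uart_write
"""

report = compare_sizes(parse_symbol_sizes(GCC_LISTING),
                       parse_symbol_sizes(CLANG_LISTING))
for name, old, new, delta in report:
    print(f"{name:12} {old:5} {new:5} {delta:+5}")
```

A symbol-size diff like this only points at the functions worth inspecting; the side-by-side assembly view discussed above would still be needed to see why the sizes differ.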

Placement of code (Scott)

  • Request for the compiler to provide a function attribute that defines the memory region for a function, and to copy the functions it depends on into the same memory region.

  • Similar to what is needed for LTO to support placement in output sections. Do a pre-assignment of the output section before running the LTO itself. There was a link to the relevant presentation in the Discord channel: 2022 LLVM Dev Mtg: Link-Time Attributes for LTO: Incorporating linker knowledge into the LTO... - YouTube.

  • Automatic attribute propagation through the call graph is useful when there are libraries whose source code cannot be changed.

  • Somewhat similar to overlay logic, which decides whether or not to copy functions for different overlays.
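The propagation idea can be sketched as a reachability walk over the call graph. This is only an illustration under assumptions: the call graph, the function names and the region names are hypothetical, and a real implementation would live in the compiler or linker rather than in a script.

```python
from collections import deque


def propagate_regions(call_graph, annotated):
    """Give each annotated function's region to everything it transitively calls.

    call_graph: dict mapping function name -> list of callee names.
    annotated:  dict mapping function name -> region name (explicit attributes).
    Returns a dict mapping function name -> set of regions it is needed in.
    """
    placement = {}
    for root, region in annotated.items():
        queue = deque([root])
        seen = {root}
        while queue:
            fn = queue.popleft()
            placement.setdefault(fn, set()).add(region)
            for callee in call_graph.get(fn, []):
                if callee not in seen:
                    seen.add(callee)
                    queue.append(callee)
    return placement


# Hypothetical program: only isr_handler and main carry an explicit attribute.
CALL_GRAPH = {
    "isr_handler": ["read_fifo", "memcpy"],
    "read_fifo": ["memcpy"],
    "main": ["init", "memcpy"],
    "init": [],
    "memcpy": [],
}
ANNOTATED = {"isr_handler": "fast_ram", "main": "flash"}

placement = propagate_regions(CALL_GRAPH, ANNOTATED)
# Functions needed in more than one region are candidates for being copied.
needs_copy = sorted(fn for fn, regions in placement.items() if len(regions) > 1)
print(needs_copy)
```

In this example `memcpy` is reachable from both annotated functions, so it is the one function that would need a copy per region, which is the case where library code cannot simply be annotated by hand.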

PGO from traces (Scott)

  • PGO: trace capability of higher end CPUs - can it be used as input to PGO (without code instrumentation)? Branch instructions are most interesting to recreate the flow. Should be possible in principle. Arm Streamline is a trace based tool, armcc (Arm Compiler 5) was able to read its output, but not armclang (Arm Compiler 6).

  • There are a lot of trace formats out there so it could be tricky to parse all of them.

  • Compiler teams use a lot of models for testing; however, for people working with peripherals there are fewer options.
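To make the "branches recreate the flow" point concrete, the sketch below turns branch records into edge counts and a taken probability, which is the kind of data a profile needs. The record format, a list of `(branch_address, next_address)` pairs, is an assumption made for the example. Real trace streams need format-specific decoding and mapping back to the program first, and none of the names here correspond to an existing tool.

```python
from collections import Counter


def edge_counts(branch_records):
    """Count how often each (branch_address, next_address) edge was executed."""
    return Counter(branch_records)


def edge_probability(counts, source, target):
    """Fraction of executions of the branch at `source` that went to `target`."""
    total = sum(n for (src, _dst), n in counts.items() if src == source)
    return counts[(source, target)] / total if total else 0.0


# Hypothetical decoded trace: the branch at 0x100 jumps to 0x200 three times
# and falls through to 0x104 once; the branch at 0x210 loops back to 0x100.
TRACE = [
    (0x100, 0x200),
    (0x100, 0x104),
    (0x100, 0x200),
    (0x100, 0x200),
    (0x210, 0x100),
]

counts = edge_counts(TRACE)
print(counts[(0x100, 0x200)], edge_probability(counts, 0x100, 0x200))
```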

Findings from migrating a hypervisor (Peter)

  • Fiasco hypervisor (GitHub - kernkonzept/fiasco: The development version of the Fiasco.OC microkernel) supports the clang compiler, but not LLVM binutils.

  • Some issues found with LLVM binutils: llvm-objdump and llvm-objcopy behave slightly differently from their GNU counterparts, which causes build issues.

  • Peter will raise upstream issues for these.

  • LLD: asserts in linker scripts behave differently because LD and LLD check the conditions at different times, which again causes build failures.

FP modes in compiler-rt (Peter)

  • compiler-rt software emulation of floating point: rounding modes and flush to zero - who is interested in improvements? The idea is to offer faster vs stricter IEEE modes. Arm can contribute.

  • Most of the time no-FP is used, thus limited experience and/or interest.

Embedded benchmarking (Petr)

  • What is a good set of benchmarks for embedded? embench (https://www.embench.org/)?

  • May be good to add something to LLVM test suite, if the benchmark is open-source.

  • Peter: Dhrystone, CoreMark and EEMBC are widely used; however, they are mostly C (no C++).

    • CMSIS DSP, CMSIS NN can be used as application benchmarks, especially for SIMD.

    • embench was considered by the Arm team, but has not been adopted for regular testing yet.

  • Scott: MicroPython has a set of benchmarks, which can be seen as a more real-world use case.

CMSIS clang support (Petr)

  • CMSIS is a dependency of a project the team is working on, but it does not support clang yet.

  • Volodymyr: CMSIS6 clang support is in progress: Core(M): Add support for LLVM/Clang · ARM-software/CMSIS_6@193243d · GitHub

  • There is no current plan to backport to CMSIS5. However, the clang enablement is a minor change, and CMSIS6 is mostly compatible with CMSIS5 (it is a better split and arrangement of the same components), so it should be straightforward to migrate.