LLVM Embedded Toolchains Working Group sync up

At San Martin - Lower Level ?

Yes, I believe that all roundtables will be in that room.

2022-12-08: Multilibs, LLVM Dev Meeting

Participants

  1. Peter Smith
  2. Michael Platings
  3. Michael
  4. Piotr Przybyla
  5. Rich Fuhler
  6. Simon Butcher
  7. Siva Chandra
  8. Yvan Roux
  9. Prabhu Rajasekaran
  10. Todd Snider
  11. Petr Hosek
  12. Volodymyr Turanskyy

Agenda

  1. Multilibs update.
  2. LLVM Dev Meeting follow up.

Discussion

Multilibs update by Peter Smith

LLVM Dev Meeting follow up

  • Summary by Prabhu:

    • Google, Qualcomm, Nintendo, TI, … participated in the round tables related to embedded toolchains.

    • Downstream linkers → interested in sharing experience. Maybe a topic to follow up in the WG.

    • Security patches from TI may be shared upstream soon.

Related:

  • Nice to discuss embedded specific linker features and convince upstream maintainers they are useful, e.g.

    • Built in compression, e.g. for RW (copied from ROM to RAM and expanded).

    • Place a variable at a specific address, e.g. over a system register or IO ports.

    • When multiple banks of RAM are available, a linker needs a way to distribute segments across them.

  • Linker script support in LLD (vs GNU) + support for embedded LTO.

  • Debuggability of linker scripts is not good - more errors/warnings/traces would be useful to understand the choices the linker made.

LLD related reviews:

Next topics for the WG: would be useful to discuss and come up with a set of important linkers features for embedded to start promoting them upstream.

Multilib RFC is here: [RFC] Multilib

2023-01-05: Multilibs, LLD

Participants

  1. Michael Platings

  2. Michael Jones

  3. Nigel Perks

  4. Petr Hosek

  5. Siva Chandra

  6. Volodymyr Turanskyy

Agenda

  1. Multilibs update.

  2. LLD key embedded features.

Discussion

Multilibs by Michael Platings

  • The prototype ⚙ D140959 RFC: Multilib prototype and RFC [RFC] Multilib

  • Layering of libraries - a new use case.

  • Petr Hosek on use cases:

    • Fuchsia: existing LLVM multilib implementation is used, no need to have multiple incompatible variants of libraries, mostly used for optimization like with/without exceptions or different ABIs (e.g. with sanitizers - instrumented libraries can be layered on top of non-instrumented ones as a fallback). Currently the multilib logic is hardcoded.

    • Pigweed: this is a traditional embedded project, the use case is similar to LLVM Embedded Toolchain for Arm.

    • Can we come up with a way to unify these two use cases, even if some migration is needed to converge?

  • One vs multiple include directories: Do we need to rely on sysroots or not?

    • Fuchsia only needs one include directory: libraries use the same API, but different ABIs only.

    • No other issues suggested.

    • Can we have layered header file includes, similar to the libraries described above? More specific first, then generic - this approach is already used for multiarch support in libcxx.
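
The layering idea discussed above (a more specific variant first, a generic one as fallback) can be sketched as an ordered search. This is only an illustration: the directory names, the `contents` table and the `resolve_library` helper are hypothetical and do not reflect how the Clang driver implements multilib.

```python
from typing import Dict, List, Optional, Set


def resolve_library(name: str, layers: List[str],
                    contents: Dict[str, Set[str]]) -> Optional[str]:
    """Return the path of `name` from the first layer that provides it."""
    for layer in layers:
        if name in contents.get(layer, set()):
            return f"{layer}/{name}"
    return None  # no layer provides this library


# Hypothetical layout: the instrumented layer overrides only some libraries.
contents = {
    "lib/asan": {"libc++.a"},
    "lib": {"libc++.a", "libunwind.a"},
}
layers = ["lib/asan", "lib"]  # most specific first, generic fallback last

print(resolve_library("libc++.a", layers, contents))     # taken from lib/asan
print(resolve_library("libunwind.a", layers, contents))  # falls back to lib
```

The same ordered search would apply to include directories if headers were layered in the same way.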

LLD key embedded features

  • Example: the picolibc build system had to be patched recently because LLD has limitations in placing memory segments, so we are running into practical issues.

  • There is a list of embedded linker features in the previous meeting minutes.

  • Volodymyr to reach out to LLD maintainer to arrange a discussion in one of the following sync ups.

  • Fuchsia team is comparing GNU LD vs LLD, there are some known issues - can start a list in Discourse.

  • There was a discussion in the last LLVM Dev Meeting about LLD as well: diagnostic was mentioned as a major issue.

  • Google summer of code will be coming soon - LLD usability improvements can be a good fit.

  • Github - we can label relevant issues there to make them easy to find.


Hi everyone!

As mentioned on a couple of the embedded LLVM calls, my changes supporting MC/DC are presently in phabricator (quoted above):

Since the Developers’ Meeting last November, I’ve been hearing from more folks who are interested in seeing this functionality upstream but don’t have the LLVM expertise to contribute meaningfully to the reviews, unfortunately, so I could really use some help in getting things reviewed.

So far, @ellishg has been able to look at some of the back-end work and provide some good feedback. @smithp35 provided some good suggestions for the preliminary review I added, which I incorporated into the clang-specific review linked here.

Of course, I don’t want to trivialize the fact that everybody is busy, and many of you have upstreaming work of your own. I appreciate the feedback you have and whatever time y’all are able to contribute to this effort! I’m also on Discord if you want to chat about MC/DC.

Thanks!

-Alan


2023-02-02: LLD

Participants

  1. Petr Hosek

  2. Siva Chandra

  3. Prabhu Rajasekaran

  4. Peter Smith

  5. Tue Ly

  6. Fangrui Song

  7. Michael Platings

  8. Garrett Van Mourik

  9. Vince Del Vecchio

  10. Stan

  11. Henry Cox

  12. Yung-Chia Lin

  13. Zhi Zhuang

  14. Volodymyr Turanskyy

  15. Amilendra Kodithuwakku

Agenda

  1. LLD key embedded features by Peter Smith

  2. Multilib implementation code reviews by Michael Platings

  3. Other code reviews

Discussion

LLD key embedded features by Peter Smith

Two major areas:

  • Observability/discoverability - more understandable output, better usability.

  • Additional features:

    • Disjoint memory regions: multiple memory banks with different properties => possible linker script extension to distribute code over multiple free spaces in different regions.

    • RW data compression - copy RW data from ROM to RAM and decompress, can save ROM => could add to LLD or have a separate utility. It is important that the compression and decompression algorithms match! Maintaining multiple algorithms may add overhead.

    • Memory-mapped variables - placing a section at a particular address, e.g. to access IO ports directly.
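
To illustrate why the compressor (link time) and the decompressor (startup code) have to agree on one format, here is a minimal sketch of run-length encoding for zero bytes only. The encoding is made up for this example (a `0x00` marker followed by a run length of 1 to 255) and is not the format used by any existing linker.

```python
def compress_zero_runs(data: bytes) -> bytes:
    """Encode each run of zero bytes as the pair (0x00, run_length)."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == 0:
            run = 1
            # Extend the run, capped at 255 so the length fits in one byte.
            while i + run < len(data) and data[i + run] == 0 and run < 255:
                run += 1
            out += bytes([0, run])
            i += run
        else:
            out.append(data[i])  # non-zero bytes are stored literally
            i += 1
    return bytes(out)


def decompress_zero_runs(blob: bytes) -> bytes:
    """Inverse of compress_zero_runs; models the ROM-to-RAM copy at startup."""
    out = bytearray()
    i = 0
    while i < len(blob):
        if blob[i] == 0:
            out += bytes(blob[i + 1])  # bytes(n) yields n zero bytes
            i += 2
        else:
            out.append(blob[i])
            i += 1
    return bytes(out)


rw_data = b"\x01\x02" + bytes(300) + b"\x03"  # mostly zero-initialised data
packed = compress_zero_runs(rw_data)
print(len(rw_data), "->", len(packed), "bytes")
```

Any change to the marker or length encoding on one side without the other would silently corrupt the RW data, which is the maintenance concern raised above.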

Comments:

  • Petr Hosek:

    • GSoC project proposed for usability improvements [LLD] Linker Improvements for Embedded

    • In practice many issues come down to linker script issues (differences in behaviour between BFD and LLD), so being able to debug linker scripts easily helps a lot.

    • Disjoint memory - distributing by hand is very tedious, indeed.

    • Compression is helpful.

    • LTO support with embedded constraints of placement is another interesting area - there was a presentation by TI recently.

    • Another GSoC project idea is for machine readable format, e.g. JSON, for debug output (also link map, that is different between linkers now, thus tedious to parse) so that people can create their own visualizers/analyzers. Would be nice to convince the GNU community to implement the same format as well.

Demo by Peter Smith how the features mentioned above work in armlink (Arm proprietary linker).

  • armlink has 3 compression algorithms, one very basic run-length for 0’s which is already very helpful.

  • armlink supports placement attributes from C code, i.e. saves on manually editing linker script files (called scatter files for armlink).

  • armlink can show useful debug info like call graph/stack depth required, also breakdown of code/data sizes including the libraries to analyse code size issues.

  • armlink can trace symbols to show why a particular one was included.

Multilib implementation code reviews by Michael Platings

  • ⚙ D142933 Add -print-multi-selection-flags argument is about the proposed syntax for multilib options, which does not use the actual command line names directly. This allows a more limited and somewhat more stable API. Feedback is welcome!

  • Petr is reviewing and will get back with more feedback. Could we reuse tablegen here? May result in too much/complex dependencies.

  • We may consider making the feature experimental for the first LLVM release to allow later adaptation as per feedback from users.

Other code reviews


2023-03-02: Code reviews

Participants

  1. Michael Platings

  2. Prabhu Rajasekaran

  3. Anton Rapetov

  4. Henry Cox

  5. Jason Liu

  6. Michael Jones

  7. Petr Hosek

  8. Simon Butcher

  9. Siva Chandra

  10. Stan

  11. Tue Ly

  12. Vince Del Vecchio

  13. Yung-Chia Lin

  14. Yvan Roux

  15. Alan Phipps

  16. Todd Snider

  17. Peter Smith

  18. Volodymyr Turanskyy

Agenda

  1. Multilib implementation code reviews by Michael Platings.

  2. MC/DC implementation code review by Alan Phipps.

  3. FatLTO by Petr Hosek.

  4. Other.

Discussion

Multilibs code review

  • Michael:

    • Patches in review, few rounds of discussions happened and comments addressed.

    • One patch landed, 6 more to finish.

    • How to speed up or accept the current version with the intent to improve/address any issues?

  • Feedback from Petr:

    • The team reviewed the RFC in detail, the response will be posted on Discourse in coming days.

    • Suggestion: There are changes to internal API and adding new file formats (which are UVB - user visible behavior), so for internal changes it should be OK to land, UVB may need a bit more discussion.

  • Michael: Could/should we be more aggressive: accept a format now as an experimental feature, so warn that it may and likely will change in the future? May commit now, but review/refine before LLVM17 release to have it as stable as possible by the next release.

  • Peter: It would be nice to be able to give it a try with real projects and see if it works, rather than keep overthinking.

Agreed: Petr posts the response on Discourse, then if after the Discourse discussion there are no blockers, we commit the current format and try to refine it for LLVM17.

MC/DC code review

  • Petr: Someone on the team is reviewing the patches, it goes a bit slower than wanted, but in progress, not forgotten.

FatLTO

  • Petr: FatLTO is progressing, there is an RFC and patches will be available soon. Approach aligned with LTO maintainers.

  • The idea of FatLTO is for object files to contain information for both normal and LTO linking (i.e. binary and IR code).

  • TI presented a revised version of LTO for embedded/linker scripts recently, their solution is similar to/compatible with FatLTO.

  • Peter: Someone reported an issue with using LTO for embedded recently, see LLVM Embedded Toolchain for Arm issue Could you please include llvm-link, llc and opt? · Issue #187 · ARM-software/LLVM-embedded-toolchain-for-Arm · GitHub - they are using llvm-link, llc and opt manually to avoid the pitfalls of the default LTO.

  • Todd explained the details of the TI solution from the presentation - the two teams will talk to each other to further align the approach and implementation.

Other

  • Peter: FOSDEM embedded developers were asking about a way to embed a section, e.g. a checksum, into the output image at the link time.

    • Petr: why is build-id not enough? Looks like something very custom/special.

    • Suggested that it would make sense to start a topic on Discourse to explain the use case, then consider possible solutions.

  • Peter: Use of TLS (thread local storage) in embedded projects. Picolibc uses TLS and initializes it in the linker script. The linker script and the library need to agree on the calculations of relevant addresses. LLD and GNU LD disagree on this - Peter is looking to create a reduced reproducer.

    • Is anyone using TLS in embedded apps? Vince: No, but had similar issues.

    • Is this going to change with C11 used more in embedded? Something to look out for in the future.

    • Peter will post an issue with the reproducer upstream.


2023-03-30: Multilib, profiling runtime

Participants

  1. Gulfem Savrun Yeniceri

  2. Henry Cox

  3. Mandeep Singh Grang

  4. Michael Jones

  5. Nathan Sidwell

  6. Prabhu Rajasekaran

  7. Paul Kirth

  8. Petr Hosek

  9. Pierre

  10. Siva Chandra

  11. Stan

  12. Vince Del Vecchio

  13. Yung-Chia Lin

  14. Peter Smith

  15. Volodymyr Turanskyy

Agenda

  1. Multilib code reviews.

  2. Other code reviews in progress.

  3. Embedded profiling runtime. Include profiling lib? · Issue #197 · ARM-software/LLVM-embedded-toolchain-for-Arm · GitHub and Profiling contribution by rgrr · Pull Request #204 · ARM-software/LLVM-embedded-toolchain-for-Arm · GitHub

  4. Building runtimes for bare-metal.

Discussion

Multilibs code review

  • RFC and list of patches [RFC] Multilib - #5 by mplatings

  • Peter: The reviews are accepted by Arm, need confirmation from others in the community.

  • Petr: Will follow up on remaining reviews shortly.

  • Peter: A related question: In case there is a newlib installed from a distro package: how to make it work with clang?

    • Option could be to provide the config file to point there.

    • Could we inject an external multilib config file to use an existing set of multilibs?

    • Petr: There was a comment in the review that now the location of the yaml file is hardcoded - would be great to allow configuring it via a command line option, would solve this use case as well.

Other code reviews

Profiling runtime

  • Peter: A request raised for the LLVM Embedded Toolchains for Arm Issue #197

  • One option is to create a trivial runtime that would dump the counters somewhere as suggested in the issue discussion thread.

  • Wider question is how to add bare-metal support to compiler_rt?

  • The PR Pull Request #204 suggests an implementation based on reusing compiler_rt pieces, which goes in the right direction, but only provides a very narrow Arm semihosting-specific implementation. How to generalise?

  • Can we provide an interface inside compiler_rt that can be used to tailor actual implementation of storing the data, suitable for bare-metal use cases as well?

  • Petr: The idea makes sense, the profile runtime is not in the best shape now, it would be great to refactor it and rewrite in C++. Would be good to have a header-only minimal implementation to allow easy reuse between actual implementations.

  • The team is very much interested in the implementation, but there was a lack of time to progress.

  • https://cs.opensource.google/fuchsia/fuchsia/+/main:src/lib/llvm-profdata/llvm-profdata.cc is an example of a minimal runtime we use for our kernel, we would like to break it up and upstream individual pieces so it can be reused for other embedded targets.

  • A local patch is in progress, the team will need help to progress it upstream.

  • Best way to start would be to do clean up/refactoring.

  • People who have downstream modifications - would be useful to know what kind of changes are there and why, i.e. how to refactor to accommodate for these? Examples:

    • Split of data to minimise the size of the resulting executable.

    • Size of counters: 32 vs 64 bits.

    • One runtime is used for both profiling and code coverage, thus maintains data for both - could be configurable.

  • Petr may post on Discourse a list of ideas for refactoring based on internal discussions.

  • A good topic to discuss in EuroLLVM 2023.

Building runtimes for bare-metal

It may be of interest to people following this thread that LLVM Embedded Toolchain for Arm 16.0.0 has been released, including multilib. Feedback is very welcome either in the GitHub issues or in the multilib RFC.

(Suggestions also welcome if you think such announcements are of interest in another channel/thread/category…)

2023-04-27: Code reviews, LLD section packing

Participants

  1. Petr Hosek

  2. Alan Phipps

  3. Anton Rapetov

  4. Daniel Thornburgh

  5. Garrett Van Mourik

  6. Henry Cox

  7. Mhe

  8. Michael Jones

  9. Nathan Sidwell

  10. Scott

  11. Siva Chandra

  12. Stan

  13. Tue Ly

  14. Vince Del Vecchio

  15. Yung-Chia Lin

  16. Simon Butcher

  17. Peter Smith

  18. Volodymyr Turanskyy

Agenda

  1. EuroLLVM - any topics/preparations for the round tables?
  2. Multilib support - still some outstanding questions.
  3. MC/DC coverage - there were some review comments.
  4. Profiling runtimes - no patches yet, any more input as discussed last time?
  5. LLD Linker Section Packing

Discussion

EuroLLVM

  • Peter requested a roundtable for embedded toolchains, however there was no confirmation yet.

Multilibs code review (Peter)

  • Current discussion ([RFC] Multilib) is about options-to-libraries matching logic: so far agreed to use the normalised command line option for the architecture, we need to figure out a sensible way to match against it - regex or anything else.

  • Agreed the general preference to unblock and land the important patches, then get back to option printing and other possible improvements.

  • Note: Ordering of architecture options issue was also highlighted in the RISC-V call earlier today, so the issue is real and needs to be addressed in the design.
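
As a rough illustration of the options-to-libraries matching being discussed, the sketch below selects a library variant by matching regular expressions against a list of normalised flags. The flag spellings, the `VARIANTS` table and `select_variant` are assumptions made for this example; they are not the multilib configuration format under review.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical variant table: (library directory, patterns that must all match).
VARIANTS: List[Tuple[str, List[str]]] = [
    ("thumb/v7-m/nofp", [r"--target=thumbv7m-none-eabi", r"-mfpu=none"]),
    ("thumb/v7e-m/fp", [r"--target=thumbv7em-none-eabihf", r"-mfpu=fpv[45]-.*"]),
]


def select_variant(flags: List[str],
                   variants: List[Tuple[str, List[str]]] = VARIANTS) -> Optional[str]:
    """Return the first variant whose patterns each fully match some flag."""
    for directory, patterns in variants:
        if all(any(re.fullmatch(p, f) for f in flags) for p in patterns):
            return directory
    return None  # no compatible variant found


print(select_variant(["--target=thumbv7em-none-eabihf", "-mfpu=fpv4-sp-d16"]))
```

Matching against a normalised flag list rather than the raw command line is what keeps the result independent of the order in which the user wrote the architecture options.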

MC/DC coverage (Alan)

  • Comments were provided for all 3 patches, many thanks to those who contributed, updates are in progress - patches should be updated in coming weeks.

Profiler runtime

  • Petr: Google team provided all the useful information links in previous meeting minutes.

  • Next steps: Need a patch to start a more practical discussion.

  • Note: We need to keep the ABI stable, we may use a script to generate the list of public symbols, then check differences between versions. Petr suggested uploading the script for review/consideration, then it can be added to compiler_rt, if useful.
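
The ABI check mentioned in the note above could look roughly like the following sketch: take the public symbol lists of two builds and report what was removed or added. How the lists are extracted (readelf, nm or another tool) is left out, and the function names and the `__example_new_helper` symbol are hypothetical.

```python
from typing import Dict, Iterable, List


def diff_public_symbols(old: Iterable[str], new: Iterable[str]) -> Dict[str, List[str]]:
    """Compare two public symbol lists and report removals and additions."""
    old_set, new_set = set(old), set(new)
    return {
        "removed": sorted(old_set - new_set),  # removals break existing users
        "added": sorted(new_set - old_set),    # additions are usually harmless
    }


def is_abi_compatible(old: Iterable[str], new: Iterable[str]) -> bool:
    """Treat the new build as compatible if no public symbol disappeared."""
    return not diff_public_symbols(old, new)["removed"]


# Example symbol lists; in practice these would come from the built archives.
before = ["__llvm_profile_write_file", "__llvm_profile_reset_counters"]
after = ["__llvm_profile_write_file", "__example_new_helper"]

print(diff_public_symbols(before, after))
print(is_abi_compatible(before, after))
```

A name-only comparison like this does not catch changes in function signatures or data layout, so it is a first line of defence rather than a full ABI check.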

LLD Linker Section Packing (Daniel)

  • Thread https://discourse.llvm.org/t/lld-linker-section-packing/70234: GNU LD has a feature (the --enable-non-contiguous-regions flag) that changes behaviour to automatically distribute sections across matching memory regions.

  • Other toolchains have different approaches (syntax and semantics) to resolve this issue that is typical in embedded, because devices may have many types and many regions of memory, e.g. flash, static, dynamic memory, etc.

  • What is the best way to implement such in LLD?

  • Re-implementing LD logic in LLD might be a reasonable option. Would make compatibility between GNU and LLVM easier for projects that use both.

  • Can be promoted to some linker script file syntax instead of the command line option later.

  • The “fill till overflow, then switch to the next memory region” strategy seems to work best in practice (distributing evenly across memory regions makes local code from source scattered all over the memory which may have performance pitfalls).

  • Scott: (CircuitPython for AdaScript) needs:

    • Explicit marking for target region, e.g. what to put into flash or not including the whole call tree.

    • Access properties for memory region, e.g. place hot code into TCM memory.

  • Can we do much of it in the compiler, instead of linker? E.g. allocation to sections with specific properties. Alternatively, can be a standalone binary rewriting tool like bolt.

  • LLD has a feature for ordering based on profiling data, contributed for games optimization?

  • There is a symbol ordering file to control order, used by PGO already - would be best to reuse such existing features, if possible.

  • Why does LLD try to avoid complexity in the implementation?

    • Maintenance, especially the mix of different features not intended to work together originally.

    • Impact on performance of LLD - the more logic, the slower it is.

  • Need to check with the LLD maintainer whether there are any objections to reimplementing the LD feature in LLD.

  • Daniel is happy to progress based on the discussion.
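
The "fill till overflow, then switch to the next memory region" strategy from the discussion above can be sketched as follows. This is a simplified model with hypothetical names (`pack_sections`, region and section names); it ignores alignment and does not claim to reproduce what GNU LD or LLD actually do.

```python
from typing import Dict, List, Tuple


def pack_sections(sections: List[Tuple[str, int]],
                  regions: List[Tuple[str, int]]) -> Dict[str, str]:
    """Place sections in order, moving to the next region when one overflows.

    `sections` is a list of (name, size); `regions` is a list of (name, capacity).
    Earlier regions are never revisited, which keeps neighbouring code together.
    """
    placement: Dict[str, str] = {}
    region_index = 0
    used = 0
    for name, size in sections:
        # Skip forward until the section fits in the current region.
        while region_index < len(regions) and used + size > regions[region_index][1]:
            region_index += 1
            used = 0
        if region_index == len(regions):
            raise MemoryError(f"section {name} does not fit in any remaining region")
        placement[name] = regions[region_index][0]
        used += size
    return placement


regions = [("RAM0", 100), ("RAM1", 100)]
sections = [(".text.a", 60), (".text.b", 50), (".text.c", 30)]
print(pack_sections(sections, regions))
```

Because sections are consumed in input order, code from one source file tends to stay in one region, which is the locality benefit noted above compared with distributing sections evenly.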

LLVM libc status (Michael)

  • There was a question about the status of LLVM libc recently [libc] Is the llvm-libc incomplete? - would be interesting to discuss the use case/needs.

  • Volodymyr will invite the author to join the next call.


👋 CircuitPython for Adafruit

Sorry, of course Adafruit - I know it very well! An artifact of typing and listening at the same time.

2023-05-25: Code reviews, EuroLLVM roundtable

Participants

  1. Scott

  2. Anton Rapetov

  3. Garrett Van Mourik

  4. Henry Cox

  5. Michael Jones

  6. Nathan Sidwell

  7. Petr Hosek

  8. Quantum Thief

  9. Simon Butcher

  10. Stan

  11. Stefan Granitz

  12. Vince Del Vecchio

  13. Yung-Chia Lin

  14. Peter Smith

  15. Ties Stuij

  16. Volodymyr Turanskyy

Agenda

  1. Follow up on the code reviews: multilibs, MC/DC coverage, etc.

  2. Follow up on ⚙ D150637 [lld][ELF] Add option for suppressing section type mismatch warnings progress and plan.

  3. Follow up on the great discussions that happened in the EuroLLVM roundtable and agree on the next steps, see LLVM Embedded Toolchains - EuroLLVM 2023 round table summary

Discussion

Code reviews

  • Multilibs - Michael updated as per the latest comments, thanks to Petr for the review and feedback to keep it moving.

  • MC/DC - update from Alan: thanks for the useful comments, patches to be updated soon.

  • Profiling runtime - no patches yet.

    • Petr: the team looked into this, refactoring is needed: the idea is to move the implementation to C++ incrementally. The team would like to start doing that, but needs to make sure not to break the ABI. There is a patch in Phabricator that uses LLVM readelf with JSON output to extract all the API information and to diff it against the refactored one, so that it is possible to catch incompatibilities. There are some limitations, though: JSON output is currently only supported for the ELF file format.

    • In libc++ there are similar scripts to capture the ABI details, they are based on readelf and nm, they work well for dynamic libraries, but not static libraries. A discussion has started to generalize this libc++ infrastructure for other runtimes.

    • Another alternative could be llvm-ifs (shared object stubbing tool), but it does not support static archives either.

    • So there are a lot of tools, but each of them has limitations. So we need to decide on priorities/strategy. E.g. focus on ELF file format for now, then add the rest later; or first improve readelf to extend the support to other formats, then continue with the refactoring.

    • Note that the profile data format can change, there is a version embedded into the format itself. But this is not the issue, the discussion is specifically about ABI compatibility of the runtime itself.

  • May be a good idea to ask in Discourse who is using profiling with what OSes/formats. Darwin format is probably for Apple to check, COFF format may be for the Chrome team.

Suppressing section type mismatch

An alternative solution landed yesterday!

EuroLLVM

  • This LLVM Embedded Toolchains sync was advertised in the EuroLLVM as an extended roundtable - people were invited to continue the discussion in these sync ups. Specific topics of interest follow.

  • It might be a good idea to set up a real-time communication channel, e.g. a Discord - Volodymyr will try to do so.

  • Code and RFC reviews: It was highlighted that all patch/RFC comments are useful, even just to say that the idea sounds good, is useful, etc - helps to support the progress and builds confidence.

  • How to advertise LLVM Embedded Toolchain more? Options considered: LLVM blog or a company blog? Invite people to comment on issues/needs/features they want to see for embedded use cases.

  • Another idea is to have talks at the LLVM DevMeeting this fall. The Google team wants to present on porting a big project from GCC to LLVM. Issues the project ran into and ideas for improvement will be part of the presentation.

  • Similarly, Ties works on a blog about using LLVM Embedded Toolchain to target the Game Boy Advance game console. He wants to submit a talk for the LLVM Dev Meeting as well.

  • Scott highlighted that the CircuitPython team works on migration from GCC to LLVM and invited to help contribute - this is an open-source project, see Contributing - Pull Requests

  • Everyone agreed that the code size is definitely an issue, especially on smaller cores!

  • Petr suggested a possible future topic for discussion: analysis of optimization passes and how they contribute to the code size. There is an observation that the Attributor pass with LTO gives a size reduction of about 10-12%, but it is not enabled by default. A proposal may be to enable it for -Oz? Enabling the Attributor pass may increase the compilation time, however compile time for embedded code (that is comparably small) is not that big an issue - may be a good trade-off.

  • Related topic: Unified LTO discussion: the proposal to unify the ThinLTO and FullLTO. FullLTO is useful in embedded (again, smaller overall code size) vs ThinLTO for big apps like Chrome.

  • Quantum: Who has experience of using GCC LTO? Scott: it is used in CircuitPython from the very beginning - need to build it without LTO to see what is the impact.

  • Overlays in the linker. Arm Compiler has automatic overlays. Embecosm is attempting to standardise on ComRV (link in the trip report). It is driven by the RISC-V community, but if it is interesting to a wider community, then we can collaborate.

  • Ties: LLD does not seem to support all the syntax from GNU LD, so using overlays was difficult.

  • Petr: Our project uses overlays that are reimplemented manually (not LLD one). LLVM and GCC do different things here, thus it is difficult to use their implementation. LLD implementation is not on par with GNU LD, e.g. cross refs checks that are controlled in the linker script for GNU LD (LLD does not even parse the relevant keywords).

  • Are GCC overlays usable (as the approach/design) or can we do better? Something more advanced would create a split between LLVM and GNU, thus we need to seek consensus with the GNU community.

  • ComRV may be one option to discuss - needs a deeper evaluation.

  • LLVM libc in embedded: There is some good news: it was tried in some projects and worked.

  • There was a migration project to replace gcc, newlib, libgcc, libstdc++ with LLVM compiler, LLVM libc, compiler_rt, and LLVM libc++.

  • Google team is working on a report to present in the LLVM Dev Meeting.

  • Some key issues: code size, e.g. printf is not configurable yet; memcopy size - improved, etc.

  • Now LLVM libc covers the needs of this particular project which is not that much: ~25 functions. The expectation is that for many embedded projects it is already usable - many projects use only a few functions.

  • Problem is that many embedded projects grow in complexity/size now and go closer to RTOS and using a lot of maths library, e.g. for DSP, thus become more demanding.

  • Single precision maths is complete in LLVM libc and is even better than glibc; double precision is in progress, but does not seem to be used a lot in embedded.

  • Petr suggested a possible future topic: malloc - LLVM libc uses scudo algorithm from compiler_rt as the default implementation. It is a good choice for desktop, but too big for embedded. Do we need a minimal malloc implementation for embedded? Exploring options and papers, etc.

  • Automotive community needs may be special here: they need deterministic memory management - would be good to make heap memory management pluggable so that people can replace depending on their use case.

Dear all,

I am glad to let you know that we have got a dedicated LLVM Discord channel for more interactive discussions in between the sync ups, please see #embedded-toolchains under the Communities & Working Groups section or follow the direct link: Discord

2023-06-22: DevMeeting workshop, allocators, LTO

Participants

  1. Peter Smith

  2. Scott

  3. Henry Cox

  4. Alan Phipps

  5. Siva Chandra

  6. Petr Hosek

  7. Simon Butcher

  8. Prabhu Rajasekaran

  9. Simon Cook

  10. Tue Ly

  11. Volodymyr Turanskyy

Agenda

  1. Announcement: Discord channel for the working group created.
  2. Idea: Pre-LLVM-DEV’23 – Embedded Toolchains Workshop
  3. Technical discussion: [RFC] Allocators in the libc for the embedded use case
  4. Follow up on code reviews, previous discussions.

Discussion

Pre-LLVM DevMeeting workshop (Peter)

  • To be held the day before the conference. Duration to be clarified.

  • May look like an extended round table. Can have a couple of specific topics to deep dive, maybe even closer to a hackathon.

  • There are enough replies to the Discourse to continue with the proposal.

  • Can we have the usual round table during the DevMeeting days in addition? Yes, votes for both as not all people will be able to join the workshop earlier.

  • Request for tutorials What tutorials do you want to see at the LLVM Dev Meeting? - at least one was for embedded, Petr responded.

    • Petr and the team are thinking about a tutorial on coverage for embedded.

    • Another request was about build systems: may be able to cover building multilibs and runtimes.

  • Petr: supportive of the workshop, used to have similar sessions in the past, however to make it the most efficient it would be great to invite relevant maintainers, e.g. for LLD topics, so that the proposals can be discussed in person to save on online RFCs and comments. So it is even more important to choose the topics upfront.

Action: Peter to submit the proposal for the workshop.

Allocators (Siva)

  • The default Scudo allocator is large, thus there are many requests to add other options.

  • Want to start with something very simple and small (to fit embedded use cases), then enhance.

  • Plan: start with a simple implementation and put it up for review.

  • Peter: Option to override on the binary level would be great to allow people to substitute. Siva: sure, part of the original design to have it replaceable.
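
To give a feel for what "very simple and small" could mean here, the sketch below models a bump allocator over a fixed arena, returning offsets instead of pointers. It is purely illustrative: the `BumpAllocator` class is hypothetical and is not the design proposed in the RFC.

```python
from typing import Optional


class BumpAllocator:
    """Toy model of a bump allocator: hands out aligned offsets, never frees."""

    def __init__(self, arena_size: int, alignment: int = 8) -> None:
        self.arena_size = arena_size
        self.alignment = alignment
        self.offset = 0  # first unused byte in the arena

    def malloc(self, size: int) -> Optional[int]:
        """Return the offset of a new block, or None if the arena is exhausted."""
        # Round the current offset up to the next multiple of the alignment.
        start = (self.offset + self.alignment - 1) // self.alignment * self.alignment
        if size <= 0 or start + size > self.arena_size:
            return None
        self.offset = start + size
        return start

    def reset(self) -> None:
        """Release everything at once; individual frees are not supported."""
        self.offset = 0


heap = BumpAllocator(arena_size=32)
print(heap.malloc(5), heap.malloc(5), heap.malloc(16), heap.malloc(1))
```

An allocator like this has constant-time, deterministic allocation and tiny code size, at the cost of not reclaiming memory, which is why being able to substitute the implementation per project matters.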

Code reviews

  • multilibs: just landed! Big thank you to Michael for driving and everyone who contributed to reviews!

  • MC/DC: good feedback received - progressing.

Other

  • Scott: Large code size with LTO, see details in Discord (Discord):

    • Peter: Arm Compiler has -Omin option that modifies the LTO pipeline to avoid cross module inlining (that makes the code bigger). Can be a topic for discussion to suggest LTO pipeline for code size optimization.

    • Scott plans to also check per-section size difference - having map files in JSON format would be really good to enable comparison between clang and GCC.

    • Changes were required to make CircuitPython building with clang: Scott can share the experience/details with anyone interested.

  • Petr: We also experimented with LTO, there are places where LTO helps, but in others it makes it worse - not a straightforward experience. Now Fat LTO is used, Thin LTO does not help code size most of the time. Unified LTO discussion started that should be able to address (or enable addressing) these issues and design a pipeline useful for embedded code.

  • Petr: There are 10-15 issues related to LLD and embedded in the Fuchsia issue tracker, but most people are not aware of the tracker. The team will upstream these issues in the LLVM project. Would be nice to label them to make them easier to find. Agreed to add the “embedded” label. Peter will check that we raise defects that we are aware of in Arm Compiler as well.

  • Prabhu: On the topic of binary comparison of files, would be nice to have a tool in upstream LLVM to compare binary sizes in a finer grained way. The “bloaty” tool exists (GitHub - google/bloaty: Bloaty McBloatface: a size profiler for binaries), but is not ideal either. llvm-size does some of it, probably may be extended as one of options.
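
A finer-grained size comparison of the kind discussed above might start from per-section sizes of two builds, for example taken from machine-readable map files. The sketch below assumes the sizes are already available as dictionaries; the section sizes shown are invented and `compare_section_sizes` is a hypothetical helper, not an existing LLVM tool.

```python
from typing import Dict, List, Tuple


def compare_section_sizes(a: Dict[str, int],
                          b: Dict[str, int]) -> List[Tuple[str, int, int, int]]:
    """Return (section, size_a, size_b, delta) rows, largest change first."""
    rows = []
    for name in sorted(set(a) | set(b)):
        size_a = a.get(name, 0)  # a section missing from one build counts as 0
        size_b = b.get(name, 0)
        rows.append((name, size_a, size_b, size_b - size_a))
    rows.sort(key=lambda row: abs(row[3]), reverse=True)
    return rows


# Invented numbers standing in for two builds of the same firmware.
gcc_build = {".text": 1000, ".rodata": 200, ".data": 50}
clang_build = {".text": 1200, ".rodata": 180, ".bss": 64}

for name, size_a, size_b, delta in compare_section_sizes(gcc_build, clang_build):
    print(f"{name:10} {size_a:6} {size_b:6} {delta:+6}")
```

Sorting by the absolute delta puts the sections that explain most of the size difference at the top, which is usually where the investigation starts.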

2023-07-20

Participants

  1. Peter Smith

  2. Michael Platings

  3. Anton Rapetov

  4. Michael Jones

  5. Nathan Sidwell

  6. Petr Hosek

  7. Prabhu Rajasekaran

  8. Scott

  9. Stan

  10. Yvan Roux

  11. Yung-Chia Lin

  12. Vince Del Vecchio

  13. Zhi Zhuang

  14. Garrett Van Mourik

  15. Volodymyr Turanskyy

Agenda

  1. Planning for the Pre-LLVM-DEV’23 – Embedded Toolchains Workshop.

  2. Follow up on code reviews in progress.

  3. Ideas/questions from Scott in Discord:

    • Memory region function attributes and how they’d impact inlining and output section.
    • Assembly inline with the source similar to opt-viewer, but be able to have gcc assembly alongside clang generated assembly.
    • Using Arm trace data as an input to PGO. That’d give high quality performance data without needing any instrumentation.
  4. AoB

Discussion

Pre-LLVM DevMeeting workshop (Peter)

NOTE: LLVM sync on 12th Oct will overlap with the LLVMDev meeting, so we will skip it.

  • Proposal submitted - did not hear back yet. Number of people requested ~25. There was a list of possible topics suggested - we will need to review and confirm topics and agree who can drive each of the topics.

  • News and next steps to be posted on Discourse when the workshop is confirmed.

Code reviews

  • Update from Alan Phipps on MC/DC: code reviews have been accepted, thanks for the help!

  • Michael P: libc++ with picolibc testing: code review accepted, expected to land soon, buildkite CI will test the picolibc (embedded) configuration of libc++ running in QEMU on Armv7-M.

  • Unified LTO, discussed previously, landed (RFC, patch a1ca3af) - impact/opportunities for embedded?

    • Unified LTO landed: thin or full LTO can be decided at link time.

    • FatLTO: changes are mostly accepted and started landing, it may take a few more days to finish.

Code-size comparison (Scott)

  • opt-viewer style tool: Code comparison using objdump and llvm-objdump and debug info to match the output.

  • May be similar to LLVM performance testing: there is a system to use perf data to compare performance per building block between builds from different days.

Placement of code (Scott)

  • Function attribute to define a memory region, with the compiler copying dependent functions into the same memory region.

  • Similar to what is needed for LTO to support placement in output sections. Do a pre-assignment of the output section before running the LTO itself. There was a link to the relevant presentation in the Discord channel: 2022 LLVM Dev Mtg: Link-Time Attributes for LTO: Incorporating linker knowledge into the LTO... - YouTube.

  • Automatic attribute propagation through the call graph is useful if there are libraries whose source code cannot be changed.

  • Somewhat similar to overlay logic to copy or not functions for different overlays.
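
The propagation idea above can be sketched as a simple call graph walk. This is only an illustration under stated assumptions: the call graph is given as a plain dictionary, and the names `propagate_regions`, `fast_ram` and `flash` are hypothetical; no existing compiler attribute or pass is implied.

```python
from collections import deque


def propagate_regions(call_graph, annotated):
    """Propagate memory-region annotations from callers to callees.

    call_graph: dict mapping a function name to the list of functions it calls.
    annotated:  dict mapping explicitly annotated functions to a region name.

    Returns a dict mapping each reachable function to the set of regions it
    is needed in. A function reachable from several regions gets several
    entries; whether to duplicate it or pick one region is a policy decision
    left out of this sketch.
    """
    regions = {}
    for root, region in annotated.items():
        queue = deque([root])
        seen = {root}  # guards against cycles (recursion) in the call graph
        while queue:
            func = queue.popleft()
            regions.setdefault(func, set()).add(region)
            for callee in call_graph.get(func, []):
                if callee not in seen:
                    seen.add(callee)
                    queue.append(callee)
    return regions


if __name__ == "__main__":
    graph = {
        "isr_handler": ["read_fifo", "lib_crc"],
        "main": ["init", "lib_crc"],
        "lib_crc": ["lib_table_lookup"],
    }
    result = propagate_regions(graph, {"isr_handler": "fast_ram", "main": "flash"})
    for name in sorted(result):
        print(name, sorted(result[name]))
```

Functions that end up in more than one region (here the hypothetical `lib_crc`) are the cases where the overlay-style "copy or not" decision mentioned above would apply.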

PGO from traces (Scott)

  • PGO: trace capability of higher end CPUs - can it be used as input to PGO (without code instrumentation)? Branch instructions are most interesting to recreate the flow. Should be possible in principle. Arm Streamline is a trace based tool, armcc (Arm Compiler 5) was able to read its output, but not armclang (Arm Compiler 6).

  • There are a lot of trace formats out there so it could be tricky to parse all of them.

  • Compiler teams use a lot of models for testing, however for people working with peripherals there are fewer options.
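
To make the branch-based idea concrete, here is a minimal sketch that aggregates an already decoded branch trace into per-branch counts. The input shape, a sequence of `(branch_address, taken)` pairs, and the function name `branch_counts` are assumptions for this example; decoding a real trace format and converting the counts into a profile the compiler can consume are not shown.

```python
from collections import Counter


def branch_counts(trace):
    """Aggregate a decoded branch trace into taken/not-taken counts.

    trace: iterable of (branch_address, taken) pairs, assumed to have been
    decoded already from whatever trace format the hardware produces.
    Returns {branch_address: (taken_count, not_taken_count)}.
    """
    taken = Counter()
    not_taken = Counter()
    for address, was_taken in trace:
        if was_taken:
            taken[address] += 1
        else:
            not_taken[address] += 1
    return {
        address: (taken[address], not_taken[address])
        for address in sorted(set(taken) | set(not_taken))
    }


if __name__ == "__main__":
    # Made-up trace: a loop branch at 0x100 and a rarely taken branch at 0x200.
    sample = [(0x100, True), (0x100, True), (0x100, False), (0x200, False)]
    for address, (t, nt) in branch_counts(sample).items():
        print(f"{address:#x}: taken={t} not_taken={nt}")
```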

Findings from migrating a hypervisor (Peter)

  • Fiasco hypervisor (GitHub - kernkonzept/fiasco: The development version of the Fiasco.OC microkernel) has support for the clang compiler, but not LLVM binutils.

  • Some issues found with LLVM binutils: llvm-objdump and llvm-objcopy have slightly different behavior to GNU, which causes build issues.

  • Peter will raise upstream issues for these.

  • LLD: asserts in linker scripts - LD and LLD check the conditions at different times, so the behavior differs and the build fails again.

FP modes in compiler-rt (Peter)

  • compiler-rt software emulation of floating point: rounding modes and flush to zero - who is interested in improvements? Having faster vs stricter IEEE modes. Arm can contribute.

  • Most of the time no-FP is used, thus limited experience and/or interest.

Embedded benchmarking (Petr)

  • What is a good set of benchmarks for embedded? embench (https://www.embench.org/)?

  • May be good to add something to LLVM test suite, if the benchmark is open-source.

  • Peter: Dhrystone and CoreMark, EEMBC are widely used, however they are mostly C (no C++).

    • CMSIS DSP, CMSIS NN can be used as application benchmarks, especially for SIMD.

    • embench was considered by the Arm team, however is not adopted for regular testing yet.

  • Scott: MicroPython has a set of benchmarks, can be seen as a more real world use case.

CMSIS clang support (Petr)

  • CMSIS is a dependency of a project the team is working on, but it does not support clang yet.

  • Volodymyr: CMSIS6 clang support is in progress: Core(M): Add support for LLVM/Clang · ARM-software/CMSIS_6@193243d · GitHub

  • There is no current plan to backport to CMSIS5. However, the clang enablement is a minor change, and CMSIS6 is mostly compatible with CMSIS5 (it is a better split and arrangement of the same components), so it should be straightforward to migrate.

Agenda

  • Dev meeting workshop
  • LTO support for linker scripts

Discussion

LTO Support for linker scripts

  • Matchers for named object files and named sections. LTO removes the original file from the input sections.
  • QC implementation presented (2017). Extended IR to include metadata.
  • QC restricting some inlining across different memory regions.
  • TI have a version, uses FAT LTO objects. Use ELF part to recreate named sections.
  • FAT LTO has landed for ELF, one or two pending patches such as update the documentation.
  • Motivation for using clang in embedded systems is LTO and linker scripts.
  • Google leaning towards QC approach.
  • Can both solutions exist?
  • RFC next step
  • TI version adds attributes so that the compiler can influence.
  • TI and Google people will be at Embedded Toolchains workshop.

LLVM Embedded Toolchains workshop organisation

  • We have 3 hours in the morning. With break times, that gives us time for two or three 45-minute sessions.
  • Will use birds of a feather (BoF) format. 2 - 5 slides to set the context and then discuss.
  • Start with a proposal for sessions we would like to see and who we would need to join in.
  • Ask what AV equipment is available for recording/remote.
  • Have at least one person taking notes for each session.
  • Prefer topics where people are actively working in the area and need discussion on best approach to move forward.
  • Will organise via discourse thread. Topics chosen will be decided by those that turn up.

Raw notes for most favoured topic ideas so far.

LTO (see above)

Libc/Libc++?
Subsets of libc++ that we want/need?
Requirements for buildbots/support?
Need a good idea of what we want, invite libc++ maintainers?
Interaction between Libc and libc++

  • Use the most optimal API
  • Support within embedded systems
  • Porting layers? POSIX layers?
  • API underlying C11/POSIX

Requirements, what is the right layer of abstraction.

Coverage and profiling

  • Requirements for bare-metal
  • MC/DC other requirements
  • Performance profile.
    45 minutes

30 - 60 minutes more open topics, less well developed.
Laundry list of LLD topics.

Afternoon workshop recommendation, Machine Learning in LLVM

One topic is using ML model for the inliner

  • Goal of the model is to reduce code-size.
  • Shows several percent size reduction in Oz.
  • Make the model more aware of embedded constraints.
  • Consider the stack size? Reducing the stack size.

Actions

@smithp35 to start out a Discourse thread with the proposed structure

Apologies for lack of attendance list, I didn’t write that down.


2023-09-14 LLVM DevMeeting preparation

Participants

  1. Alan Phipps

  2. Anmol P. Paralkar

  3. Petr Hosek

  4. Prabhu Rajasekaran

  5. Michael Jones

  6. Nathan Sidwell

  7. Peter Smith

  8. Volodymyr Turanskyy

Agenda

  1. LLVM Developers’ Meeting preparation:

  2. Status updates/news regarding other topics from previous calls.

    • Code reviews.

    • Other topics.

Discussion

LLVM DevMeeting Preparation

Workshop (Peter)

  • Sign up for the workshop was good, around 30 people registered already.

  • The list of topics is suggested in the Discourse thread Pre-LLVM-DEV'23 - Embedded Toolchains Workshop Agenda and Who's Coming?

  • Peter will create separate Discourse threads per topic to get volunteers to prepare some presentations/key points for the discussion.

  • Petr and Michael confirmed they would attend together with some other team members.

    • The team is looking to gather and share in the workshop input from other internal teams using LLVM libc++ and libc.

    • Also will share experience about using libc with LTO.

    • Note: There will be a talk in the main DevMeeting about Fuchsia team experience in using coverage - they can share more details/background in the workshop too.

Roundtable (Petr)

  • Call for roundtables is still open, we can propose a roundtable for the main DevMeeting event too: presumably, more people can attend and outcomes from the workshop can be shared.

  • Peter will submit the proposal for the embedded roundtable, he already requested one for PAuth so will need to make sure they do not overlap.

Code reviews (Nathan)

Multilibs layering issue (Peter)

LLD improvements (Prabhu)

  • Auto packing in linker scripts: tested different approaches, GNU LD seems to do what is needed, so it may be useful to add the same to LLD. A review may be posted in the next few weeks.

    • Response: Makes sense, compatibility is more important than a slightly better, but different, feature.
  • LTO and linker scripts: The team looked at the TI and Qualcomm proposals, but it turned out that the solutions are optimised for different use cases, thus the key question is: What are the different requirements that we want to prioritise and design for? Will discuss in the DevMeeting workshop.

    • Peter: some input from EuroLLVM discussions:

      • Try to minimise code changes before using LTO - users want a magic solution: just change the switches to make it work.

      • Support for segregation of different memory types/regions.

Code size optimization options (Petr)

  • A question from the last RISC-V LLVM working group sync up meeting: Zephyr team is trying clang, but noticed a big difference between -Os and -Oz compared to GCC.
    -Os in GCC is close to -Oz in clang: Does this match the experience of other teams?

  • Should we rename the options to match GCC behaviour to avoid the confusion?

  • Peter agreed that clang -Os is not focused on code size, but is rather a smaller, but still fast, -O2. -Oz can have a performance impact that makes some people unhappy; it is “code size at all cost.”

  • Response: No major concerns with renaming.

  • Another idea was that it might be useful to have an optimization level to target embedded as there are some optimisation passes that are off by default, but are known to be beneficial for embedded use cases.

I’ve submitted an embedded toolchains roundtable for the dev meeting.
