Hi Alan, thanks for the feedback!
Is there a way to support?
The LLVM Compiler Infrastructure Project Nov 8-9, 2022
Who is going?
Siva, Prabhu, Petr, others - would be nice to have a discussion in person.
Arm: Volodymyr will check if anyone is going.
Action item for all: highlight to colleagues and suggest they join.
Volodymyr to cancel the call in the same week - done, removed from the Google calendar.
Background - link to slides in the agenda.
Recent discussion: Linker generated attributes for LTO - does LLD already do this?
Issues to address:
Cannot inline across output sections => add information about output sections to inform the linker.
How to place sections at particular addresses => extend IR with information about named sections to pass to LTO and take into account.
So changes are likely required in both IR and LLD.
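To illustrate the kind of placement information that has to survive into LTO, C toolchains already let source code request named sections, which a linker script can then map to fixed addresses. A minimal sketch (the section and variable names are hypothetical):

```c
#include <stdint.h>

/* Request that this object lands in a named section; a linker script can
 * then map ".uart_regs" to a fixed device address. Under LTO, this
 * placement constraint must be carried through the IR so optimizations
 * (e.g. inlining across output sections) can respect it. */
volatile uint32_t uart_status __attribute__((section(".uart_regs"))) = 0;

uint32_t read_uart_status(void) {
    return uart_status;
}
```

This is the source-level view; the open question in the discussion is how to represent the same constraint in IR so the LTO pipeline and LLD can take it into account.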
There are known downstream implementations, but nothing in upstream yet.
Teresa (co-author of ThinLTO):
Many changes are not LTO specific - are required in other generic passes.
The patches from the 2017 presentation were not published, so need to start from scratch? Or check if anyone from Qualcomm can share the patches.
The changes are expected to be accepted upstream without major concerns.
Todd (author of the Discourse thread above): work done at TI extends the IR to implement the mentioned features.
Teresa: would be easier to go without IR modification in the linker to keep the existing interface.
Todd: would it make sense to have a special “embedded” type of LTO?
Teresa/Petr: a recent relevant discussion about fat-lto-objects [RFC] -ffat-lto-objects support + WIP patch ⚙ D131618 [WIP][Do NOT review] LLD related changes for -ffat-lto-objects support
The Arm linker wraps IR into ELF files to make library selection easier, but this seems too specific a solution, tied to the Arm Compiler toolchain.
Discourse post in the agenda.
picolibc approach presented there can be reused for simple configurations and may be a good starting point for LLVM libc enablement.
Eventually it would be good to have a sort of template for different types of embedded targets.
Siva: Looks good. Now we need to provide the examples.
Question: Testing strategy for such code? We could use QEMU.
Any preparation for the LLVM Dev Meeting (The LLVM Compiler Infrastructure Project Nov 8-9, 2022) round table.
Reminder: there will be no sync up call in November because of LLVM Dev Meeting happening at the same time.
LLD features for embedded development.
Next steps for the libc++ discussion related to embedded use cases.
Any follow up on the previous topics on libc.
David and Kristof from Arm will attend the LLVM Dev Meeting.
No specific preparation for the round table discussed.
Arm team started working on some feature in LLD for embedded use cases:
Big endian was already supported for AArch64; support for Arm is being added for completeness.
CMSE (Cortex-M Security Extensions, TrustZone M) support for linking code for secure and non-secure worlds.
Patches will be upstreamed soon; reviews are welcome.
Looking at existing libraries and guidance targeting embedded use cases, the following are some of the most usual configuration options/needs:
Code size vs performance trade-offs in hand-optimized implementations.
Multiple small switches vs one embedded configuration that configures everything at once.
Fuchsia already uses multiple versions of libc++ with tweaks like above.
Actions? All: Try to implement and upstream such configuration options as people work on toolchains with specific libc++ configurations.
Pre-submit builders to check that the options still work are needed. Is this a prerequisite by maintainers anyway?
The plugin prototype presented by Mikhail was finished, so we know the info needed to make the whole approach work. Next question is what is the best way to define this info.
Still need to evaluate DSL options to capture the same info.
One idea is to have build attributes with a partial order defined on them, so that the multilib logic can search and pick the best match. Fuchsia uses a similar idea of “priority” to pick the best variant.
GCC spec files: can start small, but there is a risk of eventually growing and getting the whole language in (which is perceived as undesirable).
clang config files: There is a thread on Discourse about extending it so that eventually this may be enough, see [RFC] Adding a default file location to config file support - #45 by mgorny
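A minimal sketch of the best-match idea described above, picking the most specific library variant whose required flags are all satisfied (the data model here is hypothetical, not the actual patch):

```python
def select_multilib(variants, flags):
    """Pick the library variant whose required flags are all present on
    the command line, preferring the most specific match - a simple
    stand-in for the partial order / 'priority' idea."""
    flags = set(flags)
    candidates = [v for v in variants if set(v["flags"]) <= flags]
    if not candidates:
        return None
    # More matched flags == more specific == higher priority.
    return max(candidates, key=lambda v: len(v["flags"]))

variants = [
    {"dir": "thumb/v6-m", "flags": ["--target=thumbv6m-none-eabi"]},
    {"dir": "thumb/v6-m/nofp",
     "flags": ["--target=thumbv6m-none-eabi", "-mfpu=none"]},
]
print(select_multilib(variants,
                      ["--target=thumbv6m-none-eabi", "-mfpu=none"])["dir"])
```

A real implementation would also need to define what happens on ties and how layered/fallback variants interact, which is exactly what the DSL discussion is about.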
I’ll ask this question here since I’ve mentioned MC/DC in the past and there is interest for the embedded community.
I am ready to put up a review for MC/DC (at long last!), but I’d like to create three individual reviews to make the process more manageable. There are 107 modified files, though the changes themselves aren’t so bad, and 50% of the touched files relate to testing.
However, the patches would need to be pushed together since they really are not independent of each other, and I want to know if there is precedent for that. I know phabricator has a concept of parent/child review, but the implication seems to be that each review patch can be pushed independently once it’s approved.
Also, I’m looking for reviewers who can assist. If you’re interested, let me know!
I recommend posting a separate thread to the list, just in case this gets buried at the end of the thread.
The most recent threads I can find on code coverage are:
These may be a good source of authoritative reviewers.
I’m willing to help for general comments, but I’ve personally no prior experience in the code-coverage parts of LLVM so I fall into the category of someone that might help but won’t be able to approve.
We’re going to have a round table for LLVM Embedded Toolchains Working Group at 2022 LLVM Developers’ Meeting on November 8th at 5:00-5:30pm.
At San Martin - Lower Level?
Yes, I believe that all roundtables will be in that room.
Follow up from previous discussions in the WG and Arm brainstorming.
Slides: Clang multilib support early thoughts_LLVM Embedded Toolchains WG, 2022-12-08.pdf (477.3 KB)
Summary by Prabhu:
Google, Qualcomm, Nintendo, TI, … participated in the round tables related to embedded toolchains.
Downstream linkers → interested in sharing experience. Maybe a topic to follow up in the WG.
Security patches from TI may be shared upstream soon.
Nice to discuss embedded specific linker features and convince upstream maintainers they are useful, e.g.
Built in compression, e.g. for RW (copied from ROM to RAM and expanded).
Place a variable at a specific address, e.g. over a system register or IO ports.
When multiple banks of RAM are available, a linker needs a way to distribute segments across them.
Linker script support in LLD (vs GNU) + support for embedded LTO.
Debuggability of linker scripts is not good - more errors/warnings/traces would be useful to understand the choices the linker made.
LLD related reviews:
Profiling working group for upstreaming review of the MC/DC patches from TI. There are three Phabricator reviews related to MC/DC:
LLD CMSE implementation review ⚙ D139092 [RFC][LLD][ELF] Cortex-M Security Extensions (CMSE) Support
Next topics for the WG: would be useful to discuss and come up with a set of important linkers features for embedded to start promoting them upstream.
Multilib RFC is here: [RFC] Multilib
LLD key embedded features.
The prototype ⚙ D140959 RFC: Multilib prototype and RFC [RFC] Multilib
Layering of libraries - a new use case.
Petr Hosek on use cases:
Fuchsia: the existing LLVM multilib implementation is used; there is no need for multiple incompatible variants of libraries. It is mostly used for optimization, e.g. with/without exceptions, or for different ABIs (e.g. with sanitizers - instrumented libraries can be layered on top of non-instrumented ones as a fallback). The multilib logic is currently hardcoded.
Pigweed: this is a traditional embedded, the use case is similar to LLVM Embedded Toolchain for Arm.
Can we come up with a way to unify these two use cases, even if some migration is needed to converge?
One vs multiple include directories: Do we need to rely on sysroots or not?
Fuchsia only needs one include directory: libraries use the same API, but different ABIs only.
No other issues suggested.
Can we have layered header file includes, similar to the libraries described above? More specific first, then generic - this is already how multiarch support works in libc++.
Example: the picolibc build system needed to be patched recently because LLD has limitations in placing segments in memory, so we are running into practical issues.
There is a list of embedded linker features in the previous meeting minutes.
Volodymyr to reach out to LLD maintainer to arrange a discussion in one of the following sync ups.
The Fuchsia team is comparing GNU LD vs LLD; there are some known issues - can start a list on Discourse.
There was a discussion at the last LLVM Dev Meeting about LLD as well: diagnostics were mentioned as a major issue.
Google Summer of Code will be coming soon - LLD usability improvements can be a good fit.
Github - we can label relevant issues there to make them easy to find.
As mentioned on a couple of the embedded LLVM calls, my changes supporting MC/DC are presently in phabricator (quoted above):
Since the Developers’ Meeting last November, I’ve been hearing from more folks who are interested in seeing this functionality upstream but don’t have the LLVM expertise to contribute meaningfully to the reviews, unfortunately, so I could really use some help in getting things reviewed.
So far, @ellishg has been able to look at some of the back-end work and provide some good feedback. @smithp35 provided some good suggestions for the preliminary review I added, which I incorporated into the clang-specific review linked here.
Of course, I don’t want to trivialize the fact that everybody is busy, and many of you have upstreaming work of your own. I appreciate the feedback you have and whatever time y’all are able to contribute to this effort! I’m also on Discord if you want to chat about MC/DC.
Garrett Van Mourik
Vince Del Vecchio
LLD key embedded features by Peter Smith
Multilib implementation code reviews by Michael Platings
Other code reviews
Two major areas:
Observability/discoverability - more understandable output, better usability.
Disjoint memory regions: multiple memory banks with different properties => possible linker script extension to distribute code over multiple free spaces in different regions.
RW data compression - copy RW data from ROM to RAM and decompress; can save ROM => could add to LLD or have a separate utility. It is important that the compression and decompression algorithms match! Maintaining multiple algorithms may add to overheads.
Memory-mapped variables - placing a section at a particular address, e.g. to access IO ports directly.
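To make the matching-algorithms point concrete, here is a sketch of the simplest scheme discussed in this area, a zero run-length encoding. The encoding itself (a `0x00` marker followed by a run count) is a hypothetical choice; the linker-side compressor and the startup-code decompressor must implement exactly the same one:

```c
#include <stddef.h>
#include <stdint.h>

/* Decompress a zero run-length encoded image, e.g. RW data stored in ROM.
 * Encoding (hypothetical): a 0x00 byte followed by a run count N expands
 * to N zero bytes; any other byte is copied verbatim.
 * Returns the number of bytes written to dst. */
size_t zrle_decompress(const uint8_t *src, size_t src_len, uint8_t *dst) {
    size_t out = 0;
    for (size_t i = 0; i < src_len;) {
        if (src[i] == 0x00 && i + 1 < src_len) {
            uint8_t run = src[i + 1];
            for (uint8_t j = 0; j < run; j++)
                dst[out++] = 0;
            i += 2;
        } else {
            dst[out++] = src[i++];
        }
    }
    return out;
}
```

Even this trivial scheme can pay off, since RW data images tend to contain long zero runs; richer algorithms trade decompressor code size for better ratios.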
GSoC project proposed for usability improvements [LLD] Linker Improvements for Embedded
In practice many issues come down to linker script issues (differences in behaviour between BFD and LLD), so being able to debug linker scripts easily helps a lot.
Disjoint memory - distributing by hand is very tedious, indeed.
Compression is helpful.
LTO support with embedded constraints of placement is another interesting area - there was a presentation by TI recently.
Another GSoC project idea is a machine-readable format, e.g. JSON, for debug output (including the link map, which currently differs between linkers and is thus tedious to parse) so that people can create their own visualizers/analyzers. It would be nice to convince the GNU community to implement the same format as well.
Demo by Peter Smith of how the features mentioned above work in armlink (the Arm proprietary linker).
armlink has 3 compression algorithms, one very basic run-length for 0’s which is already very helpful.
armlink supports placement attributes from C code, i.e. saves on manually editing linker script files (called scatter files for armlink).
armlink can show useful debug info like call graph/stack depth required, also breakdown of code/data sizes including the libraries to analyse code size issues.
armlink can trace symbols to show why a particular one was included.
⚙ D142933 Add -print-multi-selection-flags argument is about the proposed syntax for multilib options, not using actual command line names directly. It allows a more limited but more stable API. Feedback is welcome!
Petr is reviewing and will get back with more feedback. Could we reuse TableGen here? It may result in too many/too complex dependencies.
We may consider making the feature experimental for the first LLVM release to allow later adaptation as per feedback from users.
Amilendra is working on feedback for the CMSE patch (⚙ D139092 [RFC][LLD][ELF] Cortex-M Security Extensions (CMSE) Support), also big endian support patch will follow.
Alan reminded about MC/DC code coverage LLVM Embedded Toolchains Working Group sync up - #22 by evodius96
Vince Del Vecchio
Multilib implementation code reviews by Michael Platings.
MC/DC implementation code review by Alan Phipps.
FatLTO by Petr Hosek.
Patches in review, few rounds of discussions happened and comments addressed.
One patch landed, 6 more to finish.
How to speed up or accept the current version with the intent to improve/address any issues?
Feedback from Petr:
The team reviewed the RFC in detail, the response will be posted on Discourse in coming days.
Suggestion: There are changes to internal API and adding new file formats (which are UVB - user visible behavior), so for internal changes it should be OK to land, UVB may need a bit more discussion.
Michael: Could/should we be more aggressive: accept a format now as an experimental feature, so warn that it may and likely will change in the future? May commit now, but review/refine before LLVM17 release to have it as stable as possible by the next release.
Peter: It would be nice to be able to give it a try with real projects and see if it works, rather than keep overthinking.
Agreed: Petr posts the response on Discourse, then if after the Discourse discussion there are no blockers, we commit the current format and try to refine it for LLVM17.
Petr: FatLTO is progressing, there is an RFC and patches will be available soon. Approach aligned with LTO maintainers.
The idea of FatLTO is for object files to contain information for both normal and LTO linking (i.e. binary and IR code).
TI presented a revised version of LTO for embedded/linker scripts recently, their solution is similar to/compatible with FatLTO.
Peter: Someone reported an issue with using LTO for embedded recently, see LLVM Embedded Toolchain for Arm issue Could you please include llvm-link, llc and opt? · Issue #187 · ARM-software/LLVM-embedded-toolchain-for-Arm · GitHub - they are using llvm-link, llc and opt manually to avoid the pitfalls of the default LTO.
Todd explained the details of the TI solution from the presentation - the two teams will talk to each other to further align the approach and implementation.
Peter: FOSDEM embedded developers were asking about a way to embed a section, e.g. a checksum, into the output image at link time.
Petr: why is build-id not enough? Looks like something very custom/special.
Suggested that it would make sense to start a topic on Discourse to explain the use case, then consider possible solutions.
Peter: Use of TLS (thread local storage) in embedded projects. Picolibc uses TLS and initializes it in the linker script. The linker script and the library need to agree on the calculations of relevant addresses. LLD and GNU LD disagree on this - Peter is looking to create a reduced reproducer.
Is anyone using TLS in embedded apps? Vince: No, but had similar issues.
Is this going to change with C11 used more in embedded? Something to look out for in the future.
Peter will post an issue with the reproducer upstream.
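Conceptually, the layout calculation both sides must agree on is the conventional split of a thread's TLS block into an initialized part (.tdata) and a zero-initialized part (.tbss). A sketch of the initialization step (in a real bare-metal system the template address and sizes come from linker-script-defined symbols; the names here are hypothetical stand-ins):

```c
#include <stddef.h>
#include <string.h>

/* Initialize a TLS block: copy the .tdata template, then zero the .tbss
 * part. The linker script and the C library must agree on exactly these
 * offsets and sizes - the LLD vs GNU LD disagreement mentioned above is
 * about that address arithmetic. */
void init_tls_block(unsigned char *tls, const unsigned char *tdata,
                    size_t tdata_size, size_t tbss_size) {
    memcpy(tls, tdata, tdata_size);            /* initialized TLS data   */
    memset(tls + tdata_size, 0, tbss_size);    /* zero-initialized TLS   */
}
```

If the linker computes the .tdata/.tbss boundary (or alignment) differently from what the library assumes, TLS variables end up reading the wrong bytes, which is the class of bug the reduced reproducer aims to show.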
Gulfem Savrun Yeniceri
Mandeep Singh Grang
Vince Del Vecchio
Multilib code reviews.
Other code reviews in progress.
Embedded profiling runtime. Include profiling lib? · Issue #197 · ARM-software/LLVM-embedded-toolchain-for-Arm · GitHub and Profiling contribution by rgrr · Pull Request #204 · ARM-software/LLVM-embedded-toolchain-for-Arm · GitHub
Building runtimes for bare-metal.
RFC and list of patches [RFC] Multilib - #5 by mplatings
Peter: The reviews are accepted by Arm, need confirmation from others in the community.
Petr: Will follow up on remaining reviews shortly.
Peter: A related question: if there is a newlib installed from a distro package, how do we make it work with clang?
Option could be to provide the config file to point there.
Could we inject an external multilib config file to use an existing set of multilibs?
Petr: There was a comment in the review that now the location of the yaml file is hardcoded - would be great to allow configuring it via a command line option, would solve this use case as well.
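For illustration, an externally supplied multilib configuration could look roughly like this (the key names and directory layout are illustrative only, not the final format from the review):

```yaml
# Hypothetical multilib configuration file, e.g. pointed to by a clang
# config file or a command line option (names illustrative).
MultilibVersion: '1.0'
Variants:
  - Dir: arm-none-eabi/thumb/v6-m
    Flags:
      - --target=thumbv6m-unknown-none-eabi
  - Dir: arm-none-eabi/thumb/v7-m/nofp
    Flags:
      - --target=thumbv7m-unknown-none-eabi
      - -mfpu=none
```

Making the file location configurable would let such a file describe an existing multilib set, e.g. a distro-installed newlib.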
MC/DC status update here LLVM Embedded Toolchains Working Group call this Thursday 30th of March - #2 by evodius96
Two patches are progressing.
Review would be helpful for the clang support for MC/DC patch: ⚙ D138849 MC/DC in LLVM Source-Based Code Coverage: clang
Peter: A request was raised in LLVM Embedded Toolchain for Arm issue #197.
One option is to create a trivial runtime that would dump the counters somewhere as suggested in the issue discussion thread.
The wider question is how to add bare-metal support to compiler_rt.
The PR Pull Request #204 suggests an implementation based on reusing compiler_rt pieces, which goes in the right direction, but only provides a very narrow Arm semihosting-specific implementation. How to generalise?
Can we provide an interface inside compiler_rt that can be used to tailor actual implementation of storing the data, suitable for bare-metal use cases as well?
Petr: The idea makes sense, the profile runtime is not in the best shape now, it would be great to refactor it and rewrite in C++. Would be good to have a header-only minimal implementation to allow easy reuse between actual implementations.
The team is very much interested in the implementation, but there was a lack of time to progress.
https://cs.opensource.google/fuchsia/fuchsia/+/main:src/lib/llvm-profdata/llvm-profdata.cc is an example of a minimal runtime we use for our kernel, we would like to break it up and upstream individual pieces so it can be reused for other embedded targets.
A local patch is in progress; the team will need help to progress it upstream.
The best way to start would be clean-up/refactoring.
For people who have downstream modifications - it would be useful to know what kind of changes are there and why, i.e. how to refactor to accommodate them. Examples:
Split of data to minimise the size of the resulting executable.
Size of counters: 32 vs 64 bits.
One runtime is used for both profiling and code coverage, thus maintains data for both - could be configurable.
Petr may post on Discourse a list of ideas for refactoring based on internal discussions.
A good topic to discuss in EuroLLVM 2023.
Mandeep: building libc++, libc++abi, libunwind, etc for bare-metal builds: is there a published how-to?
LLVM Embedded Toolchain for Arm is an example to learn from - see the top level cmake file. Newlib was supported in LLVM 13 and LLVM 14 builds, can be found in the source packages in the releases section.
Where to get binary libraries for RISC-V? Depends on the toolchains/vendor.
It may be of interest to people following this thread that LLVM Embedded Toolchain for Arm 16.0.0 has been released, including multilib. Feedback is very welcome either in the GitHub issues or in the multilib RFC.
(Suggestions also welcome if you think such announcements are of interest in another channel/thread/category…)
Garrett Van Mourik
Vince Del Vecchio
Current discussion ([RFC] Multilib) is about options-to-libraries matching logic: so far agreed to use the normalised command line option for the architecture, we need to figure out a sensible way to match against it - regex or anything else.
Agreed on the general preference to unblock and land the important patches, then get back to option printing and other possible improvements.
Note: Ordering of architecture options issue was also highlighted in the RISC-V call earlier today, so the issue is real and needs to be addressed in the design.
Petr: Google team provided all the useful information links in previous meeting minutes.
Next steps: Need a patch to start a more practical discussion.
Note: We need to keep the ABI stable, we may use a script to generate the list of public symbols, then check differences between versions. Petr suggested uploading the script for review/consideration, then it can be added to compiler_rt, if useful.
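A sketch of the symbol-list diff idea: extract the defined external symbols from nm-style output (e.g. `llvm-nm --extern-only`; the exact invocation is an assumption) for two versions of the library, then compare:

```python
def exported_symbols(nm_output):
    """Collect defined external symbols from nm-style output lines of the
    form '<address> <type> <name>'; 'U' (undefined) entries are skipped."""
    syms = set()
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[1] != "U":
            syms.add(parts[2])
    return syms

def abi_diff(old, new):
    """Return (removed, added) public symbols between two versions."""
    return sorted(old - new), sorted(new - old)

# Illustrative nm output for two library versions (symbol names invented).
old = exported_symbols("0000 T __llvm_profile_init\n0004 D __llvm_profile_data\n")
new = exported_symbols("0000 T __llvm_profile_init\n         U malloc\n")
print(abi_diff(old, new))
```

Any non-empty "removed" list flags a potential ABI break to investigate before landing the refactoring.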
Thread https://discourse.llvm.org/t/lld-linker-section-packing/70234: GNU LD has a feature (the --enable-non-contiguous-regions flag) that changes behaviour to automatically distribute sections across matching memory regions.
Other toolchains have different approaches (syntax and semantics) to resolve this issue that is typical in embedded, because devices may have many types and many regions of memory, e.g. flash, static, dynamic memory, etc.
What is the best way to implement such in LLD?
Re-implementing LD logic in LLD might be a reasonable option. Would make compatibility between GNU and LLVM easier for projects that use both.
Can be promoted to some linker script file syntax instead of the command line option later.
The “fill till overflow, then switch to the next memory region” strategy seems to work best in practice (distributing evenly across memory regions scatters locally related code all over memory, which may have performance pitfalls).
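The fill-until-overflow strategy can be sketched as a first-fit pass over regions in declaration order (a simplification that ignores alignment, section ordering, and attribute constraints):

```python
def place_sections(sections, regions):
    """Assign (name, size) sections to memory regions: fill the first
    region until a section no longer fits, then spill to the next one."""
    free = {name: size for name, size in regions}
    order = [name for name, _ in regions]
    placement = {}
    for sec, size in sections:
        for region in order:
            if free[region] >= size:
                placement[sec] = region
                free[region] -= size
                break
        else:
            raise MemoryError(f"no region can hold {sec} ({size} bytes)")
    return placement

print(place_sections(
    [(".text.a", 60), (".text.b", 50), (".text.c", 30)],
    [("FLASH0", 100), ("FLASH1", 100)],
))
```

Note that later small sections can still land back in an earlier region (first-fit rather than strict sequential fill), which is one of the semantic details any LLD implementation would need to pin down for GNU LD compatibility.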
Scott (CircuitPython from Adafruit) needs:
Explicit marking for the target region, e.g. what to put into flash or not, without pulling in the whole call tree.
Access properties for memory region, e.g. place hot code into TCM memory.
Can we do much of it in the compiler instead of the linker? E.g. allocation to sections with specific properties. Alternatively, it could be a standalone binary rewriting tool like BOLT.
LLD has an ordering-based-on-profiling-data feature, contributed for game optimization?
There is a symbol ordering file to control order, used by PGO already - would be best to reuse such existing features, if possible.
Why does LLD avoid complexity in the implementation?
Maintenance, especially the mix of different features not intended to work together originally.
Impact on performance of LLD - the more logic, the slower it is.
Need to check with the LLD maintainer whether there are any objections to the LD feature being reimplemented in LLD.
Daniel is happy to progress based on the discussion.
There was a question about the status of LLVM libc recently [libc] Is the llvm-libc incomplete? - would be interesting to discuss the use case/needs.
Volodymyr will suggest that the author join the next call.
Sorry, of course Adafruit - I know it very well! An artifact of typing and listening at the same time.
Garrett Van Mourik
Vince Del Vecchio
Follow up on the code reviews: multilibs, MC/DC coverage, etc.
Follow up on D150637 [lld][ELF] Add option for suppressing section type mismatch warnings progress and plan.
Follow up on the great discussions that happened in the EuroLLVM roundtable and agree on the next steps, see LLVM Embedded Toolchains - EuroLLVM 2023 round table summary
Multilibs - Michael updated as per the latest comments, thanks to Petr for the review and feedback to keep it moving.
MC/DC - update from Alan: thanks for the useful comments, patches to be updated soon.
Profiling runtime - no patches yet.
Petr: the team looked into this; refactoring is needed. The idea is to move the implementation to C++ incrementally. The team would like to start doing that, but needs to make sure not to break the ABI. There is a patch in Phabricator that uses LLVM readelf with JSON output to extract all the API information and diff it against the refactored version, so that it is possible to catch incompatibilities. There are some limitations, though: JSON output is currently only supported for the ELF file format.
In libc++ there are similar scripts to capture the ABI details; they are based on readelf and nm and work well for dynamic libraries, but not static libraries. A discussion has started about generalizing this libc++ infrastructure for other runtimes.
Another alternative could be llvm-ifs (shared object stubbing tool), but it does not support static archives either.
So there are a lot of tools, but each of them has limitations. So we need to decide on priorities/strategy. E.g. focus on ELF file format for now, then add the rest later; or first improve readelf to extend the support to other formats, then continue with the refactoring.
Note that the profile data format can change, there is a version embedded into the format itself. But this is not the issue, the discussion is specifically about ABI compatibility of the runtime itself.
May be a good idea to ask in Discourse who is using profiling with what OSes/formats. Darwin format is probably for Apple to check, COFF format may be for the Chrome team.
An alternative solution landed yesterday!
This LLVM Embedded Toolchains sync was advertised at EuroLLVM as an extended roundtable - people were invited to continue the discussion in these sync ups. Specific topics of interest follow.
It might be a good idea to set up a real-time communication channel, e.g. on Discord - Volodymyr will try to do so.
How to advertise LLVM Embedded Toolchain more? Options considered: LLVM blog or a company blog? Invite people to comment on issues/needs/features they want to see for embedded use cases.
Another idea is to have talks at the LLVM Dev Meeting this fall. The Google team wants to present about porting a big project from GCC to LLVM. Issues the project ran into and ideas for improvement will be part of the presentation.
Similarly, Ties works on a blog about using LLVM Embedded Toolchain to target the Game Boy Advance game console. He wants to submit a talk for the LLVM Dev Meeting as well.
Scott highlighted that the CircuitPython team works on migration from GCC to LLVM and invited to help contribute - this is an open-source project, see Contributing - Pull Requests
Everyone agreed that the code size is definitely an issue, especially on smaller cores!
Petr suggested a possible future topic for discussion: analysis of optimization passes and how they contribute to code size. There is an observation that the Attributor pass with LTO gives a size reduction of about 10-12%, but it is not enabled by default. A proposal may be to enable it for -Oz. Enabling the Attributor pass may increase compilation time; however, compile time for embedded code (which is comparably small) is not that big an issue - it may be a good trade-off.
Related topic: the Unified LTO discussion - the proposal to unify ThinLTO and FullLTO. FullLTO is useful in embedded (again, smaller overall code size) vs ThinLTO for big apps like Chrome.
Quantum: Who has experience of using GCC LTO? Scott: it is used in CircuitPython from the very beginning - need to build it without LTO to see what is the impact.
Overlays in the linker. Arm Compiler has automatic overlays. Embecosm attempted to standardise on ComRV (link in the trip report). It is driven by the RISC-V community, but if it is interesting to a wider community, then we can collaborate.
Ties: LLD does not seem to support all the syntax from GNU LD, so using overlays was difficult.
Petr: Our project uses overlays that are reimplemented manually (not the LLD ones). LLVM and GCC do different things here, thus it is difficult to use their implementations. The LLD implementation is not on par with GNU LD, e.g. cross-reference checks that are controlled in the linker script for GNU LD (LLD does not even parse the relevant keywords).
Are GCC overlays usable (as the approach/design) or can we do better? Something more advanced would create a split between LLVM and GNU, thus we need to seek consensus with the GNU community.
ComRV may be one option to discuss - needs a deeper evaluation.
LLVM libc in embedded: there is some good news: it was tried in some projects and worked.
There was a migration project to replace GCC, newlib, libgcc, and libstdc++ with the LLVM compiler, LLVM libc, compiler_rt, and LLVM libc++.
Google team is working on a report to present in the LLVM Dev Meeting.
Some key issues: code size, e.g. printf is not configurable yet; memcpy size - improved, etc.
LLVM libc now covers the needs of this particular project, which is not that much: ~25 functions. The expectation is that it is already usable for many embedded projects - many use only a few functions.
The problem is that many embedded projects are now growing in complexity/size, moving closer to an RTOS and using maths libraries heavily, e.g. for DSP, thus becoming more demanding.
Single precision maths is complete in LLVM libc and is even better than glibc; double precision is in progress, but does not seem to be used a lot in embedded.
Petr suggested a possible future topic: malloc - LLVM libc uses the Scudo allocator from compiler_rt as the default implementation. It is a good choice for desktop, but too big for embedded. Do we need a minimal malloc implementation for embedded? Worth exploring options, papers, etc.
Automotive community needs may be special here: they need deterministic memory management - would be good to make heap memory management pluggable so that people can replace depending on their use case.