This thread is dedicated to sharing the meeting minutes of the LLVM Qualification Working Group. We will use this space to publish summaries and action items from our monthly sync-ups, in order to keep the broader community informed.
Meeting notes are initially drafted collaboratively in a shared FramaPad and then archived here after each session for long-term reference and discussion.
The LLVM Qualification WG was formed following community interest in exploring how LLVM components could be qualified for use in safety-critical domains (e.g. automotive, oil & gas, medical). We welcome contributions from all perspectives: compiler developers, toolchain integrators, users from regulated industries, and others interested in software tool confidence, safety assurance, and systematic quality evidence.
If you're interested in participating or following along, feel free to join the discussions here or connect via the LLVM Community Discord in the #fusa-qual-wg channel.
Warm regards,
Wendi
(on behalf of the LLVM Qualification WG)
Approaches to Linking Tests and Requirements: Jessica discussed different methods for associating tests with requirements, such as adding text to existing tests or creating a directory to reference them. She suggested that adding text to the tests might be the most practical initial step. Wendi noted this down as a potential solution.
Leveraging Existing Specifications and Tests: Wendi inquired about existing specifications from the C/C++ working group. Jessica mentioned that implicit requirements might already exist in the test directory where clang's behavior is checked for specific code. Jorge suggested utilizing golden samples as tests and mapping them to requirements using LLVM's existing testing infrastructure.
Command Line Option Testing: Mikhail proposed checking which command line options are used by which tests during test suite runs to ensure all specified options are tested. Wendi asked about typical requirements management practices, noting the need for unique IDs and clear, verifiable descriptions.
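Mikhail's proposal could be prototyped with a small script that scans lit `RUN:` lines for the options they exercise and diffs them against a required set. This is only a sketch: the `REQUIRED_OPTIONS` set, the `*.c` glob, and the leading-dash token heuristic are illustrative assumptions, not an existing LLVM utility.

```python
from pathlib import Path

# Hypothetical set of driver options the suite must exercise
# (a real list might come from a safety manual's restricted-options table).
REQUIRED_OPTIONS = {"-O2", "-ffreestanding", "-fno-exceptions"}

def options_in_run_lines(test_dir):
    """Collect every '-'-prefixed token appearing in lit 'RUN:' lines."""
    seen = set()
    for path in Path(test_dir).rglob("*.c"):
        for line in path.read_text(errors="ignore").splitlines():
            if "RUN:" in line:
                seen.update(tok for tok in line.split() if tok.startswith("-"))
    return seen

def untested_options(test_dir):
    """Report required options never seen in any RUN line."""
    return REQUIRED_OPTIONS - options_in_run_lines(test_dir)
```

A real implementation would have to normalize option spellings (e.g. `-O2` vs `-O 2`) and lit substitutions, but the coverage-gap report is the essential output.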
Requirements Management Tools: Wendi shared links to free and open-source requirements management tools, mentioning Basil. Oliver described a similar tool used for tracing links between requirements, tests, and other elements, capable of generating coverage reports. Jorge offered to investigate in the Eclipse SDV/S-Core group what they use for requirements.
Automating Traceability: The discussion addressed automating the mapping between specifications and tests, either within the tests or using a requirements management tool. Oliver described how traceability tools use commands with IDs to link requirements to various artifacts and check for coverage.
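The annotate-and-report idea discussed above can be sketched in a few lines: tests carry a comment tag with a requirement ID, and a script builds the requirement-to-test map and flags uncovered requirements. The `REQ-NNN` tag format and the `*.ll` glob are assumptions for illustration; a real setup would follow whatever convention the WG adopts.

```python
import re
from pathlib import Path

# Hypothetical annotation format: a test references a requirement with a
# comment such as  "; REQ-001: constant folding preserves wrap semantics".
REQ_TAG = re.compile(r"REQ-\d+")

def trace(requirement_ids, test_dir):
    """Return (requirement -> linked test names, sorted uncovered requirements)."""
    links = {rid: [] for rid in requirement_ids}
    for path in sorted(Path(test_dir).rglob("*.ll")):
        for rid in set(REQ_TAG.findall(path.read_text(errors="ignore"))):
            if rid in links:
                links[rid].append(path.name)
    gaps = sorted(rid for rid, tests in links.items() if not tests)
    return links, gaps
```

The same map, inverted, gives test-to-requirement links, which is the bidirectional traceability a requirements tool would otherwise provide.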
Scope and Maintenance of Specifications: Wendi raised the question of what should be specified, by whom, and how it can be maintained, initially suggesting a focus on clang and C++. Oliver cautioned that this task should not be underestimated due to the potential workload. Jessica suggested that while specifying every compiler transformation might be difficult, existing tests could catalog behavior, and tools like Alive2 could verify semantic equivalence for certain optimizations. Petar emphasized the potential enormity of the effort required for detailed specification and maintenance.
Black Box Testing and Trust: Petar shared experience from qualifying GCC by treating it as a black box and using clang for result comparison. They suggested that a similar approach of extensive testing might be necessary for LLVM, focusing on building trust rather than deep internal analysis. Oliver seconded this, noting the difficulty of qualifying the Linux kernel through code analysis and suggesting the possibility of safety monitors or limiting the scope of usable compiler options. Jorge mentioned that qualified commercial compilers often come with safety manuals and usage guidelines.
Qualification of Standard/Runtime Libraries and Linker: Petar inquired about the qualification of the standard library. Jessica suggested working on libraries after addressing clang. Mikhail expressed interest in qualifying open-source runtimes. Wendi noted this point for future discussion.
Next Steps and Continued Discussion: Wendi suggested continuing the discussion on the Discord channel or Discourse and encouraged participants to add their thoughts to the notes.
US/Canada
Proposed Work Breakdown for Qualification: Wendi presented a suggestion from JR Simoes to split the qualification work into three parts: front-end, middle-end, and back-end, with an initial focus on the C/C++ front-end due to its broad use in safety-critical applications. It was noted that other languages, such as Rust, and relevant tools could be included later. Key discussion points for confidence in use of the compiler include specifications, testing, formal verification, sanitizers, runtime diagnostics, quality of compiler inputs, known issues analysis, and documentation (user manuals, safety manuals, release notes). The qualification of standard and runtime libraries was also added as a topic based on a suggestion during the EU/Asia session.
Challenges in Defining and Tracing Specifications: Wendi highlighted critical questions regarding specifications: whether existing specifications for front-ends like Clang can be reused, how to define partial specifications if none exist, and how to trace specifications to existing open-source verification means, such as the 25,000 tests, to achieve bidirectional traceability. She posed questions about the ownership and maintainability of these specifications. Opinions from the EU/Asia session suggested annotating tests with unique IDs linked to requirements or grouping tests into directories associated with specific requirements.
Recommendations for Specification and Test Organization: Florian shared his experience with Rust, suggesting that specifications should reuse as much as possible from existing project documentation and be built in a format conducive to linking, such as HTML. Oscar pointed out that C/C++ benefits from existing ISO standardization documents, so the focus should be on LLVM-specific features rather than creating new language specifications. Both Florian and Oscar agreed that structuring tests in directories that mirror the C standard's chapters and sub-chapters is a practical and accepted approach for C/C++ compiler qualification, making maintenance easier for both safety-critical and non-safety-critical maintainers.
Completeness Argument and Requirements Management Tools: Oscar emphasized the need for a "completeness argument" in qualifying open-source software, explaining that beyond code coverage, it is essential to demonstrate why test cases are sufficient, often by using equivalence classes and programming constructs to define comprehensive test strategies. Wendi inquired about the use of free or open-source requirements management tools. Oscar indicated that he uses a proprietary model for linking functional specifications to test cases.
Experiences with Requirements Management Tools: Florian shared his experience, stating that while tools like Sphinx-Needs are useful for general software and libraries, programming language specifications are too dense for typical requirements-writing tools. His team opted for a custom Sphinx extension for language-specification-to-test tracing, finding it more suitable than trying to adapt existing tools not designed for this specific task. Pete supported this, noting that Sphinx and Sphinx-Needs were adopted for the Rust coding guidelines within the safety-critical Rust consortium, finding them useful for building verifications and ensuring traceability.
Feature-Based Qualification Structure: Oscar suggested structuring the qualification based on logical features (e.g., language compliance, optimization rules, target-specific features) rather than technical components like front-end, middle-end, and back-end, as this logical structure is more relevant for certifiers who may not understand internal compiler architecture, and that tool qualification is typically a black-box activity. John clarified that the split front-end/middle-end/back-end is driven by the capabilities of existing formal verification tools like Alive2 and their translation validation work, which currently can validate optimizations on the middle-end and back-end but not the front-end. Oscar and Wendi agreed that tools used for tool qualification do not need to be qualified themselves, simplifying the process. Florian expressed interest in separately qualifying linkers, but Oscar argued that qualifying a tool always involves documenting its environment and configuration, making it an integrated process.
Actions
Wendi:
Create calendar for the group, with regular invitations to the sync-ups
@regehr introduced Alive2, a software tool for refinement checking of LLVM optimizations, and the arm-tv tool, developed by his group, for translation validation of ARM 64-bit assembly code, explaining their methodologies and demonstrating their application in bug detection. While arm-tv has found 46 bugs, primarily silent miscompiles, scalability challenges, particularly with memory access, were acknowledged. Questions were raised about limitations during lifting, the tool's trustworthiness, and adding new architectures.
Details
Introduction to Alive2: @regehr introduced Alive2, a software tool for refinement checking of LLVM optimizations. He explained that LLVM's middle-end rewrites Intermediate Representation (IR) to improve code, often making it faster or smaller. These transformations are considered "refinements," meaning the new code's set of meanings is a subset of the old code's. Alive2 uses symbolic execution of the code before and after optimization and generates queries for the Z3 theorem prover to verify that the optimized code refines the unoptimized code.
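The refinement notion can be made concrete with a toy check. This sketch (purely illustrative; real IR semantics are far richer) replaces Alive2's symbolic execution and Z3 queries with brute-force enumeration over 8-bit values: each function maps an input to the *set* of values it may produce, and the optimized code refines the original when every such set is a subset.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1  # work over 8-bit values so enumeration is cheap

def refines(before, after):
    """Toy refinement check: for every input, the set of results the
    optimized code may produce must be a subset of the original's.
    (Alive2 asks Z3 this question symbolically; here we enumerate.)"""
    return all(after(x) <= before(x) for x in range(MASK + 1))

def mul2(x):        # models 'mul i8 %x, 2' -- exactly one result per input
    return {(x * 2) & MASK}

def shl1(x):        # models 'shl i8 %x, 1'
    return {(x << 1) & MASK}

def add_undef(x):   # an add whose second operand is undef: any result allowed
    return {(x + u) & MASK for u in range(MASK + 1)}

def const_zero(x):  # folding the undef computation down to the constant 0
    return {0}
```

Strength-reducing `mul2` to `shl1` passes in both directions (they are equivalent), while folding `add_undef` to `const_zero` passes only one way: the constant is among the permitted meanings, but the reverse rewrite would add meanings, mirroring the asymmetry of refinement described above.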
Alive2 Compiler Explorer: @regehr encouraged attendees to try Alive2 via its Compiler Explorer instance at alive2.llvm.org, noting its ease of use and providing an example problem to explore. He also mentioned that papers have been written about Alive2, but hands-on use is likely more engaging.
arm-tv overview: @regehr presented the arm-tv tool, developed by his group, which performs translation validation for ARM 64-bit assembly code. He demonstrated an LLVM function that uses `memcmp` and showed how the ARM backend optimizes it, including inline substitution of `memcmp` and replacing control flow with a conditional select. The arm-tv tool aims to prove that the assembly code is a faithful translation of the LLVM IR.
Translation Validation methodology: @regehr explained that translation validation involves assigning a mathematical meaning to the code before and after transformation. Alive2 is used to formally represent the meaning of LLVM functions. For ARM code, arm-tv assigns meaning either by using hand-written instruction semantics derived from the manual or through a mechanically derived version from ARM's formal description of instructions. The tool then translates the ARM code back into LLVM IR and invokes Alive2 for a refinement check.
arm-tv in action: @regehr demonstrated arm-tv, which is called backend-tv and also supports RISC-V. The tool parses assembly into LLVM MCInst form, lifts the ARM assembly code by building a small execution environment that resembles an ARM processor with registers initialized to "freeze poison" (an indeterminate bit pattern), and then processes the lifted instructions. This process results in a clumsy but optimizable function that Alive2 can then efficiently check against the original code.
Bug detection with arm-tv: @regehr shared that arm-tv has found 46 bugs, primarily silent miscompiles, most of which are in the machine-independent parts of the LLVM backend. He noted that while arm-tv recently started supporting RISC-V, fewer bugs have been found compared to ARM, attributing this to the multi-backend impact of the existing bugs. @regehr mentioned that most bugs were found with the help of fuzzers and an automated testing workflow.
Origin and scalability challenges: @regehr revealed that the impetus for arm-tv came from a conversation with JF Bastien years ago about trusting LLVM's top-of-tree for automotive applications. @YoungJunLee inquired about handling large functions more efficiently, to which @regehr acknowledged scalability as a significant weakness of the tool, particularly with memory access, indicating that improvements to Alive2's memory encoding are needed.
Limitations and trustworthiness: @uwendi asked about limitations or loss of information during lifting. @regehr explained that while ARM assembly semantics are cleaner, challenges arose in lifting code with powerful pointers to LLVM's weaker object-offset model, necessitating changes to Alive2's memory model to support "physical pointers". He addressed concerns about trusting arm-tv, suggesting documenting the tool's scope and limitations, with a separate group of people needed to verify its implementation for certification purposes.
Tool Usage and Bug Reporting: @regehr stated that currently, only his team uses arm-tv. When a bug is reported by the tool, he verifies it on an actual ARM machine to confirm the misbehavior before reporting it to the LLVM developers, ensuring the tool's output is vetted. He also mentioned the existence of false alarms due to the complexity of the LLVM memory model.
Impact on LLVM specification and Future Work: @regehr shared an anecdote where arm-tv uncovered an ambiguity in the interaction between the LLVM LangRef and the AArch64 ABI document, which led to a resolution and fix in LLVM. Regarding future work, he expressed interest in supporting translation validation of inline assembly and concurrency-related aspects of LLVM IR, such as volatile accesses and interrupt handlers in embedded systems.
Adding new architectures: Luc Forget inquired about the modularity of arm-tv for adding new ISA semantics. @regehr explained that while not "super modular," refactoring had made it easier to add RISC-V support, and adding a third architecture would likely not be difficult, though Alive2's lack of multiple address space support remains a limitation for GPU backends. He also highlighted that supporting a new architecture primarily requires a description of its instruction set. @regehr mentioned that for ARM, they can automatically generate the instruction semantics from ARM's Architecture Specification Language (ASL), but for RISC-V, it was done by hand. He hopes to derive x86-64 semantics automatically in the future, as manual implementation is too extensive.
Discussion: Proposed changes to membership criteria to address the current internal process's inherent challenges with active collaboration and contribution.
Current Status: Overview of Clang's test suite (clang/test/cxx) and conformance challenges.
Discussion: How these insights impact the LLVM Qualification Group's goals, and possible steps toward better traceability and conformance for Clang.
Open Floor
Any additional topics, questions, comments, or suggestions from group members.
Meeting Kick-off and Participant Introductions: as a preliminary step before discussing membership criteria, each attendee introduced themselves, sharing their background and interest in the LLVM qualification group.
AI in Software Development and Qualification: Carlos and Oscar discussed the role of AI in software development, particularly in the context of ISO 26262 compliance. Oscar mentioned a study where AI tools were classified as TCL1 due to uncertainties in their qualification, unlike other tools often classified as TCL3, emphasizing the human ability to detect errors. Carlos expressed skepticism about AI-generated code making it into production for critical software within the next decade due to liability issues and AI's current limitations in understanding broad code context, which was supported by an experiment showing AI's failure to recognize dependencies.
Accelerators and Safety-Critical Spaces: Erik focuses on high-performance computing and runtimes, and on bringing this technology to safety-critical spaces like automotive. They clarified that their work involves certified runtime components that depend on LLVM for qualification, positioning themselves as a runtime specialist rather than a compiler expert in this context.
Open Source Static Analysis Tools and Legal Challenges: Petar shared their experience in trying to open-source a static analysis tool for automotive standards like MISRA and AUTOSAR. They explained that legal issues, particularly concerning the exact wording of error reports and the reuse of standard parts, prevented the tool's public release, despite having presented it five years prior. Davide affirmed a similar experience, noting that MISRA and AUTOSAR checks cannot currently be open-sourced, highlighting the legal complexities involved.
Discrepancies in Open Source Standards Access: Oscar expressed surprise regarding the difficulties with open-sourcing AUTOSAR-related implementations, as AUTOSAR specifications are freely available, unlike MISRA documents which require payment. Petar clarified that while AUTOSAR standards are free to download, reusing parts of them requires written permission from the consortium, which has been difficult to obtain. This discussion underscored the legal and logistical hurdles in leveraging open-source initiatives for automotive industry standards.
LLVM Component Qualification by Validas: Oscar detailed Validas's experience in qualifying LLVM components, including LLVM-based compilers and clang-tidy. They highlighted the usefulness of clang's feature to log optimization rules for qualification purposes and also mentioned their qualification kit for clang-tidy, which requires qualifying each rule individually. Additionally, Oscar noted their ongoing process of qualifying the C++ standard template library (STL), having identified and contributed fixes for issues in its implementation.
Compiler Optimizations and Safety Concerns: Petar raised a question about "wrong" optimizations in compilers, stating that as a compiler developer, they see nothing inherently wrong with optimizations and that issues are typically bugs, not inherent flaws in optimization. Oscar provided examples of optimizations that can lead to incorrect or unexpected behavior, such as integer overflow issues or deviations in floating-point calculations due to differences in host versus target accuracy. The discussion emphasized the need for careful configuration and understanding of compiler behavior in safety-critical contexts to ensure deterministic output.
Managing Known Bugs in Open Source Tools: Oscar discussed the importance of managing known bugs in open-source tools for qualification purposes, noting that the existence of bugs is acceptable as long as workarounds are available. They suggested that improving the classification and mapping of known bugs to specific features would significantly aid in filtering and scanning for relevant issues, making the analysis process faster and easier for developers.
Internal Process Changes and Membership Criteria: Wendi briefly introduced a proposal for changes in the group's internal process, including membership criteria and participation expectations. They shared a link to the detailed description, emphasizing the need for clear expectations regarding contributions and acknowledging the limited time and bandwidth of participants.
Valuing Small Contributions: Wendi emphasized the significance of small contributions, stating that even a few minutes or one hour per month dedicated to the group would be meaningful and important. They encouraged attendees to review and comment on the shared document, noting that it was a lightweight version of the security response definition of group composition.
RFC Summary and Offline Review: Wendi shared a link to a summary of the main points from an RFC written in April 2023, which is related to Clang conformance. They requested that participants review it offline and share their opinions on Discord rather than the Discourse forum.
US/Canada
Proposed Internal Process for the Group: Wendi presented a proposal for a new lightweight internal process to address concerns about group efficiency and the need for a more structured approach. They highlighted the importance of recognizing and respecting membersâ limited bandwidth and valuing small contributions, as some members might have mistakenly believed that only full-time commitment was expected.
C++ Conformance Testing Challenges: Wendi shared insights from their contact with the Clang C/C++ working group regarding Clang conformance specifications and testability. An RFC from April 2023 indicated that developing a C++ conformance test suite faced resource limitations, preventing any current action despite a good description of how it could be done. A significant hurdle was the licensing issues with test vendors, as they only allowed reporting pass/fail results but not opening tests to analyze failures, making error analysis impossible for open-source use.
Clang CXX Directory and Defect Reports: Vlad elaborated on the `clang/test/CXX` directory, noting its two main parts: `DRs` (defect reports) and everything else. They maintained the `DRs` section, which contained about 700 tests for defect reports, far exceeding other implementations. Vlad mentioned that much of the work in this directory, particularly the first 600 defect reports, was done around 2014 by Richard Smith, but progress stopped after that.
Challenges with External Conformance Test Suites: Vlad explained that efforts to use external test suites like those from Perennial and Plum Hall in Clang were unsuccessful due to restrictive licensing, which would essentially require these companies to forfeit their business. They also mentioned that some of these test suites were not ideal and could even contain bugs. Wendi confirmed similar issues with SolidSands, stating that it was difficult to use such suites in an open-source context.
C++ Standard and Compiler Conformance: Vlad discussed the historical decision not to include many C++ examples in the standard, which created long-term issues for language evolution and caused increasing disagreement among implementations, especially for newer features. They emphasized the RFC's primary goal: to find a way to write and maintain a test suite that avoids decay, proposing tracking the git repository of the draft so that updates to the standard are reflected in updated tests. Vlad explained that compilers often do not conform to published standards due to subsequent defect reports and accepted papers, citing the "relaxed template template parameter" debacle as an example.
Private Compiler Qualification and Test Suite Quality: Wendi inquired how companies privately qualify their LLVM-based compilers given the lack of proof of conformance to standards. Vlad expressed skepticism about the quality of such private test suites, stating that Clang itself does not claim full conformance due to known unresolved issues, such as the incomplete implementation of the 2019 name lookup paper. Vlad also detailed challenges with "complete class context" rules, where compilers are expected to handle dependencies correctly but often do not due to performance concerns, making it difficult for external parties to fix.
Current Status of RFC: Wendi summarized the RFC's status, confirming with Vlad that nothing had changed regarding the feasibility of in-house effort or external conformance test suites due to resource and licensing issues. Vlad stressed the need for ongoing communication with the core working group to correctly interpret the standard and identify parts considered "garbage" that require rewriting.
Safety Critical Rust Consortium: Pete discussed their work with the Safety Critical Rust Consortium, which aims to identify and address gaps in the Rust ecosystem and language for safety-critical applications. They explained that the consortium seeks to enable Rust's use in more safety-critical industries and at higher safety criticality levels. Vlad raised concerns about the completeness of Rust's specifications, noting that the specification for name lookup was unclear. Pete acknowledged that Rust, as a less mature language, had gaps in its documentation, but efforts were underway to improve the reference and FLS documents.
Actions
All participants: review description of the proposal of internal process update, share thoughts on the Discord channel, and add comments or modify directly in the FramaPad for improvement.
All participants: review the Clang conformance summary and send feedback and questions on the Discord channel. The Clang C/C++ WG members are open to answering our questions and concerns.
Wendi: try to find a conversation with Robert C. Seacord about Plum Hall's test suite and update Vlad.
Ahead of the call, I'd like to invite you to drop a quick message on Discord about the offline reviews we talked about last time (see also the minutes):
Insights from a conversation with an ELISA project member on resources & funding
Given that time is short, I may also create separate Discord threads to keep these discussions moving more efficiently.
Thanks again to everyone who answered the form. @etomzak You're warmly welcome in our calls, even if your availability is limited.
Small note: at the moment, @evodius96 is officially the only member from US/Canada time zones. @PLeVasseur is interested and expected to join sync-ups, so just letting you know for context.
Hi Wendi, due to travel I won't be able to attend this upcoming call. If there is a better time for everyone, please don't hesitate to make a change. Thank you!
Hi @evodius96, thanks for letting me know! Since you won't be able to attend, I'll cancel the upcoming call. We'll keep the EU/Asia sync-up as the main source of updates this time, so I'd kindly ask you to have a look at the minutes afterward to stay in the loop. Looking forward to catching up with you in a future call once you're back from your travels. Safe travels!
For our upcoming sync-up (tomorrow), we have more items on the agenda than we can realistically cover in one hour. Here's a draft of the presentation (the final version will be uploaded to GitHub after the sync-up):
To make sure we use our meeting time efficiently, and to give everyone a fair chance to contribute, I'd like to suggest that we handle some of the non-technical topics asynchronously on our Discord channel.
Communications & outreach (conferences and meetups)
Resources & funding (initial conversation inspired by ELISA project)
By shifting these items to Discord, we'll free up the sync-up call to focus on technical discussions (e.g. directions for a grey-box approach, tool usage confidence, evaluation of development processes).
Outcomes from Discord discussions will also be summarized here in our meeting minutes on Discourse so nothing is lost. Looking forward to your thoughts and contributions on Discord!
One shared concern about AI writing down every word (from a non-member)
Gemini not enabled today
New self-nomination through the Google Form - Zaky's presentation
EE student from Indonesia
Coming as an individual
Working with ISO/SAE 21434 (cybersecurity)
Oscar's idea for the US LLVM 2025 conference (end of October)
Proposal to have a corner about compiler qualification at the exhibition for sponsors
Discuss with people and attract interest in it
No conclusion about this point, to be taken for discussion to Discord
Technical topics
Wrap-up about direction and focus of the discussions since July
Reference functional safety standard
Members from several industries (automotive, trains, robots, etc.), so different functional safety standards apply
General framework for functional safety of E/E/PE systems is IEC 61508, so it makes sense to use it as first guidance
As IEC 61508 is parent of other functional safety standards, the expectations around tool confidence are very similar
Need to provide evidence of tool usage
Three questions from the safety standards (see slides)
If the answers are Yes / Yes / No, then there is a need to provide the evidence
Comments about question 3:
Most safety standards are written for users, so it depends on how much they examine the "relevant outputs"
In the case of a compiler, relevant output ≈ final executable
More and more difficult to thoroughly verify the final executable (complexity)
Many of the tools traditionally used by vendors are closed; some open tools can be used to check the relevant outputs
First target: Clang compiler
As a tool provider of Clang, we don't know what the usage will be
As a tool user, you can restrict yourself (for example, using it only for debugging, not for mass-production)
Users will need to rely on the compiler depending on the usage
All the C++ parsing and semantic analysis is done by the Clang frontend
Language + Standard => Version changes are fast
Which flavor of C or C++?
C++ spec improves significantly
C spec is more rigorous
Suggestion of small scope:
Limit to the lexer? Spec-wise, it is simpler
Opinion 1: the usefulness of restricting to the lexer is limited; from a safety point of view, you could trust the lexer, but what about the rest? Requirements need to be associated with concrete use cases
Opinion 2: agree; a valid use case for the lexer is needed
About effort for a conformance test suite:
Opinion 1: Amount of effort would be huge even for 1 version
Opinion 2: Testing is laborious but not very hard
Opinion 3: if you want to do a good conformance test, the bottleneck is interpretation of the standard; testing a specification against C/C++ is not the same as with Rust
Comment: commercial test suites are expensive, 40-45K euros to qualify only one version of a compiler
What is generated is version-dependent
About usage of Alive2:
Replace the Clang front-end with Alive2 front-end and generate Alive2 IR from source code?
Clarification: alivecc doesn't replace Clang itself; it simply adds a pass plugin for verification at the IR transformation stage
Grey-box approach
Qualification is typically black-box activity
Disadvantage: must be done for every combination of optimization options, etc.
Grey-box approach could be useful, but one limitation is lack of specification of intermediate I/O
Example: specification of the IR
Identification of regressions in IR could be useful
Just a quick update: I've submitted a PR to update the documentation and add links to the August and September 2025 sync-up slide decks, which helped guide our recent discussions:
The slides are currently hosted in llvm-project/docs/qual-wg/slides, but following feedback, I plan to migrate them to a more appropriate location (likely llvm-www) once confirmed with the community. Please feel free to check the PR for details, and let me know if you have any feedback!
Gemini notes taken in the EU/Asia meeting, modified by @uwendi
Non-technical topics
Meeting Logistics and New Members
Slides shared on Discord before the meeting
Addition of three new members
Updated member list has been added into the official group webpage (PR not merged yet)
Challenges in Decision-Making
Difficulties in making decisions within the group
Cultural factors, challenges in reaching consensus, and time limits identified as the primary issues
What does consensus mean for us; what constitutes it?
Find compromise, common understanding
It is clear when people express different opinions, but how can we know the position of those who do not give one?
Divided positions - What happens when 50/50?
Consensus could mean no objections are raised, or majority rule if participation is lacking
Engagement in early conversations and reasoning are necessary, but for clear yes/no decisions, voting with a >50% threshold should be used, excluding non-votes from the percentage => If someone doesn't vote, we cannot count the vote
If there is a "gray" area, then maybe discuss it in the next sync-up meeting
What is the model from other WGs? In general, it's done through "informal" decision making
Don’t overcomplicate => lightweight process
Technical topics
Discussion on Confidence and Qualification Activities
Two work threads: classical black box validation + component confidence
These are explorations, not final decisions
Suggestion: rephrase the “confidence” aspect as a “gray/white-box qualification approach focusing on sub-components” rather than “improving confidence”. The rationale: tool confidence is a risk analysis; confidence itself cannot be improved, but the risks can be reduced through qualification
Counter-opinion: different providers can have different Tool Confidence Levels (TCLs) for the same tool due to mitigation actions, which could be seen as improving confidence
Proposal: this WG could provide advice on activities to increase confidence in tool usage, beyond just qualification
Suggested reformulation:
Create confidence in the use of the tools, e.g. by using them carefully and checking the tool results
This will help to reduce the Tool Confidence Level (TCL) as defined in ISO 26262, which measures the tool risks
The remaining risks of the tools have to be reduced by tool qualification
The most commonly used qualification method is validation
It can be performed as a black-box approach, just by testing the requirements, e.g. the compliance of a compiler with a C or C++ standard
Another validation strategy which can be applied to open-source tools is a white/grey box approach, i.e. by validating sub-components of the tools, e.g. a lexer, parser or backend of a compiler
Different perspective:
Validation is acknowledged as a common qualification method for tool usage in higher-risk contexts (SIL 3/4, ASIL C/D, …)
But evaluation of the development process is another highly recommended method for lower-criticality contexts (ASIL B)
Let’s document our arguments about the suitability of LLVM’s development process upstream, and perform an assessment with auditors who could volunteer for it
Development process - LLVM developer policy + Evaluation of the process: OSS best practices + human factors metrics
A qualification could then be achieved up to ASIL B context usage
Validation would be a 2nd step approach for qualification for higher criticality levels
Tool Qualification and Library Qualification
Tool qualification by validation: different strategies like using test suites or breaking down qualification by validating sub-components such as the lexer or parser
Qualifying sub-components is useful for overall qualification
Library qualification is more rigorous and typically requires a white box approach with code coverage measurements
Focus on Upstream LLVM and Reusability
The groupâs focus is exclusively on the upstream LLVM project, aiming to create reusable output that downstream companies can utilize
The group strives for reusability, building general or usable tools for various standards
Challenges in Defining a Toy Project
Several constraints, including the lack of resources and the difficulty in choosing a language (C, C++, Rust, …), compiler version, language standard, and compiler flags for a deliverable “toy project”
Agreeing on a toy project would help answer these questions and provide a concrete direction for the groupâs work
Standard Library Function Qualification
Start with function-level qualification of standard library functions within a limited scope, such as a single header file, as it would be easier to manage than qualifying the entire compiler
Use publicly known sources like CPP reference for requirements and tracing tests, aiming to provide upstream qualification evidence and design that downstream users could then utilize
Library qualification is more rigorous than tool qualification but selecting requirements for library functions is a good idea
If the group explores the library qualification topic, it might focus on pure C language due to the significant differences and complexities of C++
The difference between the C and C++ standard libraries is substantial, and C++ libraries change frequently
ISO/PAS 8926:2024 to qualify software components (incl. libraries)
Planned to be merged with Part 8, Clause 12 in the next edition of ISO 26262 (2027)
Two axes: analysis of provenance and analysis of complexity
Depending on results of the analysis, different qualification activities must be performed
Outputs and Collaboration
Draft tutorial for newcomers, focusing on organizing documents, presentations, and projects like compiler and code coverage sanitizers
External toolchains could be referenced
RFC to gather community input, aiming to provide a framework for compiler qualification and enable sharable evidence for downstream users
Other efforts can and will be explored in the future, and all members are encouraged to work on any of these subjects, based on their interests and bandwidth/availability
Actions
@petbernt will make a proposal presentation for library qualification
@slotosch will put his thoughts on tool confidence and qualification activities in a message in the groupâs Discord channel
@uwendi will continue working on the analysis of the suitability of the LLVM Developer Policy as a development process for safety-critical use, and on how to reuse good-quality best practices from ELISA as part of this analysis/assessment
@CarlosAndresRamirez will run experiments on LLVM and provide evidence to see what can be accomplished using human-centric metrics analysis in LLVMâs upstream development process
All: provide feedback on @petbernt’s RFC (considering the potential library approach) by the end of the current calendar week, then @petbernt will publish it
LLVM Qualification Groupâs November Sync-Up Agenda
Pre-reads / Prep
Think ahead: which linking policy should we adopt for Meeting Materials, and why?
Bring 1–2 bullets: your progress, top blocker, and any help needed
Non-technical topics
Docs updates (October): summary of GitHub changes for LLVM Qualification Group Docs
Decision on âMeeting Materialsâ linking policy: Per-meeting slide links (monthly PRs) versus Single folder link (rely on folder contents) in Meeting Materials
WG page check-in: What to add/correct/improve (wording, links, missing topics, clarifications)
US LLVM & relevant bits for this WG: Attendee highlights; what should feed into our backlog?
Technical topics / Round-robin updates
@petbernt: RFC for input request from the community
To our readers,
FYI, we’re merging the two regional sync-ups into one Tuesday 13:00 UTC call; see our Discord channel and the LLVM calendar for details.
The latest Pull Request (PR), prepared by @YoungJunLee, was merged, and there are currently no pending PRs.
There is a concern regarding meeting materials being stored only in a personal Google Drive. We stopped storing them on GitHub due to feedback from two people in the community. We’ll consider contacting the infrastructure team to determine a suitable storage location, possibly archiving quarterly PDFs on GitHub while continuing to share links from the Google Drive for collaboration.
Takeaways from the US LLVM Conference
@evodius96 attended the presentation by Peter Smith from ARM, which was interesting and showed a practical example relevant to compiler qualification.
@uwendi attended @slotosch’s round table. Other attendees asked questions about the group’s work. Peter Smith’s feedback is that he would like to see a keynote from the group next year outlining our vision. She directed the attendees to @petbernt’s request for comments.
Does Solid Sands only provide a test suite for toolchain quality, as suggested by a previous conference talk? It looks like they currently offer a broader selection, including library qualification. Some concerns exist because it seems that they do not participate in the ISO C and C++ committees, potentially impacting their ability to keep up with language changes and interpretations.
Discussed the challenge of judging the quality of conformity test kits without insight into the build and testing processes. Suggestion: false positives might be an indicator of quality.
Conference presentation and group visibility: there might be motivation for the group to do more for next year’s conference, possibly a keynote, round table, or talk, to create more awareness. It was too early to present at the recent conference. April’s LLVM Europe conference might be a target, or potentially a conference in Asia if one is organized.
Operational Maturity and Code Reviews: @uwendi attended the operational maturity round table, discussing code reviews as a way to prevent errors. There is an RFC from Infra about enforcing pull requests. Noted that 30% of commits from 5% of contributors are reviewed post-merge, and issues include a lack of reviewers and long post-commit review times.
Technical topics / Round-robin updates
Status of the Qualification RFC and Library Proposal
Concern from @petbernt that non-context-aware readers might misunderstand the RFC.
@petbernt agreed to post it in other Discord channels to increase visibility.
@petbernt also has a small proposal for library qualification that is not yet finished but could be shared later in the week.
Tutorial Preparation
@uwendi provided materials to @YoungJunLee that could be helpful to prepare the tutorial / set of initial documents for newcomers.
Open to further assistance with the interpretation of the functional safety standards
@uwendi can contact old colleagues from the railways industry for materials related to their standards.
@uwendi shared personal efforts regarding the ISO 26262 standard, noting a successful submission and acceptance of around 30 simple comments and plans for two or three significant (possibly controversial) upcoming comments regarding the description of methods 1b and 1c (evaluation of the development process and validation of the software tool).
@uwendi expressed concerns about the wording and requirements for method 1b, suggesting that evaluation of the tool development process should be highly recommended for all ASILs.
Software Tool Experiments and Automation
@CarlosAndresRamirez shared results from his quality-focused experiments on the development process, which were conducted on LLVM and Alive2, confirming at least 12 defects in Alive2 that still need to be reported.
@CarlosAndresRamirez noted that adoption of this strategy will face resistance unless the process is fully automated to prevent extra effort for developers. He will work on automation, possibly involving git hooks, and prepare templates and documentation for the group, emphasizing that this work relates to âevaluation of the development processâ (Method 1b in ISO 26262).
@uwendi requested a demo; agreed to schedule offline.
LLVM Developer Policy Suitability Analysis
@uwendi presented a draft analysis of the LLVM developer policy’s suitability based on the open-source software practices checklist from the ELISA project’s Lighthouse OSS SIG. She gave an initial, rough evaluation based on gut feeling, because the maturity scale itself is still immature, but stated that the written process generally “looks good”.
Areas rated as having “limited maturity” included one item in security/supply chain (SBOMs) and the lack of a formal bus-factor metric program.
While the written process looks good, measuring its execution and follow-through requires automation; this work is currently stuck on the undecided maturity scale.
The checklist for LLVM’s suitability would support Method 1b, evaluation of the development process. She argued that a best-practices list is more suitable than requiring assessment against a national or international standard, especially for open source.
@uwendi proposed a new action for herself to start writing templates for the workflow steps outlined in the previous sync-up. These templates would be useful for both downstream and upstream. For tool qualification, the idea is to include worksheets for each method.
Actions / Next steps
@petbernt will post some slides with a small proposal for library qualification, and post the RFC in other channels in Discord, to make it more visible.
@uwendi will contact the infrastructure team to discuss where meeting materials could be stored, possibly archiving quarterly PDFs in GitHub while continuing to share links to her Google Drive.
@petbernt will ask Peter Smith from ARM to add a comment to the RFC about input request from the community.
@uwendi will contact an old colleague from the railways industry to ask if they have materials and can help with the new version of the railway standard for tool qualification.
@CarlosAndresRamirez will work on automation using git hooks or similar mechanisms for the development process quality strategy and prepare templates and documentation for the human-centered approach to finding defects to share with the group.
@CarlosAndresRamirez will give a demo of the defect finding experiments at a dedicated slot or the next sync-up.
@uwendi will start writing templates for the gray boxes in the workflow from the last sync-up and share them with the group for review and improvements.
@petbernt will send some presentations about Solid Sands and how their super test and super framework works.
LLVM Qualification Groupâs December Sync-Up Agenda
Hello all,
Our next sync-up meeting will be dedicated to a special topic: Function-Level Qualification Methodology for libc/libc++ (@petbernt’s proposal)
Slides (open for comments):
@petbernt will tell us more about why standard libraries matter for qualification, giving us a walkthrough illustrated with examples from the slide deck:
Overview of the proposed proof-of-concept:
Unique challenges: vast API surface, varied implementations, historical behavior, testability
Why âfunction-level qualificationâ might offer a scalable entry point
Relationship to previous WG discussions on requirements traceability, upstream-friendly artefacts, and modular qualification pilots
Structure of the approach
Requirements decomposition (per function)
Test strategy (functional, boundary, behavioral)
Traceability approach across libc/libc++
Criteria for function selection in the PoC
Let’s also take time for any clarifying questions and discussion about strengths & gaps. Some guiding questions:
Does the function-level approach scale?
Is the requirements/test breakdown consistent with typical qualification workflows?
What do we consider âminimum viable artefactsâ for a PoC?
How should the WG ensure upstream-friendliness?
What is the interaction with existing test suites (libc/test, libc++/test, other conformance suites)?
How might other methods & tools (static analyzers, fuzzing, translation validation) play a complementary role?
@petbernt proposed a function-based qualification approach for C and C++ standard library functions, aiming to create modular, upstream-friendly artifacts (like design and traceability evidence) that downstream users can reuse for functional safety qualification and certification. We discussed the feasibility, complexity, and compliance of applying this component-level approach to extensive libraries like libC and libC++ under standards like ISO 26262.
Highlights
Proposal of a Function-Based Qualification Approach for C and C++ Standard Library Functions: The goal of this concept is to explore a methodology for function-level qualification of the standard C and C++ library that is upstream friendly and modular, motivated by the standard libraryâs inherent modularity, which makes it ideal for incremental qualification. It aims to demonstrate that qualification artifacts can be created in an open-source community and reused by downstream consumers, requiring compatibility with open-source methodologies, with requirements and design artifacts potentially being built from public non-normative references like CPP reference, as direct quotation of ISO specification text is prohibited due to copyright.
Outcome and Objective of the Qualification Methodology Proposal: The outcome of the proposal would be a practical and transparent methodology demonstrating how safety qualification evidence can be created and collaboratively maintained upstream, incrementally increasing the scope of what the open-source community qualifies. This proposal focuses on a practical pilot direction for the Qual WG.
Clarification on the Upstream Methodology and Downstream Qualification: The proposal is not for qualifying upstream, but for creating qualified materials (like design artifacts or traceability artifacts between implementation, design, and tests) that can live upstream and be used downstream. The actual qualification and eventual certification would be the responsibility of downstream users, who would review the evidence and provide it to a third party for assessment.
Compliance with Safety Standards and Modularity of Qualification: The methodology could be applied to all different standards, noting that most standards focus on documentation and artifact tracing. C and C++ based qualification on a function level is a good approach, citing the necessity due to the vast number of functions in the C++ template library. Agreed on the need for a modular approach, viewing the proposal as a proof of concept that can be incrementally expanded.
Scope and Role of the WG in Qualification Evidence Creation: The scope of the proposal is to create upstream evidence, not certification, by providing reusable upstream qualification artifacts, and it is not intended to qualify or certify LLVM itself. The concept envisions the WG producing artifacts in the upstream LLVM repository, which downstream users would then reuse in their safety case or safety plans before providing them to certification bodies. The goal is to reduce duplicated effort by having more evidence provided upstream. The downstream users would then reuse and extend these artifacts within their own safety life cycle, and hopefully begin contributing qualification evidence upstream. The WGâs role is to enable qualification, not to act as a certifying body, with the outcome being a model for creating referenceable qualification evidence for downstream organizations to build upon.
Definition of a Software Component and its Relevance to Safety Standards: A software component is defined as a self-contained part of a software system with a defined interface (e.g., public API function signatures, header files), documented behavior that can be verified with specific expectations, an implementation (source code), and a verification scope with tests, coverage, and analysis tied to it. Safety standards, such as IEC 61508 and ISO 26262, view runtime libraries as software components, requiring each component to have behavioral requirements, a design representation, verification, coverage, evidence of exercise, and traceability between all these elements. Standard library functions naturally fit this definition, with one function or one header ideally classified as one component.
Benefits of Function Level Qualification: Function-level qualification works because each C and C++ function naturally maps to a small, self-contained software component, enabling incremental qualification of one function, one header, or one behavioral subset at a time. This approach encourages parallel contributions, as each contributor can qualify a self-contained component, create a design artifact, and specific test cases. This small scope (for example, four or five requirements per function) could be manageable and easily parallelizable, aiding with traceability and review.
High-Level Approach for Qualification Artifacts: The proposed approach involves defining upstream-friendly requirements that express observable behavior, derived from reverse engineering existing implementations and public non-normative references (like CPP reference, since ISO text cannot be quoted). Requirements should be broken down into atomic, easily testable requirements (similar to AUTOSAR style), where one requirement states only one thing. The behavior should also be modeled in a design artifact, such as a PlantUML diagram, representing the function’s control flow and outcomes. This process would then trace requirements and the design model to the implementation and testing.
Establishing Traceability and Leveraging Existing Test Suites: A lightweight traceability system is needed, which could use YAML or Markdown tables, linking requirements and design models to implementations (source code) and tests. @petbernt suggested trying to leverage and extend the existing LLVM test suites for libc and libc++ to achieve full coverage. Evidence such as coverage metrics (line and branch initially, with MCDC coverage planned for future expansion) should be recorded and published in upstream qualification artifacts.
Recommendation to Start with Simple Pure Functions: It is recommended to start with simple pure functions like memcpy, memmove, memcmp, binary_search, fabs, or isalpha because they are deterministic and have observable outcomes, making them easy to describe and test without platform dependencies, hidden states, or global variables. This serves as an ideal proof of concept before addressing more complex or stateful functions.
Memcpy Example: Key Behaviors and Qualification Steps: The example used is memcpy, with key behaviors being: returning a destination pointer, copying exactly count bytes from source to destination, and having undefined behavior for overlapping regions. The proposed qualification steps for memcpy involve creating small atomic requirements based on public reference models (like CPP reference), modeling it in PlantUML, linking requirements to existing tests, and verifying and potentially extending coverage. The description of memcpy from cppreference.com was cited, highlighting that undefined behavior occurs for overlapping objects or invalid/null pointers.
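Each testable behavior above could map directly onto a small check. A minimal sketch in C, where the REQ_MEMCPY_* IDs in the comments are hypothetical (no such scheme has been agreed); the overlapping-regions undefined behavior becomes a user assumption rather than a test:

```c
#include <string.h>

/* Sketch only: one check per hypothetical atomic requirement
 * (the REQ_MEMCPY_* IDs are illustrative, not an agreed scheme).
 * Returns 0 when every checked behavior holds. */
int verify_memcpy_requirements(void) {
    char src[8] = "abcdefg";
    char dst[8] = {0};

    /* REQ_MEMCPY_001: memcpy returns the destination pointer. */
    if (memcpy(dst, src, 4) != (void *)dst)
        return 1;

    /* REQ_MEMCPY_002: exactly count bytes are copied from src. */
    if (memcmp(dst, src, 4) != 0)
        return 2;

    /* A byte beyond count must remain untouched. */
    if (dst[4] != '\0')
        return 3;

    return 0;
}
```

The point of the sketch is the one-requirement-to-one-check mapping, which keeps review and traceability simple.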
Refining Specifications into Atomic Requirements and Assumptions: The cited description of memcpy was broken down into six atomic verifiable requirements and two assumptions for the user regarding undefined behavior. A requirement should only say one thing and be testable with one specific test case, which is the definition of atomic. The user assumptions require that source and destination regions do not overlap and that the caller ensures valid, non-null pointers. There is no proposal yet for a specific methodology and recommendations for performing this refinement from natural language to atomic requirements, i.e. for ensuring that the rewritten specifications are truly a refinement of the original natural-language one (that the refinement is correct).
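One such atomic requirement could be captured as a small YAML file along these lines; the file path, field names, and ID scheme are illustrative assumptions, not an agreed format:

```yaml
# Hypothetical path: libc/qualification/requirements/memcpy/REQ_MEMCPY_001.yaml
id: REQ_MEMCPY_001            # hypothetical ID scheme
function: memcpy
header: string.h
statement: >
  memcpy shall return the value of its destination argument dest.
source: "cppreference.com memcpy description (non-normative)"
verification_method: test
status: draft
```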
Behavioral Model using PlantUML and Traceability: Presented a behavioral model for memcpy in PlantUML, where the requirements are traced to specific sections of the diagram. The simple example shows the function signature, the copy loop, reading and writing to addresses, and returning the destination pointer. Since PlantUML is text-based, it can be handled through regular LLVM pull request review methods.
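For illustration, a minimal PlantUML activity model of the memcpy control flow might look like the following; the requirement annotation is hypothetical:

```plantuml
@startuml
start
:memcpy(dest, src, count);
note right: REQ_MEMCPY_001 (hypothetical ID): returns dest
repeat
  :copy byte from src + i to dest + i;
  :increment i;
repeat while (i < count)
:return dest;
stop
@enduml
```

Because the model is plain text, it diffs cleanly and can go through the normal LLVM pull-request review flow.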
Proposed Structure for Qualification Evidence in LLVM Repository: The qualification evidence is proposed to live in specific folders: a requirements folder (containing YAML files, one per atomic requirement, defining what must be true and derived from behavioral descriptions), a design folder (containing PlantUML models referencing requirements), and a traceability file per function. The traceability file would list all linked requirements, reference design artifacts, test files, and test cases, verifiable through validation scripts. An example of the traceability file structure was shown, listing header files, design reference, requirements, assumptions, and associated test functions and files, which establishes verifiable links and enables automated validation.
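A possible shape for the per-function traceability file described above; all paths, IDs, and test-case names here are hypothetical:

```yaml
# Hypothetical path: libc/qualification/traceability/memcpy.yaml
function: memcpy
header: string.h
design: ../design/memcpy.puml
requirements:
  - REQ_MEMCPY_001   # returns dest
  - REQ_MEMCPY_002   # copies exactly count bytes
assumptions:
  - ASM_MEMCPY_001   # regions do not overlap
  - ASM_MEMCPY_002   # pointers are valid and non-null
tests:
  - file: test/src/string/memcpy_test.cpp
    cases: [ReturnsDest, CopiesExactCount]
```

A validation script could then check that every listed requirement file, design artifact, and test case actually exists, making the links machine-verifiable.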
Referencing ISO Standards and Open Source Requirements: About referring to ISO standards, there was a suggestion that while copying the text is restricted, referring to the unique IDs of the standards is possible and provides a more credible and stable reference than a wiki page. Nevertheless, not all upstream maintainers or users have access to the copyrighted ISO standard, which is why an open-source-friendly approach using publicly known sources (like CPP reference) is needed. The current libc repository already references specific clauses in ISO standards and POSIX, but it is acknowledged that non-ISO descriptions might have slight differences.
Additional Requirements from ISO 26262 for Unit Verification: ISO 26262 software component qualification refers to software unit verification (which aligns with C functions), and requires systematic consideration of equivalence classes, boundary values, and extreme values, in addition to simple requirements and tests. These additional efforts are currently beyond the basic proof of concept.
Automated Generation of Traceability Matrix Visualization: Based on the traceability file, a PlantUML-based traceability matrix visualization can be automatically generated, linking design, requirements, test cases, and test files, which provides a clear visualization of traceability. This visualization would also be committed as text files to the upstream repository.
Placement of Qualification Evidence within LLVM Projects: Qualification evidence is suggested to reside close to its specific project, such as within the libC or libC++ subprojects, with separate subfolders for requirements, design, traceability, and scripts. This structure avoids top-level clutter, mirrors the LLVM modular layout, and clarifies ownership for reviewers. Existing test cases would be reused, and new tests for coverage gaps would reside in the same test directory. A top-level qualification tools folder could hold shared templates and utility scripts for common methodology across subprojects.
Coverage and Confidence Building: For the initial proof of concept, line and branch coverage are suggested to demonstrate the methodologyâs feasibility, with the future potential to expand to MCDC coverage once the test infrastructure is stable. The goal is to identify coverage gaps and create more tests until each qualified function becomes a reusable evidence unit, fully testable with 100% coverage and traceability.
Reusability of Existing Tests and Tooling for Traceability: About the usefulness of existing tests in the framework: at least one person in the WG has analyzed the test cases, confirming they are often not mapped to specific functions or requirements, but adding this information (e.g., in comments) would make them usable. The same person suggested that while PlantUML might be useful for flow specification, it might be “over-engineering” for simple functions, and that for traceability, knowing the file location is sufficient and easily translated into a qualification report.
Feasibility Concerns Regarding Effort and Library Complexity: Some expressed concern that implementing this approach looks like an “enormous effort” and questioned its applicability to large and complex libraries like libC and libC++, especially due to different implementations for various architectures. Companies are already performing this work proprietarily, and sharing basic modeling of simple functions would provide the majority of the work upstream, simplifying the task for downstream users, who would only need to add architecture-specific artifacts.
Handling Compliance and Deviations in Complex Functions: Another concern is that complex or stateful functions in the libraries might not inherently comply with ISO 26262. Even if there are deviations from the standard, documenting the actual behavior and providing known workarounds (such as assumptions for undefined behavior in a safety manual) still results in a valid safety case for an assessor, as the documented behavior is transparent. One attendee confirmed finding bugs in the C++ specification itself and agreed that a safety manual is the correct approach to document known issues.
Proprietary Qualification Efforts and Upstream Contribution: Existing companies offer products with test cases, requirements, coverage, and analysis for over a thousand C++ template specifications. They would not upstream this huge value, as it is their business model. The purpose of the Qualification WG should still be to attempt some form of qualification upstream, not only to reduce costs when some qualification work is already available upstream, but also to reunify a fragmented landscape of knowledge and improve the understandability of basic properties of these libraries that are common to all downstreams.
Focus on Latest Standard Version and Next Steps: The methodology aligns with the current LLVM libC efforts toward C23. Clarity on focusing on the latest standard version is important. Regarding a plan for developing the testing framework/script, LLVM already has the lit unit test framework, and the work would involve mapping existing tests to requirements and extending coverage.
Actions / Next steps
@petbernt will check with people working on libraries development and testing in the community (e.g. Clang C/C++ WG) for opinions, and for historical background on any similar approaches that could have been already tried in the past for upstream libraries.
@uwendi will reflect upon and write down ideas on how to explain the spec refinement process in the methodology.
LLVM Qualification Groupâs January 2026 Sync-Up Agenda
Here’s the agenda for the next sync-up meeting:
2026 Objectives & Radar: Key focus areas for next year; topics on the horizon; how to engage the broader LLVM community with regards to our action ideas.
Action Items Review: Quick status check on open actions; identifying ways to unblock or move stalled items forward.
Small, Practical Deliverables: Ideas for lightweight, useful outputs (e.g. short notes, alignment with other LLVM groups, “good-enough-for-now” artifacts that can evolve).
Seeking feedback from a relevant group in the community might be beneficial (e.g. subgroup in the Clang C/C++ WG?), especially if they have prior experience with improving testing.
Move forward with creating a small example as part of a PR to communicate the objective of the process more clearly. Start the work on a PoC using the existing example, potentially refining it.
The discussion around the PoC included refining an example function and planning follow-up functions. We will include proposed folder modifications in the PoC to elicit feedback from maintainers through the PR process, especially regarding naming conventions.
Other suggestions:
Create a commit on a cloned repository for internal review before making a PR.
Draft the process as a workflow, with steps and I/Os.
Note on Standards and Templates:
Keep the process documentation free of excessive standards jargon to avoid deterring potential contributors.
Regarding ongoing work on templates, only one template has been started but five others remain to be written, with a goal of keeping them simple by adding guidelines and examples directly within the Markdown comments.
An initial template confusion was resolved by changing the wording to reflect that it was meant to determine the need for evaluation and qualification, not the evaluation report itself.
LLVM Development Process and External Checklists:
Presented findings on the LLVM development process against an ELISA checklist, noting that areas like governance and community were strong, but quality assurance and engineering discipline were only “so-so” due to inconsistent pre-merge peer review.
Next step is to compare the evaluation using the ELISA checklist with Tom Stellardâs detailed assessment and potentially contact Tom for clarification, as evaluating the development process is a key complementary method to testing and validation for qualification.
External Interest and Templates: two companies (one of them is Quadric) have expressed interest in using the templates. Real project usage of our outputs could help improve our proposals.
Reporting Defects in Alive2: long-pending action of reporting defects found in Alive2 - we have the list of findings and they need to be formally reported, with the goal of completing this task this month before the next meeting.
Engagement with LLVM Security Group:
Last two MoMs from the LLVM Security Group mention functional safety and a “testability problem”.
Take the initiative to contact them to explore potential collaboration, particularly regarding test traceability.
Need for High-Level Guidance for Onboarding:
Need for higher-level guidance, such as a visual flow or simple framework, to help potential users self-assess their needs and navigate the templates.
This guidance, including a possible questionnaire, could be integrated into the existing tutorial on mapping projects and ISO 26262 clauses.
Start a small proof of concept for one function using the same example from the slides, including proposed folder modifications, and refine it internally before creating a pull request.
Contact Peter Smith to discuss the security groupâs minutes mentioning functional safety and test traceability, and inform our group on Discord about how to proceed.