This thread is dedicated to sharing the meeting minutes of the LLVM Qualification Working Group. We will use this space to publish summaries and action items from our monthly sync-ups, in order to keep the broader community informed.
Meeting notes are initially drafted collaboratively in a shared FramaPad and then archived here after each session for long-term reference and discussion.
The LLVM Qualification WG was formed following community interest in exploring how LLVM components could be qualified for use in safety-critical domains (e.g. automotive, oil & gas, medical). We welcome contributions from all perspectives: compiler developers, toolchain integrators, users from regulated industries, and others interested in software tool confidence, safety assurance, and systematic quality evidence.
If you're interested in participating or following along, feel free to join the discussions here or connect via the LLVM Community Discord in the #fusa-qual-wg channel.
Warm regards,
Wendi
(on behalf of the LLVM Qualification WG)
Approaches to Linking Tests and Requirements: Jessica discussed different methods for associating tests with requirements, such as adding text to existing tests or creating a directory to reference them. She suggested that adding text to the tests might be the most practical initial step. Wendi noted this down as a potential solution.
Leveraging Existing Specifications and Tests: Wendi inquired about existing specifications from the C/C++ working group. Jessica mentioned that implicit requirements might already exist in the test directory where clang's behavior is checked for specific code. Jorge suggested utilizing golden samples as tests and mapping them to requirements using LLVM's existing testing infrastructure.
Command Line Option Testing: Mikhail proposed checking which command line options are used by which tests during test suite runs to ensure all specified options are tested. Wendi asked about typical requirements management practices, noting the need for unique IDs and clear, verifiable descriptions.
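Mikhail's proposal above could be prototyped with a few lines of scripting. A minimal sketch, assuming lit-style RUN lines; the file contents and the required-option list below are illustrative only, not an existing LLVM convention:

```python
import re

# Hypothetical sketch: scan lit-style RUN lines for command-line options
# and report specified options that no test exercises.
RUN_LINE = re.compile(r"^[;/#]+\s*RUN:\s*(.*)$", re.MULTILINE)
OPTION = re.compile(r"(?<!\S)(-[A-Za-z][\w\-=+]*)")

def options_used(test_text):
    """Collect every '-flag' token appearing on a RUN line."""
    used = set()
    for cmd in RUN_LINE.findall(test_text):
        used.update(OPTION.findall(cmd))
    return used

# Illustrative test file contents and required-option list:
test_file = """\
; RUN: clang -O2 -fno-exceptions -c %s
; RUN: clang -O0 -g -c %s
"""
required = {"-O0", "-O2", "-O3", "-fno-exceptions"}
used = options_used(test_file)
print(sorted(required - used))  # options still lacking test coverage
```

Running the real check over the whole test suite would amount to applying `options_used` to every file under `clang/test` and diffing against the documented option set.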
Requirements Management Tools: Wendi shared links to free and open-source requirements management tools, mentioning Basil. Oliver described a similar tool used for tracing links between requirements, tests, and other elements, capable of generating coverage reports. Jorge offered to investigate in the Eclipse SDV/S-Core group what they use for requirements.
Automating Traceability: The discussion addressed automating the mapping between specifications and tests, either within the tests or using a requirements management tool. Oliver described how traceability tools use commands with IDs to link requirements to various artifacts and check for coverage.
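The annotation idea discussed above could be automated with a small script that recovers the bidirectional mapping between requirement IDs and tests. A minimal sketch: the `REQ:` tag format and the requirement IDs are assumptions for illustration, not an established convention:

```python
import re
from collections import defaultdict

# Hypothetical convention: a test file carries a comment such as
#   ; REQ: QUAL-0001
# linking it to a requirement ID. The tag name is an assumption.
REQ_TAG = re.compile(r"REQ:\s*([A-Z]+-\d+)")

def build_traceability(test_files):
    """Map requirement IDs to the tests that reference them."""
    req_to_tests = defaultdict(list)
    for name, text in test_files.items():
        for req_id in REQ_TAG.findall(text):
            req_to_tests[req_id].append(name)
    return dict(req_to_tests)

def uncovered(requirements, req_to_tests):
    """Requirements with no linked test (a coverage-gap report)."""
    return sorted(set(requirements) - set(req_to_tests))

# Example with in-memory file contents:
tests = {
    "add.ll": "; REQ: QUAL-0001\n; RUN: opt -passes=instcombine ...",
    "sub.ll": "; no annotation yet",
}
mapping = build_traceability(tests)
print(mapping)  # {'QUAL-0001': ['add.ll']}
print(uncovered(["QUAL-0001", "QUAL-0002"], mapping))  # ['QUAL-0002']
```

A dedicated requirements-management tool would add richer features (versioning, coverage reports), but even a script like this yields the two artifacts a qualification audit asks for: requirement-to-test and test-to-requirement mappings.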
Scope and Maintenance of Specifications: Wendi raised the question of what should be specified, by whom, and how it can be maintained, initially suggesting a focus on clang and C++. Oliver cautioned that this task should not be underestimated due to the potential workload. Jessica suggested that while specifying every compiler transformation might be difficult, existing tests could catalog behavior, and tools like Alive2 could verify semantic equivalence for certain optimizations. Petar emphasized the potential enormity of the effort required for detailed specification and maintenance.
Black Box Testing and Trust: Petar shared experience from qualifying GCC by treating it as a black box and using clang for result comparison. They suggested that a similar approach of extensive testing might be necessary for LLVM, focusing on building trust rather than deep internal analysis. Oliver seconded this, noting the difficulty of qualifying the Linux kernel through code analysis and suggesting the possibility of safety monitors or limiting the scope of usable compiler options. Jorge mentioned that qualified commercial compilers often come with safety manuals and usage guidelines.
Qualification of Standard/Runtime Libraries and Linker: Petar inquired about the qualification of the standard library. Jessica suggested working on libraries after addressing clang. Mikhail expressed interest in qualifying open-source runtimes. Wendi noted this point for future discussion.
Next Steps and Continued Discussion: Wendi suggested continuing the discussion on the Discord channel or Discourse and encouraged participants to add their thoughts to the notes.
US/Canada
Proposed Work Breakdown for Qualification: Wendi presented a suggestion coming from JR Simoes to split the qualification work into three parts: front-end, middle-end, and back-end, with an initial focus on the C/C++ front-end due to its broad use in safety-critical applications. Noted that other languages like Rust and relevant tools could be included later. Key discussion points for confidence in use of the compiler include specifications, testing, formal verification, sanitizers, runtime diagnostics, quality of compiler inputs, known issues analysis, and documentation (user manuals, safety manuals, release notes). The qualification of standard and runtime libraries was also added as a topic based on a suggestion during the EU/Asia session.
Challenges in Defining and Tracing Specifications: Wendi highlighted critical questions regarding specifications: whether existing specifications for front-ends like Clang can be reused, how to define partial specifications if none exist, and how to trace specifications with existing open-source verification means, such as the 25,000 tests, to achieve bidirectional traceability. Posed questions about the ownership and maintainability of these specifications. Opinions from the EU/Asia session suggested annotating tests with unique IDs linked to requirements or grouping tests into directories associated with specific requirements.
Recommendations for Specification and Test Organization: Florian shared his experience with Rust, suggesting that specifications should reuse as much as possible from existing project documentation and be built in a format conducive to linking, such as HTML. Oscar pointed out that C/C++ benefits from existing ISO standardization documents, so the focus should be on LLVM-specific features rather than creating new language specifications. Both Florian and Oscar agreed that structuring tests in directories that mirror the C standard's chapters and sub-chapters is a practical and accepted approach for C/C++ compiler qualification, making maintenance easier for both safety-critical and non-safety-critical maintainers.
Completeness Argument and Requirements Management Tools: Oscar emphasized the need for a "completeness argument" in qualifying open-source software, explaining that beyond code coverage, it is essential to demonstrate why test cases are sufficient, often by using equivalence classes and programming constructs to define comprehensive test strategies. Wendi inquired about the use of free or open-source requirements management tools. Oscar indicated that he uses a proprietary model for linking functional specifications to test cases.
Experiences with Requirements Management Tools: Florian shared his experience, stating that while tools like Sphinx-Needs are useful for general software and libraries, programming language specifications are too dense for typical requirements-writing tools. They opted for a custom Sphinx extension for tracing between the language specification and tests, finding it more suitable than trying to adapt existing tools not designed for this specific task. Pete supported this, noting that Sphinx and Sphinx-Needs were adopted for the Rust coding guidelines within the safety-critical Rust consortium, finding them useful for building verifications and ensuring traceability.
Feature-Based Qualification Structure: Oscar suggested structuring the qualification based on logical features (e.g., language compliance, optimization rules, target-specific features) rather than technical components like front-end, middle-end, and back-end, as this logical structure is more relevant for certifiers who may not understand internal compiler architecture, and that tool qualification is typically a black-box activity. John clarified that the split front-end/middle-end/back-end is driven by the capabilities of existing formal verification tools like Alive2 and their translation validation work, which currently can validate optimizations on the middle-end and back-end but not the front-end. Oscar and Wendi agreed that tools used for tool qualification do not need to be qualified themselves, simplifying the process. Florian expressed interest in separately qualifying linkers, but Oscar argued that qualifying a tool always involves documenting its environment and configuration, making it an integrated process.
Actions
Wendi:
Create calendar for the group, with regular invitations to the sync-ups
@regehr introduced Alive2, a software tool for refinement checking of LLVM optimizations, and the arm-tv tool, developed by his group, for translation validation of ARM 64-bit assembly code, explaining their methodologies and demonstrating their application in bug detection. While arm-tv has found 46 bugs, primarily silent miscompiles, scalability challenges were acknowledged, particularly with memory access. Questions were raised about limitations during lifting, the tool's trustworthiness, and adding new architectures.
Details
Introduction to Alive2: @regehr introduced Alive2, a software tool for refinement checking of LLVM optimizations. He explained that LLVM's middle-end rewrites Intermediate Representation (IR) to improve code, often making it faster or smaller. These transformations are considered "refinements," meaning the new code's set of meanings is a subset of the old code's. Alive2 uses symbolic execution of the code before and after optimization and generates queries for the Z3 theorem prover to verify that the optimized code refines the unoptimized code.
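The refinement idea can be illustrated without an SMT solver: for a small bit width, one can exhaustively check that the optimized form agrees with the original on every input. This is only a toy sketch of translation validation (Alive2 itself reasons symbolically over full LLVM semantics, including poison and undef), shown here for the classic `x * 2` to `x << 1` rewrite over all 8-bit inputs:

```python
# Toy illustration of refinement checking: Alive2 does this symbolically
# with Z3 over full LLVM semantics; here we brute-force an 8-bit domain
# for a single peephole rewrite.
BITS = 8
MASK = (1 << BITS) - 1

def original(x):
    return (x * 2) & MASK      # code before optimization

def optimized(x):
    return (x << 1) & MASK     # code after the x*2 -> x<<1 rewrite

counterexamples = [x for x in range(1 << BITS) if original(x) != optimized(x)]
print("refines" if not counterexamples else f"bug at input {counterexamples[0]}")
# prints "refines"
```

An unsound rewrite (say, `x * 2` to `x | 1`) would immediately produce a counterexample, which is exactly the kind of witness an SMT-based checker reports.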
Alive2 Compiler Explorer: @regehr encouraged attendees to try Alive2 via its compiler explorer instance at alive2.llvm.org, noting its ease of use and providing an example problem to explore. He also mentioned that papers have been written about Alive2, but hands-on use is likely more engaging.
arm-tv overview: @regehr presented the arm-tv tool, developed by his group, which performs translation validation for ARM 64-bit assembly code. He demonstrated an LLVM function that uses `memcmp` and showed how the ARM backend optimizes it, including inline substitution of `memcmp` and replacing control flow with a conditional select. The arm-tv tool aims to prove that the assembly code is a faithful translation of the LLVM IR.
Translation Validation methodology: @regehr explained that translation validation involves assigning a mathematical meaning to the code before and after transformation. Alive2 is used to formally represent the meaning of LLVM functions. For ARM code, arm-tv assigns meaning either by using hand-written instruction semantics derived from the manual or through a mechanically derived version from ARM's formal description of instructions. The tool then translates the ARM code back into LLVM IR and invokes Alive2 for a refinement check.
arm-tv in action: @regehr demonstrated arm-tv, which is called backend-tv and also supports RISC-V. The tool parses assembly into LLVM MCInst objects, lifts the ARM assembly code by building a small execution environment that resembles an ARM processor with registers initialized to "freeze poison" (an indeterminate bit pattern), and then processes the lifted instructions. This process results in a clumsy but optimizable function that Alive2 can then efficiently check against the original code.
Bug detection with arm-tv: @regehr shared that arm-tv has found 46 bugs, primarily silent miscompiles, most of which are in the machine-independent parts of the LLVM backend. He noted that while arm-tv recently started supporting RISC-V, fewer bugs have been found compared to ARM, attributing this to the multi-backend impact of the existing bugs. @regehr mentioned that most bugs were found with the help of fuzzers and an automated testing workflow.
Origin and scalability challenges: @regehr revealed that the impetus for arm-tv came from a conversation with JF Bastien years ago about trusting LLVM's top-of-tree for automotive applications. @YoungJunLee inquired about handling large functions more efficiently, to which @regehr acknowledged scalability as a significant weakness of the tool, particularly with memory access, indicating that improvements to Alive2's memory encoding are needed.
Limitations and trustworthiness: @uwendi asked about limitations or loss of information during lifting. @regehr explained that while ARM assembly semantics are cleaner, challenges arose in lifting code with powerful pointers to LLVM's weaker object-offset model, necessitating changes to Alive2's memory model to support "physical pointers". He addressed concerns about trusting arm-tv, suggesting documenting the tool's scope and limitations, with a separate group of people needed to verify its implementation for certification purposes.
Tool Usage and Bug Reporting: @regehr stated that currently, only his team uses arm-tv. When a bug is reported by the tool, he verifies it on an actual ARM machine to confirm the misbehavior before reporting it to the LLVM developers, ensuring the tool's output is vetted. He also mentioned the existence of false alarms due to the complexity of the LLVM memory model.
Impact on LLVM specification and Future Work: @regehr shared an anecdote where arm-tv uncovered an ambiguity in the interaction between the LLVM LangRef and the AArch64 ABI document, which led to a resolution and fix in LLVM. Regarding future work, he expressed interest in supporting translation validation of inline assembly and concurrency-related aspects of LLVM IR, such as volatile accesses and interrupt handlers in embedded systems.
Adding new architectures: Luc Forget inquired about the modularity of arm-tv for adding new ISA semantics. @regehr explained that while not "super modular," refactoring had made it easier to add RISC-V support, and adding a third architecture would likely not be difficult, though Alive2's lack of multiple address space support remains a limitation for GPU backends. He also highlighted that supporting a new architecture primarily requires a description of its instruction set. @regehr mentioned that for ARM, they can automatically generate the instruction semantics from ARM's Architecture Specification Language (ASL), but for RISC-V, it was done by hand. He hopes to derive x86-64 semantics automatically in the future, as manual implementation is too extensive.
Discussion: Proposed changes to membership criteria to address the current internal process's inherent challenges with active collaboration and contribution.
Current Status: Overview of Clang's test suite (clang/test/CXX) and conformance challenges.
Discussion: How these insights impact the LLVM Qualification Group's goals, and possible steps toward better traceability and conformance for Clang.
Open Floor
Any additional topics, questions, comments, or suggestions from group members.
Meeting Kick-off and Participant Introductions: as a preliminary step before discussing membership criteria, each attendee introduced themselves, sharing their background and interest in the LLVM qualification group.
AI in Software Development and Qualification: Carlos and Oscar discussed the role of AI in software development, particularly in the context of ISO 26262 compliance. Oscar mentioned a study where AI tools were classified as TCL1 due to uncertainties in their qualification, unlike other tools often classified as TCL3, emphasizing the human ability to detect errors. Carlos expressed skepticism about AI-generated code making it into production for critical software within the next decade due to liability issues and AI's current limitations in understanding broad code context, which was supported by an experiment showing AI's failure to recognize dependencies.
Accelerators and Safety Critical Spaces: Erik focuses on high-performance computing and runtimes, and bringing this technology to safety-critical spaces like automotive. They clarified that their work involves certified runtime components that depend on LLVM for qualification, positioning themself more as a runtime specialist rather than a compiler expert in this context.
Open Source Static Analysis Tools and Legal Challenges: Petar shared their experience in trying to open-source a static analysis tool for automotive standards like MISRA and AUTOSAR. They explained that legal issues, particularly concerning the exact wording of error reports and the reuse of standard parts, prevented the tool's public release, despite having presented it five years prior. Davide affirmed a similar experience, noting that MISRA and AUTOSAR checks cannot currently be open-sourced, highlighting the legal complexities involved.
Discrepancies in Open Source Standards Access: Oscar expressed surprise regarding the difficulties with open-sourcing AUTOSAR-related implementations, as AUTOSAR specifications are freely available, unlike MISRA documents which require payment. Petar clarified that while AUTOSAR standards are free to download, reusing parts of them requires written permission from the consortium, which has been difficult to obtain. This discussion underscored the legal and logistical hurdles in leveraging open-source initiatives for automotive industry standards.
LLVM Component Qualification by Validas: Oscar detailed Validas's experience in qualifying LLVM components, including LLVM-based compilers and clang-tidy. They highlighted the usefulness of clang's feature to log optimization rules for qualification purposes and also mentioned their qualification kit for clang-tidy, which requires qualifying each rule individually. Additionally, Oscar noted their ongoing process of qualifying the C++ standard library (STL), having identified and contributed fixes for issues in its implementation.
Compiler Optimizations and Safety Concerns: Petar raised a question about "wrong" optimizations in compilers, stating that as a compiler developer, they see nothing inherently wrong with optimizations and that issues are typically bugs, not inherent flaws in optimization. Oscar provided examples of optimizations that can lead to incorrect or unexpected behavior, such as integer overflow issues or deviations in floating-point calculations due to differences in host versus target accuracy. The discussion emphasized the need for careful configuration and understanding of compiler behavior in safety-critical contexts to ensure deterministic output.
Managing Known Bugs in Open Source Tools: Oscar discussed the importance of managing known bugs in open-source tools for qualification purposes, noting that the existence of bugs is acceptable as long as workarounds are available. They suggested that improving the classification and mapping of known bugs to specific features would significantly aid in filtering and scanning for relevant issues, making the analysis process faster and easier for developers.
Internal Process Changes and Membership Criteria: Wendi briefly introduced a proposal for changes in the group's internal process, including membership criteria and participation expectations. They shared a link to the detailed description, emphasizing the need for clear expectations regarding contributions and acknowledging the limited time and bandwidth of participants.
Valuing Small Contributions: Wendi emphasized the significance of small contributions, stating that even a few minutes or one hour per month dedicated to the group would be meaningful and important. They encouraged attendees to review and comment on the shared document, noting that it was a lightweight version of the security response definition of group composition.
RFC Summary and Offline Review: Wendi shared a link to a summary of the main points from an RFC written in April 2023, which is related to Clang conformance. They requested that participants review it offline and share their opinions on Discord rather than the Discourse forum.
US/Canada
Proposed Internal Process for the Group: Wendi presented a proposal for a new lightweight internal process to address concerns about group efficiency and the need for a more structured approach. They highlighted the importance of recognizing and respecting membersâ limited bandwidth and valuing small contributions, as some members might have mistakenly believed that only full-time commitment was expected.
C++ Conformance Testing Challenges: Wendi shared insights from their contact with the Clang C/C++ working group regarding Clang conformance specifications and testability. An RFC from April 2023 indicated that developing a C++ conformance test suite faced resource limitations, preventing any current action despite a good description of how it could be done. A significant hurdle was the licensing issues with test vendors, as they only allowed reporting pass/fail results but not opening tests to analyze failures, making error analysis impossible for open-source use.
Clang CXX Directory and Defect Reports: Vlad elaborated on the `clang/test/CXX` directory, noting its two main parts: `DRs` (defect reports) and everything else. They maintained the `DRs` section, which contained about 700 tests for defect reports, far exceeding other implementations. Vlad mentioned that much of the work in this directory, particularly the first 600 defect reports, was done around 2014 by Richard Smith, but progress stopped after that.
Challenges with External Conformance Test Suites: Vlad explained that efforts to use external test suites like those from Perennial and Plum Hall in Clang were unsuccessful due to restrictive licensing, which would essentially require these companies to forfeit their business. They also mentioned that some of these test suites were not ideal and could even contain bugs. Wendi confirmed similar issues with SolidSands, stating that it was difficult to use such suites in an open-source context.
C++ Standard and Compiler Conformance: Vlad discussed the historical decision not to include many C++ examples in the standard, which created long-term issues for language evolution and caused increasing disagreement among implementations, especially for newer features. They emphasized the RFC's primary goal: to find a way to write and maintain a test suite that avoids decay, proposing tracking the git repository of the draft to reflect updates to the standard in updated tests. Vlad explained that compilers often do not conform to published standards due to subsequent defect reports and accepted papers, citing the "relaxed template template parameter" debacle as an example.
Private Compiler Qualification and Test Suite Quality: Wendi inquired how companies privately qualify their LLVM-based compilers given the lack of proof of conformance to standards. Vlad expressed skepticism about the quality of such private test suites, stating that Clang itself does not claim full conformance due to known unresolved issues, such as the incomplete implementation of the 2019 name lookup paper. Vlad also detailed challenges with "complete class context" rules, where compilers are expected to handle dependencies correctly but often do not due to performance concerns, making it difficult for external parties to fix.
Current Status of RFC: Wendi summarized the RFC's status, confirming with Vlad that nothing had changed regarding the feasibility of in-house effort or external conformance test suites due to resource and licensing issues. Vlad stressed the need for ongoing communication with the core working group to correctly interpret the standard and identify parts considered "garbage" that require rewriting.
Safety Critical Rust Consortium: Pete discussed their work with the Safety Critical Rust Consortium, which aims to identify and address gaps in the Rust ecosystem and language for safety-critical applications. They explained that the consortium seeks to enable Rust's use in more safety-critical industries and at higher safety criticality levels. Vlad raised concerns about the completeness of Rust's specifications, noting that the specification for name lookup was unclear. Pete acknowledged that Rust, as a less mature language, had gaps in its documentation, but efforts were underway to improve the reference and FLS documents.
Actions
All participants: review description of the proposal of internal process update, share thoughts on the Discord channel, and add comments or modify directly in the FramaPad for improvement.
All participants: review the Clang conformance summary and send feedback and questions on the Discord channel. The Clang C/C++ WG members are open to answering our questions and concerns.
Wendi: try to find a conversation with Robert C. Seacord about Plum Hall's test suite and update Vlad.
Ahead of the call, I'd like to invite you to drop a quick message on Discord about the offline reviews we talked about last time (see also the minutes):
Insights from a conversation with an ELISA project member on resources & funding
Given that time is short, I may also create separate Discord threads to keep these discussions moving more efficiently.
Thanks again to everyone who answered the form. @etomzak You're warmly welcome in our calls, even if your availability is limited.
Small note: at the moment, @evodius96 is officially the only member from US/Canada time zones. @PLeVasseur is interested and expected to join sync-ups, so just letting you know for context.
Hi Wendi, due to travel I won't be able to attend this upcoming call. If there is a better time for everyone, please don't hesitate to make a change. Thank you!
Hi @evodius96, thanks for letting me know! Since you won't be able to attend, I'll cancel the upcoming call. We'll keep the EU/Asia sync-up as the main source of updates this time, so I'd kindly ask you to have a look at the minutes afterward to stay in the loop. Looking forward to catching up with you in a future call once you're back from your travels. Safe travels!
For our upcoming sync-up (tomorrow), we have more items on the agenda than we can realistically cover in one hour. Here's a draft of the presentation (the final version will be uploaded to GitHub after the sync-up):
To make sure we use our meeting time efficiently, and to give everyone a fair chance to contribute, I'd like to suggest that we handle some of the non-technical topics asynchronously on our Discord channel.
Communications & outreach (conferences and meetups)
Resources & funding (initial conversation inspired by ELISA project)
By shifting these items to Discord, we'll free up the sync-up call to focus on technical discussions (e.g. directions for a grey-box approach, tool usage confidence, evaluation of development processes).
Outcomes from Discord discussions will also be summarized here in our meeting minutes on Discourse so nothing is lost. Looking forward to your thoughts and contributions on Discord!
One shared concern about AI writing down every word (from a non-member)
Gemini not enabled today
New self-nomination through the Google Form - Zaky's presentation
EE student from Indonesia
Coming as an individual
Working with ISO/SAE 21434 (cybersecurity)
Oscar's idea for the US LLVM 2025 conference (end of October)
Proposal to have a corner about compiler qualification at the exhibition for sponsors
Discuss with people and attract interest in it
No conclusion about this point, to be taken for discussion to Discord
Technical topics
Wrap-up about direction and focus of the discussions since July
Reference functional safety standard
Members from several industries (automotive, trains, robots, etc), so different functional safety standards apply
General framework for functional safety of E/E/PE systems is IEC 61508, so it makes sense to use it as first guidance
As IEC 61508 is parent of other functional safety standards, the expectations around tool confidence are very similar
Need to provide evidence of tool usage
Three questions from the safety standards (see slides)
If the answers are Yes - Yes - No, then there is a need to provide the evidence
Comments about question 3:
Most safety standards are written for users, so it depends on how much they examine the "relevant outputs"
In the case of a compiler, the relevant output is the final executable
More and more difficult to thoroughly verify the final executable (complexity)
Many of the tools traditionally used by vendors are closed; some open tools exist that can be used to check the relevant outputs
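The Yes - Yes - No pattern noted above can be made concrete with a trivial decision sketch. The exact wording of the three questions is in the meeting slides, so reducing them to three yes/no answers here is an assumption for illustration:

```python
# Sketch of the tool-confidence decision logic summarized in the notes.
# The precise wording of the three questions lives in the slides; here
# they are reduced to three yes/no answers (an assumption).
def evidence_needed(q1_yes: bool, q2_yes: bool, q3_yes: bool) -> bool:
    """Yes - Yes - No pattern means qualification evidence is required."""
    return q1_yes and q2_yes and not q3_yes

print(evidence_needed(True, True, False))  # the Yes - Yes - No case: True
print(evidence_needed(True, True, True))   # outputs are examined: False
```

Any other answer pattern (e.g. the tool cannot introduce errors, or its relevant outputs are thoroughly examined downstream) removes the need for dedicated qualification evidence under this reading.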
First target: Clang compiler
As a tool provider of Clang, we don't know what the usage will be
As a tool user, you can restrict yourself (for example, using it only for debugging, not for mass-production)
Users will need to rely on the compiler depending on the usage
All the C++ parsing and semantic analysis is done by the Clang frontend
Language + Standard => Version changes are fast
Which flavor of C or C++?
C++ spec improves significantly
C spec is more rigorous
Suggestion of small scope:
Limit to the lexer? Spec-wise, it is simpler
Opinion 1: the usefulness of restricting to the lexer is limited; from a safety point of view, you can trust the lexer, but what about the rest? Requirements need to be associated with use cases
Opinion 2: agreed; a valid use case for the lexer is needed
About effort for a conformance test suite:
Opinion 1: Amount of effort would be huge even for 1 version
Opinion 2: Testing is laborious but not very hard
Opinion 3: If you want to do a good conformance test, the bottleneck is interpretation of the standard; testing a specification against C/C++ is not the same as with Rust
Comment: commercial test suites are expensive, 40-45K Euro to qualify only one version of a compiler
What is generated is version dependent
About usage of Alive2:
Replace the Clang front-end with Alive2 front-end and generate Alive2 IR from source code?
Clarification: alivecc doesn't replace Clang itself; it simply adds a pass plugin for verification at the IR transformation stage
Grey-box approach
Qualification is typically black-box activity
Disadvantage: has to be done for every combination of optimization options, etc.
Grey-box approach could be useful, but one limitation is lack of specification of intermediate I/O
Example: specification of the IR
Identification of regressions in IR could be useful
Just a quick update: I've submitted a PR to update the documentation and add links to the August and September 2025 sync-up slide decks, which helped guide our recent discussions:
The slides are currently hosted in llvm-project/docs/qual-wg/slides, but following feedback, I plan to migrate them to a more appropriate location (likely llvm-www) once confirmed with the community. Please feel free to check the PR for details, and let me know if you have any feedback!